While multiplication is defined as repeated addition, implementing it that way is far less efficient than other approaches. One much faster algorithm is to halve one number while doubling the other, adding the doubled value to a running total whenever the halved value is odd. I gave myself the challenge of implementing this algorithm in AArch64 assembly, and it was not too hard.
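For example, take the first test case below, 55 × 12:

| Doubling (A) | Halving (B) | B odd? | Running total |
|---|---|---|---|
| 55 | 12 | no | 0 |
| 110 | 6 | no | 0 |
| 220 | 3 | yes | 220 |
| 440 | 1 | yes | 220 + 440 = 660 |

When B reaches 1, the running total is 660, which is indeed 55 × 12.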
This exercise let me work through a few basic bit-level operations; a short assembly sketch of each follows the list.
- In order to double a number, you shift its bits left by one.
- In order to cut it in half, you shift its bits right by one.
- In order to check if a number is negative, you perform a logical and with a mask of the most significant bit.
- In order to reverse the sign of a number, you invert its bits with an exclusive or (eor) and add 1 (two's complement).
- In order to check if a number is odd or even, you perform a logical and with the least significant bit.
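Each of these is one or two instructions on AArch64. A minimal sketch, one fragment per bullet (the register choice and masks here are mine, independent of the program below):

```asm
    lsl x0, x0, #1                 // double: shift left one bit
    lsr x0, x0, #1                 // halve: shift right one bit (assumes x0 is non-negative)
    tst x0, #0x8000000000000000    // negative? and with the most significant bit (sets flags)
    mvn x0, x0                     // reverse sign: invert every bit...
    add x0, x0, #1                 // ...then add 1 (two's complement)
    tst x0, #1                     // odd? and with the least significant bit (sets flags)
```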
Here’s the code. I used the debug.s file as presented by Stephen Smith in “Programming with 64-bit ARM Assembly Language.” The main section of the code is a series of unit tests that show the algorithm works for various cases of positive, negative, and zero operands.
.include "debug.s" .global main main: mov x1, #55 //First multiplicand mov x2, #12 //Second multiplicand bl adder mov x1, #-55 //First multiplicand mov x2, #12 //Second multiplicand bl adder mov x1, #-5 //First multiplicand mov x2, #-32 //Second multiplicand bl adder mov x1, #5 //First multiplicand mov x2, #-3 //Second multiplicand bl adder mov x1, #0 //First multiplicand mov x2, #-3 //Second multiplicand bl adder mov x1, #10 //First multiplicand mov x2, #0 //Second multiplicand bl adder endProgram adder: printStr "----" printReg 1 printReg 2 cmp x1, #0 //If either value is 0, we are done b.eq done cmp x2, #0 b.ne prep mov x1, x2 b done prep: mov x3, #0 //keeps track of signedness. mov x4, #0 //running addend //ensure A is positive cmp x1, 0 B.GE step2 mov x3, 1 eor x1,x1, #0xfffffffffffffffe eor x1,x1, #0x1 add x1, x1, 1 //Ensure B is postive step2: cmp x2, 0 B.GE step3 eor x3, x3, 1 eor x2,x2, #0xfffffffffffffffe eor x2,x2, #0x1 add x2, x2, 1 step3: cmp x2, #1 b.eq wrapup and x9, x2,#0x1 CMP x9, #0x1 B.LT step4 add x4, x4, x1 step4: LSL x1, x1, 1 LSR x2, x2, 1 b step3 wrapup: add x1, x1, x4 cmp x3, #1 b.ne done //convert to negative eor x1, x1, #0xfffffffffffffffe eor x1, x1, #0x1 add x1, x1, 1 done: mov x0, x1 //final answer printReg 0 ret |
One stumble I made was in the eor used to convert between positive and negative. The assembler rejects #0xffffffffffffffff because a run of all 1s cannot be encoded as a logical (bitmask) immediate, so I treated the least significant bit separately. Negation must be a common operation, and indeed there are better ways to do it.
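For instance (a sketch I haven't wired into the program above), either of these one- or two-instruction sequences negates a register:

```asm
    neg x1, x1          // alias for sub x1, xzr, x1: subtract from zero

    // Or, spelling out the two's complement directly:
    mvn x1, x1          // invert every bit (register form, so no immediate-encoding issue)
    add x1, x1, #1      // then add 1
```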
The adder function comes to 37 instructions once the print statements are stripped out, so it is clearly going to beat the repeated-addition approach for any case where the smaller operand is 37 or larger. Replacing the eor sequences as above removes 3 more operations, bringing it to 34. Plus, a repeated-addition implementation has a minimal amount of control overhead of its own; let's guess 5 instructions (two comparisons and the final ret). That puts the break-even point at around 29 operations.
More interesting is how the complexity grows. With the naive algorithm, the number of operations increases linearly in B: O(B). For this algorithm, the loop runs once per bit of B, so the number of operations grows with log₂(B): O(log B). Thus, for very large values of B, the halving/doubling approach takes far less time.
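A rough comparison of loop passes (ignoring constant setup cost; these are back-of-the-envelope counts, not measurements):

| B | Repeated addition (≈ B) | Halving/doubling (≈ ⌊log₂ B⌋ + 1) |
|---|---|---|
| 100 | 100 | 7 |
| 10,000 | 10,000 | 14 |
| 1,000,000 | 1,000,000 | 20 |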