Multiplication by Halving and Doubling in AARCH64 Assembly

While multiplication is defined as repeated addition, implementing it that way algorithmically is far less efficient than other approaches. One algorithm for multiplication that is an order of magnitude faster is to halve one number while doubling the other. I gave myself the challenge of implementing this algorithm in AARCH64 Assembly, and it was not too hard.

This exercise allowed me to work through a few operations.

  • In order to double a number, you shift its bits left.
  • In order to cut it in half, you shift it right.
  • In order to check if a number is negative, you perform a logical and with the most significant bit.
  • In order to reverse the sign of a number, you flip its bits with an exclusive or (eor) and then add one (two's-complement negation).
  • In order to check if a number is odd or even, you perform a logical and with the least significant bit.

Here’s the code. I used the debug.s file as presented by Stephen Smith in “Programming with 64-bit ARM Assembly Language.” The main section of the code is a series of unit tests showing that the algorithm works for various cases of positive, negative, and zero-valued operands.

	.include "debug.s"
	.global main
 
main:
	mov x1, #55  //First multiplicand
	mov x2, #12  //Second  multiplicand
	bl adder
 
	mov x1, #-55  //First multiplicand
	mov x2, #12  //Second  multiplicand
	bl adder
 
	mov x1, #-5  //First multiplicand
	mov x2, #-32  //Second  multiplicand
	bl adder
 
	mov x1, #5  //First multiplicand
	mov x2, #-3  //Second  multiplicand
	bl adder
 
	mov x1, #0  //First multiplicand
	mov x2, #-3  //Second  multiplicand
	bl adder
 
	mov x1, #10  //First multiplicand
	mov x2, #0  //Second  multiplicand
	bl adder
 
	endProgram
 
 
adder:
	printStr "----"
	printReg 1
	printReg 2
 
	cmp x1, #0  //If either value is 0, we are done
	b.eq done
	cmp x2, #0
	b.ne prep
	mov x1, x2
	b done
prep:	mov x3, #0 //keeps track of signedness.
	mov x4, #0 //running addend
	//ensure A is positive
	cmp x1, 0
	B.GE step2
	mov x3, 1
	eor x1,x1, #0xfffffffffffffffe
	eor x1,x1, #0x1
	add x1, x1, 1
 
	//Ensure B is positive
step2:  cmp x2, 0
	B.GE step3
	eor x3, x3, 1
	eor x2,x2, #0xfffffffffffffffe
	eor x2,x2, #0x1
	add x2, x2, 1
 
step3:	cmp x2, #1
	b.eq wrapup
 
	and x9, x2,#0x1
	CMP x9, #0x1
	B.LT step4
	add x4, x4, x1
 
step4:  LSL x1, x1, 1
	LSR x2, x2, 1
	b step3
 
wrapup:	add x1, x1, x4
	cmp x3, #1 
	b.ne done
	//convert to negative
	eor x1, x1, #0xfffffffffffffffe
	eor x1, x1, #0x1
	add x1, x1, 1
done:
	mov x0, x1
	//final answer
	printReg 0
 
	ret

One stumble I made was in the eor used to convert from positive to negative and back. The assembler rejects a 64-bit immediate of all ones, because logical instructions encode their immediates as repeating bitmask patterns and a word of all ones cannot be expressed that way, so I treated the least significant bit separately. This must be a common operation, and there is indeed a better way: the neg instruction (an alias for sub with the zero register) negates a register in a single step.

The adder function comes to 37 instructions once the print statements are stripped out, so it will obviously beat the iterative approach for any case where the smaller operand is 37 or larger. Solving the eor problem above removes 3 more operations, and an iterative approach will have a minimal number of control operations of its own, let's guess 5 (two comparisons and the final ret). That puts the break-even point at 37 - 3 - 5 = 29 operations.

More interesting is how the complexity of the algorithm presented here grows. With the naive algorithm, the number of operations increases linearly as B gets larger: O(B). With this algorithm, the number of operations grows with the log base 2 of B: O(log B). Thus, for very large values of B, the halving/doubling approach takes far less time.
