Out of curiosity, I decided to code a 17-bit version of the double dabble routine, adapted from the code in Microchip Appnote AN-526. It was a trivial change because the 17th bit can be transferred directly to the output without any BCD adjustment; the rest of the algorithm is the same as before.
Note that the algorithm produces packed BCD, so I added the 'unpack' routine to convert to unpacked format to match the other methods. The number of cycles to convert a number is 877 regardless of the value being converted. I also tried a modified version of the algorithm that generates unpacked decimal directly. However, that requires shifting twice as many digit registers, so it ends up slower than the packed BCD version, taking 969 cycles. This code could certainly be optimized by unrolling some loops, but the real advantage of the double dabble algorithm is the compactness of the code, not its speed.
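For anyone who wants to sanity-check the results on a PC first, here is a rough C model of the same idea. This is only my own sketch, not a translation of AN-526 or of the PIC code below, and the names are made up. It drops the 17th bit straight into the packed BCD bytes, runs the usual add-3-then-shift passes for the remaining 16 bits, and then unpacks to one digit per byte:
Code:
/* Rough PC-side model of the 17-bit double dabble conversion.
   Only a sketch for checking results; not a translation of the
   PIC code, and the names are made up. */
#include <stdio.h>
#include <stdint.h>

/* Convert a 17-bit value (0..131071) to three packed BCD bytes,
   bcd[0] = least significant pair of digits. */
static void bin17_to_packed_bcd(uint32_t bin, uint8_t bcd[3])
{
    bcd[0] = (uint8_t)((bin >> 16) & 1); /* 17th bit goes straight in, no adjust needed */
    bcd[1] = 0;
    bcd[2] = 0;

    for (int i = 15; i >= 0; i--) {
        /* add 3 to any BCD nibble that is 5 or more, so the shift
           (doubling) carries correctly into the next decimal digit */
        for (int j = 0; j < 3; j++) {
            if ((bcd[j] & 0x0F) >= 0x05) bcd[j] += 0x03;
            if ((bcd[j] & 0xF0) >= 0x50) bcd[j] += 0x30;
        }
        /* shift the packed BCD string left one bit, pulling in the
           next binary bit at the bottom */
        uint8_t carry = (uint8_t)((bin >> i) & 1);
        for (int j = 0; j < 3; j++) {
            uint8_t out = bcd[j] >> 7;
            bcd[j] = (uint8_t)((bcd[j] << 1) | carry);
            carry = out;
        }
    }
}

/* Unpack three packed BCD bytes into six single-digit bytes,
   digit[0] = ones ... digit[5] = hundred-thousands. */
static void unpack_bcd(const uint8_t bcd[3], uint8_t digit[6])
{
    for (int j = 0; j < 3; j++) {
        digit[2 * j]     = bcd[j] & 0x0F;
        digit[2 * j + 1] = bcd[j] >> 4;
    }
}

int main(void)
{
    uint32_t tests[] = { 0, 9, 65535, 65536, 99999, 131071 };
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        uint8_t bcd[3], digit[6];
        bin17_to_packed_bcd(tests[i], bcd);
        unpack_bcd(bcd, digit);
        printf("%6lu -> %d%d%d%d%d%d\n", (unsigned long)tests[i],
               digit[5], digit[4], digit[3], digit[2], digit[1], digit[0]);
    }
    return 0;
}
Running it over a few boundary values (0, 65535, 65536, 131071) is a quick way to check the digit registers against the simulator.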
So, speedwise, the polynomial method is still fastest by a wide margin, and the code is fairly compact.
For compactness (especially for packed BCD), and ease of extending to larger numbers, the Double Dabble algorithm is the winner.
As for the divide-by-10 method, I don't feel too bad about it. It was an interesting experiment. It is faster than double dabble and about as easy to extend to larger numbers, but its code size is the largest of the three algorithms compared here.
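For reference, the divide-by-10 method just peels one decimal digit off per division. A minimal C sketch of the idea (my own, ignoring the fast divide/multiply tricks the PIC version actually depends on for speed):
Code:
#include <stdint.h>

/* Each pass divides by 10; the remainder is the next (least
   significant) decimal digit. digit[0] = ones ... digit[5] = 100000s. */
void bin17_to_digits_div10(uint32_t bin, uint8_t digit[6])
{
    for (int i = 0; i < 6; i++) {
        digit[i] = (uint8_t)(bin % 10);
        bin /= 10;
    }
}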
Edit: I checked the code given by 1and0 on the Microchip forum, and the speed of the double dabble method given there appeared to be about the same as what I came up with. I think that at some point in that discussion they moved on to a different, faster method, but it wasn't double dabble, though I may have missed something.
Code:
;Double Dabble 17 bit version to give packed BCD (with unpack at the end)
;Uses registers y0,y1,y2 to hold binary number (only bit 0 of y2 is used)
;Uses x0 as shift counter, and x1 as temp register for decimal adjustment
;Use of x0 and x1 is for consistency with the register usage of the code posted earlier.
;Adapted from Microchip Appnote AN-526, and updated to use enhanced instruction set mnemonics.
Bin_to_BCD17
movlw 16
movwf x0 ;loop counter: 16 shifts (the 17th bit is handled separately below)
clrf digit0 ;the BCD accumulator bytes must start at zero
clrf digit1 ;(these clrf instructions are from the original AN-526 code)
clrf digit2
clrf fsr0H ;digit registers assumed to be in the first 256 bytes of data memory
rrf y2,w ;17th bit (bit 0 of y2) into carry
rlf digit0,f ;move high bit directly to digit0
b2BCD17_Lp
rlf y0,f ;shift the 17-bit number left, one bit at a time,
rlf y1,f ;into the packed BCD digits
rlf digit0,f
rlf digit1,f
rlf digit2,f
decfsz x0,f
goto adjDEC17 ;not the last shift: decimal-adjust the digits and repeat
unpack17
;Unpack the BCD digits from 3 bytes to 6 bytes
; (If packed BCD is okay, then skip to the return instruction.)
movf digit2,w ;copy the top packed byte (digits 5 and 4) to digit5 and digit4
movwf digit5
movwf digit4
movf digit1,w ;copy the middle packed byte (digits 3 and 2) to digit3 and digit2
movwf digit3
movwf digit2
swapf digit0,w ;digit1 gets the low packed byte with its nibbles swapped
movwf digit1 ;(digit0 already has digit 0 in its low nibble)
swapf digit5,f ;put digits 5 and 3 into the low nibbles of their bytes
swapf digit3,f
movlw 0x0F ;finally mask each byte down to its low nibble
andwf digit0,f
andwf digit1,f
andwf digit2,f
andwf digit3,f
andwf digit4,f
andwf digit5,f
return
adjDEC17
movlw digit0 ;point FSR0 at digit0, the least significant packed byte
movwf fsr0L
call adjBCD17 ;decimal-adjust digit0
incf fsr0L,f
call adjBCD17 ;then digit1
incf fsr0L,f
call adjBCD17 ;then digit2
goto b2BCD17_Lp
adjBCD17
movlw 3
addwfc indf0,W ;add 3 to the low nibble (carry is always clear at this point)
movwf x1
btfsc x1,3 ; test if low nibble result > 7 (digit was 5 or more)
movwf indf0 ; keep the adjusted value
movlw 0x30
addwf indf0,W ;add 3 to the high nibble
movwf x1
btfsc x1,7 ; test if high nibble result > 7 (digit was 5 or more)
movwf indf0 ; keep the adjusted value
return