base:8bit_multiplication_16bit_product_fast_no_tables
base:8bit_multiplication_16bit_product_fast_no_tables [2020-02-02 22:00] – djmips | base:8bit_multiplication_16bit_product_fast_no_tables [2023-03-15 03:25] (current) – djmips | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== 8bit multiplication with 16bit product ====== | ||
+ | |||
+ | This code aims to be fast, without using tables. | ||
+ | |||
< | < | ||
; mul 8x8 16 bit result for when you can't afford big tables | ; mul 8x8 16 bit result for when you can't afford big tables | ||
; by djmips | ; by djmips | ||
; | ; | ||
- | ; inputs are mul1 and mul2 and A should be zero | + | ; inputs are mul1 and X. mul1 and mul2 should be zp locations |
- | ; output is 16 bit in A : mul1 | + | ; A should be zero entering but if you want it will factor |
; | ; | ||
+ | ; output is 16 bit in A : mul1 (A is high byte) | ||
+ | ; | ||
+ | ; length = 65 bytes | ||
; total cycles worst case = 113 | ; total cycles worst case = 113 | ||
; total cycles best case = 97 | ; total cycles best case = 97 | ||
; avg = 105 | ; avg = 105 | ||
- | ; inner loop credits | + | ; inner loop credits |
MUL: | MUL: | ||
- | dec mul2 ; | + | cpx #$00 |
- | ror mul1 ;5 \ | + | beq zro |
- | bcc b1 ;2/3 | + | dex ; adc below always runs with carry set (adds mul2+1), so store multiplier-1
- | adc mul2 ;3 / | + | stx mul2 |
- | b1: ror ;2 \ | + | ror mul1 |
- | ror mul1 ;5 \ | + | bcc b1 |
- | bcc b2 ;2/3 / Best case 10 Worst case 12 | + | adc mul2 |
- | adc mul2 ;3 / | + | b1: ror |
+ | ror mul1 | ||
+ | bcc b2 | ||
+ | adc mul2 | ||
b2: ror | b2: ror | ||
ror mul1 | ror mul1 | ||
bcc b3 | bcc b3 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
b3: ror | b3: ror | ||
ror mul1 | ror mul1 | ||
bcc b4 | bcc b4 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
b4: ror | b4: ror | ||
ror mul1 | ror mul1 | ||
bcc b5 | bcc b5 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
b5: ror | b5: ror | ||
ror mul1 | ror mul1 | ||
bcc b6 | bcc b6 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
b6: ror | b6: ror | ||
ror mul1 | ror mul1 | ||
bcc b7 | bcc b7 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
b7: ror | b7: ror | ||
ror mul1 | ror mul1 | ||
bcc b8 | bcc b8 | ||
- | adc mul2 ; 10 or 12 | + | adc mul2 |
- | b8: ror ; 2 | + | b8: ror |
- | ror mul1 ; 5 | + | ror mul1 |
- | inc mul2 ; 5 | + | inx ; Optional - this preserves X across the call - could also do inc mul2 or leave out |
rts | rts | ||
+ | |||
+ | zro: stx mul1 | ||
+ | txa | ||
+ | rts | ||
</ | </ | ||
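A call site might look like the sketch below. The calling convention (first factor in `mul1`, second factor in X, A cleared on entry, 16-bit result in A : mul1 with A as the high byte) is taken from the header comments of the newer revision. The zero-page addresses `$fb` and `$fc`, the `product` label, and the `.byte` directive are assumptions chosen for illustration; adjust them to your memory map and assembler.

```
mul1    = $fb           ; assumed free zero-page byte (factor in, low byte out)
mul2    = $fc           ; assumed free zero-page byte (scratch used by MUL)

        lda #13
        sta mul1        ; first factor goes in mul1
        ldx #11         ; second factor goes in X
        lda #0          ; A should be zero on entry
        jsr MUL
        sta product+1   ; A holds the high byte of the product
        lda mul1
        sta product     ; mul1 holds the low byte of the product
        rts

product: .byte 0, 0     ; hypothetical storage for the result (low, high)
```

Since the routine overwrites `mul1` with the low byte of the result, reload it before every call.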
base/8bit_multiplication_16bit_product_fast_no_tables.1580677255.txt.gz · Last modified: 2020-02-02 22:00 by djmips