base:fastest_multiplication
Differences
This shows you the differences between two versions of the page.
base:fastest_multiplication [2017-04-19 09:47] – repose | base:fastest_multiplication [2023-08-21 02:45] – repose | ||
---|---|---|---|
Line 9: | Line 9: | ||
Mine: 196 zp variation: 192 \\ | Mine: 196 zp variation: 192 \\ | ||
Add 12 cycles to the times above for jsr/rts \\ | Add 12 cycles to the times above for jsr/rts \\ | ||
+ | Note: updated 2023; corrected typos and timings \\ | ||
< | < | ||
Line 15: | Line 16: | ||
;and being less elegant and harder to follow. | ;and being less elegant and harder to follow. | ||
;by Repose 2017 | ;by Repose 2017 | ||
+ | ;table generator by Graham | ||
+ | ;addition improvement suggested by JackAsser | ||
+ | |||
+ | ;data: 2044 bytes | ||
+ | ;zero page ram required: minimum 8 bytes, ideally 14 | ||
+ | ;do_add: 30 bytes in zp, if used | ||
+ | ;time: 196 cycles, option for 192 if you use 30 more zp bytes for do_add | ||
+ | ; | ||
+ | |||
+ | ;How to use: | ||
+ | ;put the operands in x0/x1 and y0/y1. The result is in Y reg (high byte), X reg, z1, z0 (low byte)
;tables of squares | ;tables of squares | ||
Line 35: | Line 47: | ||
y0=$fd; | y0=$fd; | ||
y1=$fe | y1=$fe | ||
- | z0=$80; | + | z0=$80; |
z1=$81 | z1=$81 | ||
- | z2=$82 | + | z2=$82 |
- | z3=$83 | + | z3=$83 |
;Example showing use | ;Example showing use | ||
Line 46: | Line 58: | ||
sta y0 | sta y0 | ||
sta y1 | sta y1 | ||
- | jsr maketables | + | jsr makesqrtables |
jsr umult16 | jsr umult16 | ||
+ | stx z2 | ||
+ | sty z3 | ||
;result should be $fffe0001, e.g. as viewed with a typical m 0080 monitor command: | ;result should be $fffe0001, e.g. as viewed with a typical m 0080 monitor command: | ||
;0080 01 00 fe ff | ;0080 01 00 fe ff | ||
Line 110: | Line 124: | ||
sta p_invsqr_hi; | sta p_invsqr_hi; | ||
- | ldy y0 | ||
sec | sec | ||
+ | ldy y0 | ||
lda (p_sqr_lo), | lda (p_sqr_lo), | ||
- | sbc (p_invsqr_lo), | + | sbc (p_invsqr_lo), |
sta z0;x0*y0l | sta z0;x0*y0l | ||
lda (p_sqr_hi), | lda (p_sqr_hi), | ||
sbc (p_invsqr_hi), | sbc (p_invsqr_hi), | ||
- | sta c1a+1; | + | sta c1a+1; |
;c1a means column 1, row a (partial product to be added later) | ;c1a means column 1, row a (partial product to be added later) | ||
ldy y1 | ldy y1 | ||
- | ;sec ;notice that the high byte of sub above is always | + | ;sec ;notice that the high byte of subtraction |
lda (p_sqr_lo), | lda (p_sqr_lo), | ||
sbc (p_invsqr_lo), | sbc (p_invsqr_lo), | ||
Line 127: | Line 141: | ||
lda (p_sqr_hi), | lda (p_sqr_hi), | ||
sbc (p_invsqr_hi), | sbc (p_invsqr_hi), | ||
- | sta c2a+1; | + | sta c2a+1; |
;set multiplier as x1 | ;set multiplier as x1 | ||
Line 144: | Line 158: | ||
lda (p_sqr_hi), | lda (p_sqr_hi), | ||
sbc (p_invsqr_hi), | sbc (p_invsqr_hi), | ||
- | sta c2b+1; | + | sta c2b+1; |
ldy y1 | ldy y1 | ||
Line 153: | Line 167: | ||
lda (p_sqr_hi), | lda (p_sqr_hi), | ||
sbc (p_invsqr_hi), | sbc (p_invsqr_hi), | ||
- | tay; | + | tay; |
- | ;17+33+31+17+31+30=159 | + | ;17+34+33+17+33+31=164.97 cycles for main multiply part (minimum=157, |
+ | ;jmp do_adds; can put do_adds in zp for a slight speed increase | ||
do_adds: | do_adds: | ||
;-add the first two numbers of column 1 | ;-add the first two numbers of column 1 | ||
Line 188: | Line 203: | ||
Diagram of the additions | Diagram of the additions | ||
- | | + | |
- | | + | |
- | --------- | + | -------- |
x0y0h x0y0l | x0y0h x0y0l | ||
+ x0y1h x0y1l | + x0y1h x0y1l |
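
The partial-product scheme in the diagram can be checked against a small reference model. The following is a minimal Python sketch, not the routine itself. It assumes the square tables hold quarter squares, floor(n*n/4), which is the usual basis for table-driven 6502 multiplies. The diff does not show the table generator or the layout of the p_invsqr tables, so the names SQR, mult8 and result_bytes are hypothetical; umult16 only borrows the name of the routine.

```python
# Reference model only. The table definition below is an assumption:
# the actual 6502 tables and their offsets are not visible in this diff.

def make_square_table(size=512):
    """Quarter-square table: entry n holds floor(n*n/4)."""
    return [(n * n) // 4 for n in range(size)]

SQR = make_square_table()

def mult8(a, b):
    """8x8 -> 16-bit product using a*b = f(a+b) - f(|a-b|), f(n) = floor(n*n/4)."""
    return SQR[a + b] - SQR[abs(a - b)]

def umult16(x, y):
    """16x16 -> 32-bit product built from four 8x8 partial products."""
    x0, x1 = x & 0xFF, x >> 8
    y0, y1 = y & 0xFF, y >> 8
    return (mult8(x0, y0)            # columns 0 and 1
            + (mult8(x0, y1) << 8)   # columns 1 and 2
            + (mult8(x1, y0) << 8)   # columns 1 and 2
            + (mult8(x1, y1) << 16)) # columns 2 and 3

def result_bytes(product):
    """Little-endian bytes, in the order z0, z1, z2, z3."""
    return [(product >> (8 * i)) & 0xFF for i in range(4)]

print(" ".join(f"{b:02x}" for b in result_bytes(umult16(0xFFFF, 0xFFFF))))
```

For the example in the listing ($ffff times $ffff), the model gives the same four bytes that the monitor dump shows at $0080.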
base/fastest_multiplication.txt · Last modified: 2024-02-13 08:24 by repose