User Tools

Site Tools


base:fastest_multiplication

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Last revisionBoth sides next revision
base:fastest_multiplication [2023-08-21 02:45] reposebase:fastest_multiplication [2023-12-11 12:20] repose
Line 5: Line 5:
 Requires tables or a generator routine such as [[table_generator_routine_for_fast_8_bit_mul_table]] Requires tables or a generator routine such as [[table_generator_routine_for_fast_8_bit_mul_table]]
  
-Jack Asser's: 233 cycles  ref: [[seriously_fast_multiplication]] \\ +Jack Asser's: 233cycles  ref: [[seriously_fast_multiplication]] \\ 
-Chris Jam's: 204.5  ref: http://csdb.dk/forums/?roomid=11&topicid=91766 \\ +Chris Jam's: 204.5 ref: http://csdb.dk/forums/?roomid=11&topicid=91766 \\ 
-Mine: 196 zp variation: 192 \\ +Mine: 198.6 zp variation: 194.6 \\ 
-Times above need to add 12 for jsr/rts \\ +Times above need to add for rts \\ 
-Note: updated 2023; corrected typos and timings \\+Note: updated 2023; corrected code, typos and timings \\ 
 +*Timing approximate for external code \\ 
 +This is the winner out of 120 published algorithms as independently tested here: 
 +https://github.com/TobyLobster/multiply_test \\ 
 +Note: I have a new version which is 188.1 cycles, excepting the RTS. \\
  
 <code> <code>
Line 22: Line 26:
 ;zero page ram required: minimum 8 bytes, ideally 14 ;zero page ram required: minimum 8 bytes, ideally 14
 ;do_add: 30 bytes in zp, if used ;do_add: 30 bytes in zp, if used
-;time: 196 cycles, option for 192 if you use 30 more zp bytes for do_add+;time: 198.6 cycles, option for 194.6 if you use 30 more zp bytes for do_add
 ;measurement method: average timings over all input combinations ;measurement method: average timings over all input combinations
  
 ;How to use: ;How to use:
-;put numbers in x/y and result is Y reg, reg, z1, z0+;put numbers in x/y and result is Y reg (z3)reg (z2), z1, z0
  
 ;tables of squares ;tables of squares
 ;sqr(x)=x^2/4 ;sqr(x)=x^2/4
 ;negsqr(x)=(255-x)^2/4 ;negsqr(x)=(255-x)^2/4
-sqrlo=$c000;511 bytes +sqrlo=$c100;511 bytes 
-sqrhi=$c200;511 bytes +sqrhi=$c300;511 bytes 
-negsqrlo=$c400;511 bytes +negsqrlo=$c500;511 bytes 
-negsqrhi=$c600;511 bytes+negsqrhi=$c700;511 bytes
  
 ;pointers to square tables above ;pointers to square tables above
Line 49: Line 53:
 z0=$80;product, 2 bytes z0=$80;product, 2 bytes
 z1=$81 z1=$81
-z2=$82 ;returned in reg+z2=$82 ;returned in reg
 z3=$83 ;returned in Y reg z3=$83 ;returned in Y reg
  
 ;Example showing use ;Example showing use
 +*=$c000
 lda #$ff lda #$ff
 sta x0 sta x0
Line 60: Line 65:
 jsr makesqrtables jsr makesqrtables
 jsr umult16 jsr umult16
-stx z2+sta z2
 sty z3 sty z3
 +rts
 ;result should be $fffe0001, e.g. as viewed with a typical m 0080 monitor command: ;result should be $fffe0001, e.g. as viewed with a typical m 0080 monitor command:
 ;0080 01 00 fe ff ;0080 01 00 fe ff
Line 158: Line 164:
 lda (p_sqr_hi),y lda (p_sqr_hi),y
 sbc (p_invsqr_hi),y sbc (p_invsqr_hi),y
-sta c2b+1;x1*y1h;32.992+sta c2b+1;x1*y0h;32.992
  
 ldy y1 ldy y1
Line 167: Line 173:
 lda (p_sqr_hi),y lda (p_sqr_hi),y
 sbc (p_invsqr_hi),y sbc (p_invsqr_hi),y
-tay;x1*y1h;Y=z3, 30.992 cycles+tay;Y=x1*y1h, 30.992 cycles
 ;17+34+33+17+33+31=164.97 cycles for main multiply part (minimum=157, maximum=173) ;17+34+33+17+33+31=164.97 cycles for main multiply part (minimum=157, maximum=173)
  
Line 181: Line 187:
 c2a: lda #0 c2a: lda #0
 c2b: adc #0 c2b: adc #0
- tax;X=z2, 6 cycles + tax 
- bcc c1c;3/6 avg 4.5+ bcc c1c;9
  iny;z3++  iny;z3++
- clc+ clc;(+3) taken 7% of the time, 3*.07=+.21
  
 ;-add last number of column 1 ;-add last number of column 1
Line 192: Line 198:
  
 ;-add last number of column 2 ;-add last number of column 2
- txa;A=z2+ txa
 c2c: adc #0 c2c: adc #0
- tax;X=z2, 6 + ;A=z2 
- bcc fin;3/4 avg 3.5 + bcc fin;7 
- iny;z3+++ iny;(+1) taken 42% of the time, 1*.42=.42
  
-;Y=z3, X=z2 +;Y=z3, A=z2 
-;9+6+4.5+8+6+3.5=37 +;add partials part total cycles=33.63 (minimum=33, maximum=37) 
-fin: rts+;total time=164.97+33.63=198.6 
 +fin: rts;add 6 to include this (204.6)
  
 Diagram of the additions Diagram of the additions
base/fastest_multiplication.txt · Last modified: 2024-02-13 08:24 by repose