User Tools

Site Tools


base:fastest_multiplication

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
base:fastest_multiplication [2017-04-17 13:33] – Added heading and credits at start of the article for standardization across the wiki. ftcbase:fastest_multiplication [2017-04-19 09:59] repose
Line 7: Line 7:
 Jack Asser's: 233 cycles  ref: [[seriously_fast_multiplication]] \\ Jack Asser's: 233 cycles  ref: [[seriously_fast_multiplication]] \\
 Chris Jam's: 204.5  ref: http://csdb.dk/forums/?roomid=11&topicid=91766 \\ Chris Jam's: 204.5  ref: http://csdb.dk/forums/?roomid=11&topicid=91766 \\
-Mine: 201 zp variation: 198 \\+Mine: 196 zp variation: 192 \\
 Times above need to add 12 for jsr/rts \\ Times above need to add 12 for jsr/rts \\
  
Line 15: Line 15:
 ;and being less elegant and harder to follow. ;and being less elegant and harder to follow.
 ;by Repose 2017 ;by Repose 2017
 +;table generator by Graham
 +;addition improvement suggested by JackAsser
 +
 +;data: 2044 bytes
 +;zero page ram required: minimum 8 bytes, ideally 14
 +;do_add: 30 bytes in zp, if used
 +;time: 196 cycles, option for 192 if you use 30 more zp bytes for do_add
 +;measurement method: branches/page boundary crossings were averaged
 +
 +;How to use:
 +;put numbers in x/y and result is Y reg, X reg, z1, z0
  
 ;tables of squares ;tables of squares
Line 35: Line 46:
 y0=$fd;multiplicand, 2 bytes y0=$fd;multiplicand, 2 bytes
 y1=$fe y1=$fe
-z0=$80;product, bytes+z0=$80;product, bytes
 z1=$81 z1=$81
-z2=$82 +;z2=$82 returned in X reg 
-z3=$83+;z3=$83 returned in Y reg
  
-;not shown is routine to make the tables +;Example showing use 
-;also you need to init the pointers' high bytes to the tables+lda #$ff 
 +sta x0 
 +sta x1 
 +sta y0 
 +sta y1 
 +jsr maketables 
 +jsr umult16 
 +;result should be $fffe0001, e.g. as viewed with typical m 0080 monitor command: 
 +;0080 01 00 fe ff 
 + 
 +makesqrtables: 
 +;init zp square tables pointers 
 +lda #>sqrlo 
 +sta p_sqr_lo+1 
 +lda #>sqrhi 
 +sta p_sqr_hi+1 
 +lda #>negsqrlo 
 +sta p_invsqr_lo+1 
 +lda #>negsqrhi 
 +sta p_invsqr_hi+1 
 + 
 +;generate sqr(x)=x^2/
 +      ldx #$00 
 +      txa 
 +      !by $c9   ; CMP #immediate - skip TYA and clear carry flag 
 +lb1:  tya 
 +      adc #$00 
 +ml1:  sta sqrhi,x 
 +      tay 
 +      cmp #$40 
 +      txa 
 +      ror 
 +ml9:  adc #$00 
 +      sta ml9+1 
 +      inx 
 +ml0:  sta sqrlo,x 
 +      bne lb1 
 +      inc ml0+2 
 +      inc ml1+2 
 +      clc 
 +      iny 
 +      bne lb1 
 + 
 +;generate negsqr(x)=(255-x)^2/
 +      ldx #$00 
 +      ldy #$ff 
 +mt1: 
 +      lda sqrhi+1,x 
 +      sta negsqrhi+$100,
 +      lda sqrhi,x 
 +      sta negsqrhi,
 +      lda sqrlo+1,x 
 +      sta negsqrlo+$100,
 +      lda sqrlo,x 
 +      sta negsqrlo,
 +      dey 
 +      inx 
 +      bne mt1 
 +      rts
  
 umult16: umult16:
Line 95: Line 164:
 lda (p_sqr_hi),y lda (p_sqr_hi),y
 sbc (p_invsqr_hi),y sbc (p_invsqr_hi),y
-sta z3;x1*y1h;31+tay;x1*y1h;Y=z3, 30 cycles 
 +;17+33+31+17+31+30=159 cycles for main multiply part
  
-;4*31+2*17 so far=158 +;jmp do_adds; can put do_adds in zp for a slight speed increase
-;add partials +
-;-add first two numbers in column 1 +
-;jmp do_adds;put in zp to save 3 cycles :)+
 do_adds: do_adds:
-clc +;-add the first two numbers of column 1 
-c1a lda #0  + clc 
-c1b adc #0 +c1a: lda #0 
-sta z1;9 +c1b: adc #0 
-;-continue to first two numbers in column 2                    + sta z1;9 
-c2a lda #0 + 
-c2b adc #0  +;-continue to first two numbers of column 2 
-sta z2;7 +c2a: lda #0 
-bcc c1c;3 taken/9 not taken, avg 6  +c2b: adc #0 
-inc z3 + tax;X=z2, 6 cycles 
-clc + bcc c1c;3/avg 4.5 
-;-add last number of column 1 (row c) + iny;z3++ 
-c1c lda #0  + clc 
-adc z1 + 
-sta z1;8+;-add last number of column 1 
 +c1c: lda #0 
 + adc z1 
 + sta z1;8 
 ;-add last number of column 2 ;-add last number of column 2
-c2c lda #0  + txa;A=z2 
-adc z2 +c2c: adc #0 
-sta z2;8 + tax;X=z2, 6 
-bcc fin;3/avg 5 + bcc fin;3/avg 3.
-inc z3 + iny;z3++ 
-;9+7+6+8+8+5=43 + 
-fin rts+;Y=z3, X=z2 
 +;9+6+4.5+8+6+3.5=37 
 +fin: rts
  
 Diagram of the additions Diagram of the additions
base/fastest_multiplication.txt · Last modified: 2024-02-13 08:24 by repose