User Tools

Site Tools


base:fast_sqrt

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
base:fast_sqrt [2019-07-27 18:20]
verz
base:fast_sqrt [2019-08-18 20:28] (current)
verz
Line 390: Line 390:
         }         }
 </​code>​ </​code>​
-where //p// is the number of bits of the result; or half the bits of the radicand ​(+1 if the radicand has an odd amount of bits).\\ ​+where //p// is the number of bits of the result; ​(or half the bits of the radicand+1 if the radicand has an odd amount of bits).\\ ​
 //​p=ceil(#​bits-of-radicand/​2)//​\\ ​ //​p=ceil(#​bits-of-radicand/​2)//​\\ ​
  
 Incidentally the algorithm provides also the remainder via (M LSR p)\\  Incidentally the algorithm provides also the remainder via (M LSR p)\\ 
-//p// can be greater than necessary, just producing more iterations. A value of p=16 can be choosed ​to perform the calculations for 32bit numbers and smaller.+//p// can be greater than necessary, just producing more iterations. A value of p=16 permits ​to perform the calculations for 32bit numbers and smaller.
  
 This is the code for a 32bit integer Sqrt. Provides the result and the remainder: This is the code for a 32bit integer Sqrt. Provides the result and the remainder:
-<​code>​+<​code ​6502acme>
 ;​******************************************** ;​********************************************
 ;*    sqrt32 ;*    sqrt32
 ;* ;*
 ;*   ​computes Sqrt of a 32bit number ;*   ​computes Sqrt of a 32bit number
 +;​********************************************
 +;*   by Verz - Jul2019
 +;​********************************************
 ;* ;*
-;*      input: ​ square, ​the 4-byte ​source number +;*  input: ​ square, ​32bit source number 
-;*      output: sqrt, 16bit value +;*  output: sqrt,   ​16bit value 
-;*              remnd, ​16bit value +;*          remnd, ​ ​17bit ​value 
-;*+;********************************************
  
 sqrt32 ​ lda #0 sqrt32 ​ lda #0
         sta sqrt        ; R=0         sta sqrt        ; R=0
         sta sqrt+1         sta sqrt+1
 +        sta M+4
         ;sta T+1        ; (T+1) is zero until last iteration; (T+0) is always 0         ;sta T+1        ; (T+1) is zero until last iteration; (T+0) is always 0
  
 +        clc
         ldy #14         ; 15 iterations (14-->0) + last iteration         ldy #14         ; 15 iterations (14-->0) + last iteration
-loopsq  ​clc +loopsq ​  
-        lda stablo,​y ​   ​; (2*R+D) LSR 1; actually: R+(D LSR 1) +        lda sqrt        ​; (2*R+D) LSR 1; actually: R+(D LSR 1) 
-        ​adc sqrt +        ​ora stablo,​y ​   ; using ORA instead of ADC is ok because the bit to be set 
-        sta T+2 +        sta T+2         ;    will have not been affected yet 
-        lda stabhi,y +        lda sqrt+1 
-        adc sqrt+1+        ora stabhi,y
         sta T+3         sta T+3
 +        bcs skip0       ; takes care of large numbers; if set, M>T
         ​         ​
         lda M+3         lda M+3
Line 436: Line 442:
         sbc T+3         sbc T+3
         sta M+3         sta M+3
-        clc             ; possibly unnecessary 
         lda sqrt        ; R=R+D         lda sqrt        ; R=R+D
-        ​adc stablo+1,y+        ​ora stablo+1,y
         sta sqrt         sta sqrt
         lda sqrt+1         lda sqrt+1
-        ​adc stabhi+1,y+        ​ora stabhi+1,y
         sta sqrt+1         sta sqrt+1
 skip1 skip1
Line 451: Line 456:
         bpl loopsq         bpl loopsq
 lastiter ​               ; code for last iteration lastiter ​               ; code for last iteration
 +        bcs skp0        ; takes care of large numbers; if set, M>T
         ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1         ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1
         lda M+3         lda M+3
Line 474: Line 480:
         sta M+3         sta M+3
         inc sqrt        ; R=R+D with D=1         inc sqrt        ; R=R+D with D=1
-        bne skp1 +skp1    asl M+1         ; M=M*2; location ​M+0=0
-        inc sqrt+1 +
-skp1    asl M           ​; M=M*2 +
-        rol M+1+
         rol M+2         rol M+2
         rol M+3         rol M+3
 +        rol M+4
         rts         rts
  
 ;**** Variables and Shift table ;**** Variables and Shift table
-       byte 0 
 stabhi byte 0,​0,​0,​0,​0,​0,​0,​0 stabhi byte 0,​0,​0,​0,​0,​0,​0,​0
 stablo BYTE $01,​$02,​$04,​$08,​$10,​$20,​$40,​$80 stablo BYTE $01,​$02,​$04,​$08,​$10,​$20,​$40,​$80
        byte 0,​0,​0,​0,​0,​0,​0,​0        byte 0,​0,​0,​0,​0,​0,​0,​0
  
-square ​ = $5B     bytes: input value +square = $57    ​bytes: input value; during calculation needs the 5th byte 
-sqrt    = $5F     ​; 2 bytes: result +sqrt   ​= $5F    ; 2 bytes: result 
-remnd   ​= M+2     ​; 2 bytes: is in fact the high bytes of M (M LSR 16) +remnd  = M+2    ; 2 B + 1 b: is in the high bytes of M (M LSR 16); msb is in T+0 (the 5th byte of square
-      ​= $57     ; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration +     = $5B    ​; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration 
-      ​$5B     ; 4 bytes: over the input square+     square ​; 4 bytes: over the input square
 </​code>​ </​code>​
-The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms with variables in page zero.\\ ​ +The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms (using ​variables in page zero).\\ 
-I think it could be convenient to add a control to check whether M=0 and exit earlier.+
  
base/fast_sqrt.1564244423.txt.gz · Last modified: 2019-07-27 18:20 by verz