User Tools

Site Tools


base:fast_sqrt

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
base:fast_sqrt [2015-04-17 04:31]
127.0.0.1 external edit
base:fast_sqrt [2019-08-18 20:28] (current)
verz
Line 374: Line 374:
  
  
-the next one is from: http://​www.geocities.com/​oneelkruns/​asm1step.html+----
  
-Returns ​the 8-bit square root in $20 of the +by Verz: the generic algorithm ​is:
-16-bit number in $20 (low) and $21 (high). The +
-remainder ​is in location $21.+
  
 <​code>​ <​code>​
-sqrt16 +        R= 0 
- LDY #$01 ; lsby of first odd number ​= 1 +        M
- STY $22 +        D= 2^(p-1) 
- DEY +        for n= 1 to p 
- STY $23 ; msby of first odd number ​(sqrt 0+        { 
-again +            T= (R+R+D) ASL (p-1) 
- SEC +            ​if ​(T <Mthen M=M-T: R=R+D 
- LDA $20 ; save remainder in X register +            M= M ASL 1 
- TAX ; subtract odd lo from integer lo +            D= D LSR 1 
- SBC $22 +        }
- STA $20 +
- LDA $21 ; subtract odd hi from integer hi +
- SBC $23 +
- STA $21 ; is subtract result negative? +
- BCC nomore ; no. increment square root +
- INY +
- LDA $22 ; calculate next odd number +
- ADC #$01 +
- STA $22 +
- BCC again +
- INC $23 +
- JMP again +
-nomore +
- STY $20 ; all done, store square root +
- STX $21 ; and remainder +
- RTS+
 </​code>​ </​code>​
 +where //p// is the number of bits of the result; (or half the bits of the radicand, +1 if the radicand has an odd amount of bits).\\ ​
 +//​p=ceil(#​bits-of-radicand/​2)//​\\ ​
 +
 +Incidentally the algorithm provides also the remainder via (M LSR p)\\ 
 +//p// can be greater than necessary, just producing more iterations. A value of p=16 permits to perform the calculations for 32bit numbers and smaller.
 +
 +This is the code for a 32bit integer Sqrt. Provides the result and the remainder:
 +<code 6502acme>​
 +;​********************************************
 +;*    sqrt32
 +;*
 +;*   ​computes Sqrt of a 32bit number
 +;​********************************************
 +;*   by Verz - Jul2019
 +;​********************************************
 +;*
 +;*  input: ​ square, 32bit source number
 +;*  output: sqrt,   16bit value
 +;*          remnd, ​ 17bit value
 +;​********************************************
 +
 +sqrt32 ​ lda #0
 +        sta sqrt        ; R=0
 +        sta sqrt+1
 +        sta M+4
 +        ;sta T+1        ; (T+1) is zero until last iteration; (T+0) is always 0
 +
 +        clc
 +        ldy #14         ; 15 iterations (14-->0) + last iteration
 +loopsq  ​
 +        lda sqrt        ; (2*R+D) LSR 1; actually: R+(D LSR 1)
 +        ora stablo,​y ​   ; using ORA instead of ADC is ok because the bit to be set
 +        sta T+2         ; ​   will have not been affected yet
 +        lda sqrt+1
 +        ora stabhi,y
 +        sta T+3
 +        bcs skip0       ; takes care of large numbers; if set, M>T
 +        ​
 +        lda M+3
 +        cmp T+3
 +        bcc skip1       ; T <= M    (branch if T>M)
 +        bne skip0
 +        lda M+2
 +        cmp T+2
 +        bcc skip1
 +skip0   ;sec
 +        lda M+2         ; M=M-T
 +        sbc T+2
 +        sta M+2
 +        lda M+3
 +        sbc T+3
 +        sta M+3
 +        lda sqrt        ; R=R+D
 +        ora stablo+1,y
 +        sta sqrt
 +        lda sqrt+1
 +        ora stabhi+1,y
 +        sta sqrt+1
 +skip1
 +        asl M           ; M=M*2
 +        rol M+1 
 +        rol M+2
 +        rol M+3
 +        dey             ; implicit: D=D/2, by the decrement of .Y
 +        bpl loopsq
 +lastiter ​               ; code for last iteration
 +        bcs skp0        ; takes care of large numbers; if set, M>T
 +        ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1
 +        lda M+3
 +        cmp sqrt+1 ​     ; (T+3) = sqrtHI
 +        bcc skp1        ; T <= M    branch if T>M
 +        bne skp0
 +        lda M+2
 +        cmp sqrt        ; (T+2) = sqrtLO
 +        bcc skp1
 +        bne skp0
 +        lda M+1
 +        cmp #$80        ; value of (T+1) during last iteration
 +        bcc skp1
 +skp0    ;sec
 +        lda M+1
 +        sbc #$80        ; (T+1) during last iteration
 +        sta M+1
 +        lda M+2         ; M=M-T
 +        sbc sqrt        ; (T+2)
 +        sta M+2
 +        lda M+3
 +        sbc sqrt+1 ​     ; (T+3)
 +        sta M+3
 +        inc sqrt        ; R=R+D with D=1
 +skp1    asl M+1         ; M=M*2; location M+0=0
 +        rol M+2
 +        rol M+3
 +        rol M+4
 +        rts
 +
 +;**** Variables and Shift table
 +stabhi byte 0,​0,​0,​0,​0,​0,​0,​0
 +stablo BYTE $01,​$02,​$04,​$08,​$10,​$20,​$40,​$80
 +       byte 0,​0,​0,​0,​0,​0,​0,​0
 +
 +square = $57    ; 5 bytes: input value; during calculation needs the 5th byte
 +sqrt   = $5F    ; 2 bytes: result
 +remnd  = M+2    ; 2 B + 1 b: is in the high bytes of M (M LSR 16); msb is in T+0 (the 5th byte of square)
 +T      = $5B    ; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration
 +M      = square ; 4 bytes: over the input square
 +</​code>​
 +The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms (using variables in page zero).\\ ​
  
base/fast_sqrt.1429237909.txt.gz · Last modified: 2015-04-17 04:31 by 127.0.0.1