base:fast_sqrt

# Differences

This shows you the differences between two versions of the page.

 base:fast_sqrt [2015-04-17 04:31]127.0.0.1 external edit base:fast_sqrt [2019-08-18 20:28] (current)verz 2019-08-18 20:28 verz 2019-08-18 10:20 verz 2019-08-04 02:03 verz 2019-07-27 21:28 verz 2019-07-27 18:20 verz 2019-07-24 00:43 verz 2019-07-23 00:16 verz 2015-04-17 04:31 external edit Next revision Previous revision 2019-08-18 20:28 verz 2019-08-18 10:20 verz 2019-08-04 02:03 verz 2019-07-27 21:28 verz 2019-07-27 18:20 verz 2019-07-24 00:43 verz 2019-07-23 00:16 verz 2015-04-17 04:31 external edit Line 374: Line 374: - the next one is from: http://www.geocities.com/oneelkruns/asm1step.html + ---- - Returns the 8-bit square root in \$20 of the + by Verz: the generic algorithm is: - 16-bit number in \$20 (low) and \$21 (high). The + - remainder is in location \$21. + - sqrt16 + R= 0 - LDY #\$01 ; lsby of first odd number = 1 + M= N - STY \$22 + D= 2^(p-1) - DEY + for n= 1 to p - STY \$23 ; msby of first odd number (sqrt = 0) + { - again + T= (R+R+D) ASL (p-1) - SEC + if (T <= M) then M=M-T: R=R+D - LDA \$20 ; save remainder in X register + M= M ASL 1 - TAX ; subtract odd lo from integer lo + D= D LSR 1 - SBC \$22 + } - STA \$20 + - LDA \$21 ; subtract odd hi from integer hi + - SBC \$23 + - STA \$21 ; is subtract result negative? + - BCC nomore ; no. increment square root + - INY + - LDA \$22 ; calculate next odd number + - ADC #\$01 + - STA \$22 + - BCC again + - INC \$23 + - JMP again + - nomore + - STY \$20 ; all done, store square root + - STX \$21 ; and remainder + - RTS + + where //p// is the number of bits of the result; (or half the bits of the radicand, +1 if the radicand has an odd amount of bits).\\ + //p=ceil(#bits-of-radicand/2)//\\ + + Incidentally the algorithm provides also the remainder via (M LSR p)\\ + //p// can be greater than necessary, just producing more iterations. A value of p=16 permits to perform the calculations for 32bit numbers and smaller. + + This is the code for a 32bit integer Sqrt. Provides the result and the remainder: + + ;******************************************** + ;*    sqrt32 + ;* + ;*   computes Sqrt of a 32bit number + ;******************************************** + ;*   by Verz - Jul2019 + ;******************************************** + ;* + ;*  input:  square, 32bit source number + ;*  output: sqrt,   16bit value + ;*          remnd,  17bit value + ;******************************************** + + sqrt32  lda #0 + sta sqrt        ; R=0 + sta sqrt+1 + sta M+4 + ;sta T+1        ; (T+1) is zero until last iteration; (T+0) is always 0 + + clc + ldy #14         ; 15 iterations (14-->0) + last iteration + loopsq + lda sqrt        ; (2*R+D) LSR 1; actually: R+(D LSR 1) + ora stablo,y    ; using ORA instead of ADC is ok because the bit to be set + sta T+2         ;    will have not been affected yet + lda sqrt+1 + ora stabhi,y + sta T+3 + bcs skip0       ; takes care of large numbers; if set, M>T + + lda M+3 + cmp T+3 + bcc skip1       ; T <= M    (branch if T>M) + bne skip0 + lda M+2 + cmp T+2 + bcc skip1 + skip0   ;sec + lda M+2         ; M=M-T + sbc T+2 + sta M+2 + lda M+3 + sbc T+3 + sta M+3 + lda sqrt        ; R=R+D + ora stablo+1,y + sta sqrt + lda sqrt+1 + ora stabhi+1,y + sta sqrt+1 + skip1 + asl M           ; M=M*2 + rol M+1 + rol M+2 + rol M+3 + dey             ; implicit: D=D/2, by the decrement of .Y + bpl loopsq + lastiter                ; code for last iteration + bcs skp0        ; takes care of large numbers; if set, M>T + ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1 + lda M+3 + cmp sqrt+1      ; (T+3) = sqrtHI + bcc skp1        ; T <= M    branch if T>M + bne skp0 + lda M+2 + cmp sqrt        ; (T+2) = sqrtLO + bcc skp1 + bne skp0 + lda M+1 + cmp #\$80        ; value of (T+1) during last iteration + bcc skp1 + skp0    ;sec + lda M+1 + sbc #\$80        ; (T+1) during last iteration + sta M+1 + lda M+2         ; M=M-T + sbc sqrt        ; (T+2) + sta M+2 + lda M+3 + sbc sqrt+1      ; (T+3) + sta M+3 + inc sqrt        ; R=R+D with D=1 + skp1    asl M+1         ; M=M*2; location M+0=0 + rol M+2 + rol M+3 + rol M+4 + rts + + ;**** Variables and Shift table + stabhi byte 0,0,0,0,0,0,0,0 + stablo BYTE \$01,\$02,\$04,\$08,\$10,\$20,\$40,\$80 + byte 0,0,0,0,0,0,0,0 + + square = \$57    ; 5 bytes: input value; during calculation needs the 5th byte + sqrt   = \$5F    ; 2 bytes: result + remnd  = M+2    ; 2 B + 1 b: is in the high bytes of M (M LSR 16); msb is in T+0 (the 5th byte of square) + T      = \$5B    ; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration + M      = square ; 4 bytes: over the input square + + The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms (using variables in page zero).\\ 