base:fast_sqrt

Differences

This shows you the differences between two versions of the page.

 base:fast_sqrt [2019-07-24 00:43]verz base:fast_sqrt [2019-08-18 20:28] (current)verz Both sides previous revision Previous revision 2019-08-18 20:28 verz 2019-08-18 10:20 verz 2019-08-04 02:03 verz 2019-07-27 21:28 verz 2019-07-27 18:20 verz 2019-07-24 00:43 verz 2019-07-23 00:16 verz 2015-04-17 04:31 external edit Next revision Previous revision 2019-08-18 20:28 verz 2019-08-18 10:20 verz 2019-08-04 02:03 verz 2019-07-27 21:28 verz 2019-07-27 18:20 verz 2019-07-24 00:43 verz 2019-07-23 00:16 verz 2015-04-17 04:31 external edit Line 374: Line 374: - I'm still working on it, anyway this is based on one of the partial algorithms, and computes Sqrt of a 32bit number. ​ + ---- - Put the number in _square and get the result in _sqrt and _remainder. + + by Verz: the generic algorithm is: <​code>​ <​code>​ - _square  ​= \$5B          ; 4 bytes: \$5b-\$5e; input number + R= 0 - _sqrt    ​= \$5F          ; 2 bytes: \$5f-\$60; result of the computation + M= N - _remainder ​= _M+2       ; 2 bytes: \$5d-\$5e; is in fact the high bytes of _M + D= 2^(p-1) - _T       = \$57          ; 4 bytes: \$57-\$5a + for n= 1 to p - _M       = \$5B          ​; 4 bytes: \$5b-\$5e, same as the input _square + { - _D       = \$61          ; 2 bytes: \$61-\$62 + T= (R+R+D) ASL (p-1) + if (T <= M) then M=M-T: R=R+D + M= M ASL 1 + D= D LSR 1 + } + ​ + where //p// is the number of bits of the result; (or half the bits of the radicand, +1 if the radicand has an odd amount of bits).\\ ​ + //p=ceil(#bits-of-radicand/​2)//​\\ ​ - sqrt32 + Incidentally the algorithm provides also the remainder via (M LSR p)\\ - ldx #0 + //p// can be greater than necessary, just producing more iterations. A value of p=16 permits to perform the calculations for 32bit numbers and smaller. - stx _sqrt               ;R + - stx _sqrt+1 + - stx _D + - ;stx _T         ; Tlo is always 0 + - ldx #\$80 + - stx _D+1 + - ldy #16 + This is the code for a 32bit integer Sqrt. Provides the result and the remainder:​ + ​ + ;​******************************************** + ;*    sqrt32 + ;* + ;*   ​computes Sqrt of a 32bit number + ;​******************************************** + ;*   by Verz - Jul2019 + ;​******************************************** + ;* + ;*  input: ​ square, 32bit source number + ;*  output: sqrt,   16bit value + ;*          remnd, ​ 17bit value + ;​******************************************** - _loopsq + sqrt32  ​lda #0 - ​lda _sqrt + ​sta sqrt        ; R=0 - ​asl + sta sqrt+1 - sta _T+2 + sta M+4 - lda _sqrt+1 + ​;sta T+1        ; (T+1) is zero until last iteration; (T+0) is always 0 - rol + - sta _T+3 + - clc + - lda _D + - adc _T+2 + - sta _T+2 + - lda _D+1 + - adc _T+3 + - lsr + - sta _T+3 + - lda _T+2 + - ror + - sta _T+2 + - lda #0 + - ;       sta _T          ; Tlo is always 0 + - ror + - sta _T+1 + - ​LDA _M+3 + ​clc - ​CMP _T+3 + ldy #14         ; 15 iterations (14-->0) + last iteration - ​BCC skip1       ; T <= M    branch if T>M + loopsq ​ + lda sqrt        ; (2*R+D) LSR 1; actually: R+(D LSR 1) + ora stablo,​y ​   ; using ORA instead of ADC is ok because the bit to be set + sta T+2         ; ​   will have not been affected yet + lda sqrt+1 + ora stabhi,y + sta T+3 + ​bcs skip0       ; takes care of large numbers; if set, M>T + + lda M+3 + cmp T+3 + ​bcc skip1       ; T <= M    ​(branch if T>M) bne skip0 bne skip0 - lda _M+2 + lda M+2 - cmp _T+2 + cmp T+2 bcc skip1 bcc skip1 - bne skip0 + skip0   ​;sec - lda _M+1 + lda M+2         ; M=M-T - cmp _T+1 + sbc T+2 - bcc skip1 + sta M+2 - ;        bne skip0      ;_Tlo is always 0 + lda M+3 - ;        lda _reM + sbc T+3 - ;        cmp _T + sta M+3 - ;        bcc skip1 + ​lda sqrt        ; R=R+D - + ​ora stablo+1,y - skip0   sec + sta sqrt - ;        ​lda ​_M         ​; Tlo is always 0 + lda sqrt+1 - ;        sbc _T + ​ora stabhi+1,y - ;        sta _M + sta sqrt+1 - lda _M+1        ​; M=M-T + - sbc _T+1 + - sta _M+1 + - lda _M+2 + - sbc _T+2 + - sta _M+2 + - lda _M+3 + - sbc _T+3 + - sta _M+3 + - ​clc            ; possibly unnecessary + - lda _sqrt      ​; R=R+D + - ​adc _D + - sta _sqrt + - lda _sqrt+1 + - ​adc _D+1 + - sta _sqrt+1 + skip1 skip1 - asl _M + asl M           ; M=M*2 - rol _M+1 + rol M+1 - rol _M+2 + rol M+2 - rol _M+3 + rol M+3 - ​lsr _D+1 + ​dey             ; implicit: D=D/2, by the decrement of .Y - ​ror _D + bpl loopsq - ​dey + lastiter ​               ; code for last iteration - ​beq _endsqrt + bcs skp0        ; takes care of large numbers; if set, M>T - ​jmp _loopsq + ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1 - _endsqrt + ​lda M+3 + ​cmp sqrt+1 ​     ; (T+3) = sqrtHI + ​bcc skp1        ; T <= M    branch if T>M + ​bne skp0 + lda M+2 + cmp sqrt        ; (T+2) = sqrtLO + bcc skp1 + bne skp0 + lda M+1 + cmp #\$80        ; value of (T+1) during last iteration + bcc skp1 + skp0    ;sec + lda M+1 + sbc #\$80        ; (T+1) during last iteration + sta M+1 + lda M+2         ; M=M-T + sbc sqrt        ; (T+2) + sta M+2 + lda M+3 + sbc sqrt+1 ​     ; (T+3) + sta M+3 + inc sqrt        ; R=R+D with D=1 + skp1    asl M+1         ; M=M*2; location M+0=0 + rol M+2 + rol M+3 + rol M+4 rts rts - ​ + ;**** Variables and Shift table + stabhi byte 0,​0,​0,​0,​0,​0,​0,​0 + stablo BYTE \$01,​\$02,​\$04,​\$08,​\$10,​\$20,​\$40,​\$80 + byte 0,​0,​0,​0,​0,​0,​0,​0 + + square = \$57    ; 5 bytes: input value; during calculation needs the 5th byte + sqrt   = \$5F    ; 2 bytes: result + remnd  = M+2    ; 2 B + 1 b: is in the high bytes of M (M LSR 16); msb is in T+0 (the 5th byte of square) + T      = \$5B    ; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration + M      = square ; 4 bytes: over the input square + ​ + The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms (using variables in page zero).\\ ​
base/fast_sqrt.1563921807.txt.gz · Last modified: 2019-07-24 00:43 by verz

Page Tools 