base:mathematics_in_assembly_part_6

This shows you the differences between two versions of the page.

— |
base:mathematics_in_assembly_part_6 [2015-04-17 04:32] (current) |
||
---|---|---|---|

Line 1: | Line 1: | ||

+ | ====== Mathematics in Assembly - Part 6 ====== | ||

+ | |||

+ | by Krill/Plush | ||

+ | |||

+ | Alright, now the multiplication for the last time. The approach discussed this time is the fastest one known to me, as one can perform a multiplication using this one in 15 clock cycles. Doesn' | ||

+ | |||

+ | Here we go. | ||

+ | |||

+ | But at first the snag: having the speed mentioned before is only 8 bits wide, relatively inaccurate and a table is needed that's growing quite fast when enlarging the range of calculation or accuracy. But the ratio of accuracy and memory expense can be changed arbitrarily, | ||

+ | |||

+ | < | ||

+ | | ||

+ | </ | ||

+ | |||

+ | Spinning the wheel a little further: | ||

+ | |||

+ | < | ||

+ | | ||

+ | </ | ||

+ | |||

+ | In plain english, we calculate the logarithms to an arbitrary base of both factors, add them, raise the base with that sumand get our product. Sounds quite complicated, | ||

+ | |||

+ | ====== Building the tables ====== | ||

+ | |||

+ | As the base is arbitrary, one should choose the 2, as it's a good choice with computers and their binary number system, supplying the best ratio of accuracy and table size. With multiplication, | ||

+ | |||

+ | There is a specific value, the integer factor, which is to be " | ||

+ | |||

+ | So at first we need a table holding the binary algorith for arguments from 1 to 255. The zero is ignored here as its logarithm is not defined. So the table values range from 0 (log]2[ (1)) to 7.99435344 (log]2[ (255)). To store these numbers rounded would be very inaccurate, so they first need to be scaled to a larger number range. As the logarithm values have a width of 8 bits, a factor scaling the biggest one to exactly 255 would be just right. This value, let's call it " | ||

+ | |||

+ | < | ||

+ | f = 255/log[2] (255). | ||

+ | </ | ||

+ | |||

+ | Sounds logical. Now only the logarithm table needs to be calculated in BASIC: | ||

+ | |||

+ | < | ||

+ | 1 for x=1 to 255 | ||

+ | 2 y= f*log(x)/ | ||

+ | 3 poke log2tab+x,y | ||

+ | 4 next | ||

+ | </ | ||

+ | |||

+ | In line 2, the value is calculated. As BASIC only calculates the natural algorithm, it first needs to be transformed to the binary one (log(x)/ | ||

+ | |||

+ | < | ||

+ | 1 for x= 0 to 510 | ||

+ | 2 y=2^(x/ | ||

+ | 3 poke pow2tab+x,y | ||

+ | 4 next | ||

+ | </ | ||

+ | |||

+ | The -8 in line 2 is caused by the fact that the result is divided by 256 (i.e. multiplied with 2^-8), as the second factor only ranges from 0 to 255/256, so its bits having a valency decreased by 8 each(2^-8 to 2^-1). | ||

+ | |||

+ | ====== The Routine ====== | ||

+ | |||

+ | Now we all needed tables in the memory and have read them out accordingly: | ||

+ | |||

+ | < | ||

+ | LDA log2tab,x | ||

+ | STA getresult+1 | ||

+ | LDX log2tab,y | ||

+ | getresult | ||

+ | </ | ||

+ | |||

+ | Before execution, the x- and y-registers hold the two factors, after execution, the accu contains the result. This routine needs just 16 cycles to execute, or just 15 if it's running in the zero-page. Still, the problem with zero is still there, so if one of both factors is zero, that needs to be noticed and acted accordingly, | ||

+ | |||

+ | ====== The problem with accuracy ====== | ||

+ | |||

+ | Now this routine works quite well but for operations like for instance $70 * $80/256 it doesn' | ||

+ | |||

+ | < | ||

+ | 1 for x= 1 to 255 | ||

+ | 2 y= 256*log(x)/ | ||

+ | 3 poke log2tab1+x, | ||

+ | 4 poke log2tbh0+x, | ||

+ | 5 poke log2tbh1+x, | ||

+ | 6 next | ||

+ | </ | ||

+ | |||

+ | I'll explain later on why there are two hibyte tables generated here. Good, now for the power table which is now 2*$07FF+1=$0FFF (4095) bytes long: | ||

+ | |||

+ | < | ||

+ | 1 for x=0 to 4094 | ||

+ | 2 y=2^(x/ | ||

+ | 3 poke pow2tab+x,y | ||

+ | 4 next | ||

+ | </ | ||

+ | |||

+ | Now the actual calculation routine looks a little different: | ||

+ | < | ||

+ | CLC | ||

+ | LDA log2tab1,x | ||

+ | ADC log2tab1,y | ||

+ | STA getresult+1 | ||

+ | LDA log2tbh0,x | ||

+ | ADC log2tbh1,y | ||

+ | STA getresult+2 | ||

+ | getresult LDA pow2tab | ||

+ | </ | ||

+ | |||

+ | This routine needs 30 cycles to execute or 28 in the zeropage. Now it's getting clear why there are needed two hibyte tables: as the power table is not starting at $0 in the memory, an offset needs to be added for the table beginning. For that, the table needs to be located at an even address (Lobyte $00). If it's located at an address which is divideable by $0200 without a remainder, one can get rid of the second hibyte table and simply add an offset to the remaining hibyte table, which is of the size of the power tables hibyte divided by 2. | ||

+ | |||

+ | Alright, now only some notes: signed numbers are treated as usual (buffer the signs, calculate using the numbers' | ||

+ | |||

+ | ====== Division ====== | ||

+ | |||

+ | Yes, that's finally it for the multiplication. From the next issue of this tutorial forward I'll continue with the division. The approaches for that are quite similar to the ones for multiplication, | ||

+ | |||

+ | So just look out. | ||

+ | |||

+ | See you! | ||

+ | |||

+ | Krill/Plush | ||

base/mathematics_in_assembly_part_6.txt ยท Last modified: 2015-04-17 04:32 (external edit)