base:multiplication_and_division
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
— | base:multiplication_and_division [2015-04-17 04:33] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Multiplication ====== | ||
+ | The multiplication of two numbers can be done in various forms. The most common methods are " | ||
+ | |||
+ | |||
+ | |||
+ | ===== " | ||
+ | |||
+ | The most simple and intuitive form of multiplication: | ||
+ | |||
+ | < | ||
+ | a = $02 ; use zeropage adresses, they are faster | ||
+ | b = $03 | ||
+ | result = $04 | ||
+ | |||
+ | lda a | ||
+ | sta mod+1 ; modify code, this way we can use an immediate adc-command | ||
+ | lda #$00 | ||
+ | tay ; initialisation of result: accu is lowbyte, and y-register is highbyte | ||
+ | ldx b | ||
+ | inx | ||
+ | |||
+ | loop1 clc | ||
+ | loop2 dex | ||
+ | beq end | ||
+ | mod adc #$00 ; becomes modified -> adc a | ||
+ | bcc loop2 | ||
+ | iny | ||
+ | bne loop1 | ||
+ | end sta result | ||
+ | sty result+1 | ||
+ | rts | ||
+ | </ | ||
+ | After all you better forget this method. | ||
+ | |||
+ | |||
+ | |||
+ | ===== bit-shifting ===== | ||
+ | |||
+ | In the decimalsystem it is easy to multiply a number by 10. The only thing we have to do is to " | ||
+ | |||
+ | Now remember what you've learned in school: To multiply two large numbers you split them up to one multiplication for every decimal place and add the results. For an example, the equation 213 * 54 can be splitted up to: | ||
+ | | ||
+ | 3 * 54 | + | ||
+ | + 10 * 54 | + | ||
+ | + 200 * 54 | 10, | ||
+ | ----------------- | --------------- | ||
+ | 11, | ||
+ | |||
+ | Of course the same method can be used in the binarysystem. Instead of looking at each decimal place we look at each binary place, treating every bit from the first multiplier seperatly. Fortunatly we get only multiplications by a power of two this way. And, as described above, these can simply be done by leftshifts. Using our example again (213 = %1101 0101, 54 = %0011 0110) we get: | ||
+ | |||
+ | 0000 0001 * 0011 0110 | .... .... 0011 0110 | ||
+ | + 0000 0100 * 0011 0110 | + .... ..00 1101 10.. | ||
+ | + 0001 0000 * 0011 0110 | + .... 0011 0110 .... | ||
+ | + 0100 0000 * 0011 0110 | + ..00 1101 10.. .... | ||
+ | + 1000 0000 * 0011 0110 | + .001 1011 0... .... | ||
+ | -------------------------------- | ---------------------------- | ||
+ | 0010 1100 1110 1110 | 0010 1100 1110 1110 | ||
+ | |||
+ | This leads us directly to an implementation in assembler: At first we set up a loop which runs 8 times. The bits of the first multiplier can be tested one after another by rotating them out of the multiplier into the carryflag. If the carry is set then the bit was ' | ||
+ | |||
+ | Well, that is the basic concept. The fastest implementation of this I know was written by Damon Slye and published in "Call APPLE" in June 1983. This routine uses the same principle, but is highly optimized. It takes 7 cycles in the best case, 144 cycles in the average case and 149 cycles in the worst case. | ||
+ | |||
+ | < | ||
+ | name: 8 bit multiplication, | ||
+ | call: accu: multiplier | ||
+ | x-register: | ||
+ | return: | ||
+ | |||
+ | multiplier = $02 ; some zeropage adress | ||
+ | |||
+ | multiply cpx #$00 | ||
+ | beq end | ||
+ | dex | ||
+ | stx mod+1 | ||
+ | lsr | ||
+ | sta multiplier | ||
+ | lda #$00 | ||
+ | ldx #$08 | ||
+ | loop bcc skip | ||
+ | mod adc #$00 | ||
+ | skip ror | ||
+ | ror multiplier | ||
+ | dex | ||
+ | bne loop | ||
+ | ldx multiplier | ||
+ | rts | ||
+ | end txa | ||
+ | rts | ||
+ | </ | ||
+ | |||
+ | ===== table based methods ===== | ||
+ | |||
+ | Of course the fastest way of doing a multiplication is - not to do it at all. At least not in realtime. Instead you can calculate all your possible results in advance and store them in the RAM. Now, whenever you need the result of a multiplication, | ||
+ | The major drawback of this method is that it eats a lot of memory; depending on your domain even more than 64kb. The multiplication of two 8-bit numbers for example has 65,536 results of 16 bits each, resulting in a table of 128kb length. Fortunatly we can use some simple math to cut the required memory down a bit (at the cost of speed). The ' | ||
+ | |||
+ | a*b = ( (a+b)/2 )^2 - ( (a-b)/2 )^2 | ||
+ | |||
+ | The two terms ( (a+b)/2 )< | ||
+ | |||
+ | The last thing we have to do is to proove the correctness of the magic formula. In fact it can be gained by some rearrangements of our multiplication: | ||
+ | |||
+ | ab = ab | *4 | ||
+ | 4ab = 2ab + 2ab | now lets add some ' | ||
+ | 4ab = 2ab + 2ab + aa - aa + bb - bb | by moving the terms around we get | ||
+ | 4ab = aa + 2ab + bb - aa + 2ab - bb | using brackets now, which toggles the signs | ||
+ | 4ab = (aa + 2ab + bb) - (aa - 2ab + bb) | using the binomial theorem gives us | ||
+ | 4ab = ( (a+b)*(a+b) ) - ( (a-b)*(a-b) ) | /4 | ||
+ | ab = ( (a+b)*(a+b) )/4 - ( (a-b)*(a-b) )/4 | which is the same as... | ||
+ | ab = ( ((a+b)/ | ||
+ | ab = ( (a+b)/2 )^2 - ( (a-b)/2 )^2 | there we are! | ||
+ | |||
+ | another way to prove the formula: | ||
+ | |||
+ | < | ||
+ | |||
+ | as we learned in school: | ||
+ | |||
+ | (x+y)*(x-y)=x*x-x*y+y*x-y*y | ||
+ | (x+y)*(x-y)=x^2-y^2 | ||
+ | |||
+ | let a=x+y | ||
+ | and b=x-y | ||
+ | |||
+ | then: | ||
+ | |||
+ | x=((x+y)+(x-y))/ | ||
+ | y=((x+y)-(x-y))/ | ||
+ | |||
+ | thus putting a and b into the original formula of: (x+y)*(x-y)=x^2-y^2 | ||
+ | |||
+ | leads to: | ||
+ | |||
+ | a*b = ((a+b)/2)^2 - ((a-b)/2)^2 | ||
+ | |||
+ | </ | ||
+ | |||
+ | ===== using the floatingpoint-routines of the C64 kernal ===== | ||
+ | |||
+ | This is probably the slowest of all methods. Only usefull if you have to work with floatingpoint-numbers and don't care about speed. As the format of these numbers is a bit complicated it is not covered in this article (yet). |
base/multiplication_and_division.txt · Last modified: 2015-04-17 04:33 by 127.0.0.1