Differences

This shows you the differences between two versions of the page.

--- base:advanced_optimizing [2017-11-20 08:49] – [SHX/SHY] bitbreaker
+++ base:advanced_optimizing [2024-03-03 11:06] (current) – [ASR] bitbreaker
@@ Line 321: / Line 321: @@
 A further advantage of this method is that one more register is free, as it is no longer needed as an index. But be aware: you have to store values top-down, as the stack pointer decreases on every push. The benefit is that an interrupt occurring in between will not trash your values on the stack, as it pushes its 3 bytes (PC + status) below your current position. All you need to take care of is that you neither under-run the stack in case of an interrupt (it needs 3 bytes; each JSR level inside the interrupt handler needs another 2 bytes) nor trash still valid content in the upper part of the stack.
 To read your values back from the stack you can use pla, but it is much easier to access them directly, e.g. via lda $0100,x.
+===== Counting with steps greater than 1 =====
+We will later see how to do this with SBX alone, but there is another easy option that also lets us use LAX on the index, or even on a function that we walk along:
+<code>
+count = $20
+           ldx #$00
+           ldy #$00
+-
+           stx count,y
+           iny
+           txa
+           sbx #-3
+           cpx #$60
+           bne -
+           ...
+.index     lax count
+           ...
+           do stuff with X and A
+           ...
+           inc .index + 1
+</code>
+As you can see, the inc .index + 1 makes the lax fetch the value from the next zeropage location on the next turn. Thus A and X increase by 3 on each round, all done in 9 cycles, and with the option of destroying X later on.
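The table setup above can be checked with a small Python model. This is only a sketch: sbx() and build_count_table() are hypothetical helper names, and sbx() assumes the commonly documented behaviour of the illegal opcode, X = (A AND X) - imm.

```python
def sbx(a, x, imm):
    """Model of SBX #imm: X = (A AND X) - imm, wrapped to 8 bits."""
    return ((a & x) - imm) & 0xFF


def build_count_table(step=3, limit=0x60):
    """Mimic the setup loop: store X, then advance X by `step` via SBX."""
    table = []
    x = 0                                # ldx #$00
    while True:
        table.append(x)                  # stx count,y / iny
        a = x                            # txa
        x = sbx(a, x, -step & 0xFF)      # sbx #-3, i.e. X = X + 3
        if x == limit:                   # cpx #$60 / bne -
            break
    return table


table = build_count_table()
```

With the default arguments the table holds the multiples of 3 below $60, which the lax then walks through one entry per round.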
 ===== Counting bits =====
@@ Line 691: / Line 718: @@
 The advantage is, that you can move bits also across registers and are not restricted to the accumulator only.
+When shifting, we handle 9 bits, as the bit falling out at one edge of the byte will be the new carry, and the old carry will be shifted in. This will introduce a gap of one bit, when we wrap around bits:
+<code>
+        lda #%11111111
+        clc
+        rol
+        rol
+        ;-> A = %11111101
+        ;              ^
+        ;             gap :-(
+</code>
+There are several ways to avoid this behavior:
+<code>
+        lda #%11111111
+        asl
+        adc #0
+        ...
+        lda #%11111111
+        anc #$ff
+        rol
+        ...
+        lda #%11111111
+        cmp #$80
+        rol
+</code>
+This way bit 7 is copied to carry first and then shifted in on the right end again.
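To see the difference between the 9 bit rotate and the gap-free 8 bit rotate, here is a minimal Python model. The function names rol9() and rol8() are made up for this sketch, and rol8() models the cmp #$80 / rol variant.

```python
def rol9(a, carry):
    """ROL: rotate left through carry (9 bits). Returns (a, carry)."""
    result = (a << 1) | carry
    return result & 0xFF, result >> 8


def rol8(a):
    """cmp #$80 / rol: bit 7 is copied to carry first, then rotated in."""
    carry = a >> 7
    return ((a << 1) | carry) & 0xFF


a, c = rol9(0xFF, 0)        # clc / rol
a, c = rol9(a, c)           # rol
with_gap = a                # the cleared carry travels through as a gap

without_gap = rol8(rol8(0xFF))
```

The asl / adc #0 and anc #$ff / rol variants should give the same result as rol8(), as they also bring bit 7 into the carry before it is fed back into bit 0.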
+If you deal with chars, you often need numbers divided by 8. This includes numbers bigger than 8 bits, as the screen is 320 pixels wide, and with clipping you might span an even bigger range.
+An easy way to shift 11 bits down to a final 8 bit result, without having to shift two different bytes independently, is the following:
+<code>
+        lda xhi        ;00000hhh
+        asr #$0f       ;000000hh h - can also be a lsr if no upper bits need to be masked off
+        ora xlo        ;lllll0hh h
+        ror            ;hlllll0h h
+        ror            ;hhlllll0 h
+        ror            ;hhhlllll 0
+</code>
+As the least significant 3 bits are lost during the shift anyway, we place the bits of the highbyte there and rotate them back in on the left side, so all we need to shift then is a single byte. To make the rotation work, the highbyte needs to be preshifted by one before the lowbyte is merged in. The only prerequisite of this method is that the lowbyte must have its three least significant bits cleared.
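The shift sequence can be verified for all 11 bit values with a short Python model. It is a sketch under the assumption that ASR #imm behaves as AND #imm followed by LSR; asr(), ror() and div8() are hypothetical names.

```python
def asr(a, imm):
    """Model of ASR/ALR #imm: AND with imm, then LSR. Returns (a, carry)."""
    a &= imm
    return a >> 1, a & 1


def ror(a, carry):
    """ROR: rotate right through carry. Returns (a, carry)."""
    return (carry << 7) | (a >> 1), a & 1


def div8(xhi, xlo):
    """Divide the 11 bit value xhi:xlo by 8; xlo needs bits 0-2 cleared."""
    assert xlo & 0x07 == 0, "lowbyte must have its low three bits cleared"
    a, c = asr(xhi, 0x0F)       # 000000hh, carry = h
    a |= xlo                    # lllll0hh
    for _ in range(3):          # three times ror
        a, c = ror(a, c)
    return a                    # hhhlllll
```

For example, div8(0x01, 0x40) handles x = 320 and returns 40, the number of chars per line.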
 ====== Jumpcode ======
@@ Line 832: / Line 907: @@
 In the same way this method can also be used to set bits (for e.g. with adc #$81) or to toggle bits.
+When masking out bits, SAX or SBX is often a good choice.
+<code>
+       lax value
+       and #%11110000
+       sta highnibble
+</code>
+After this we need to restore the value from X before we can mask the lower bits. That is better than another lda value, but still not ideal.
+<code>
+       lda value
+       ldx #%11110000
+       sax highnibble
+</code>
+This already looks better: the original value is still in A, so we can do another mask operation.
+<code>
+       lax value
+       eor #%00001111
+       sax highnibble
+</code>
+This looks even better: we can reuse X, and A still contains the original bits, with the masked-out ones inverted. This keeps the original value available in more than one register, which gives potential for further savings.
+This was spotted in Krill's loader when doing lookups on the GCR tables, so thanks to Krill here :-)
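Why the eor / sax combination yields the high nibble can be checked with a few lines of Python. This is only a model; high_nibble_sax() is a hypothetical name, and it assumes that SAX stores A AND X and that the eor mask is %00001111.

```python
def high_nibble_and(value):
    """lax value / and #%11110000 / sta highnibble"""
    return value & 0xF0


def high_nibble_sax(value):
    """lax value / eor #%00001111 / sax highnibble"""
    a = x = value            # lax loads A and X with the same value
    a ^= 0x0F                # invert the bits we want to get rid of
    stored = a & x           # sax stores A AND X: inverted bits cancel out
    return stored, a, x


stored, a, x = high_nibble_sax(0xA7)
```

A bit that is inverted in A but unchanged in X always ends up as 0 after the AND, while the untouched bits pass through.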
 ====== Illegal opcodes ======
@@ Line 874: / Line 976: @@
 Actually you can use LAX also with an immediate value, but it behaves a bit unstable regarding the given immediate value. However when simply doing an LAX #$00 you are fine.
+lda $xxxx,y has no 8 bit (zeropage) version, so an lda $xx,y is not possible. With lax $xx,y there is however a way to imitate an lda $xx,y at the cost of destroying X.
 ===== SAX/SHA =====
@@ Line 954: / Line 1058: @@
 <code>
-        and #$ff
+        and #$fe
         lsr
-        clc
 </code>
 ===== ARR =====
@@ Line 1382: / Line 1485: @@
 </code>
-Depending on what you have in register A, you can express it in many differnet ways:
+Depending on what you have in register A, you can express it in many different ways:
 <code>
@@ Line 1403: / Line 1506: @@
           sbc num
           sta neg
+          ;num in a, carry set
+          lda num
+          sbc #$01
+          eor #$ff
 </code>
-There are of course also other expressions possible, just ponder a while about the term.
+There are of course also other expressions possible, just ponder a while about the term. The state of the carry flag after the negation can also be influenced, in most cases by choosing between sbc and adc ($00/$ff will cause an overflow).
+How about forming terms with logical operations? We notice that e.g. (a + b) xor $ff is the same as (a xor $ff) - b:
+<code>
+          lda num1
+          clc
+          adc num2
+          eor #$ff
+          ;can also be written as
+          lda num1
+          eor #$ff
+          sec
+          sbc num2
+</code>
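Both identities can be checked exhaustively in 8 bit arithmetic. The following Python sketch does that; the function names are made up for illustration and the carry is assumed to be set up as in the snippets above.

```python
def neg_sbc_eor(num):
    """sec / lda num / sbc #$01 / eor #$ff"""
    return ((num - 1) & 0xFF) ^ 0xFF


def add_then_invert(a, b):
    """lda num1 / clc / adc num2 / eor #$ff"""
    return ((a + b) & 0xFF) ^ 0xFF


def invert_then_sub(a, b):
    """lda num1 / eor #$ff / sec / sbc num2"""
    return ((a ^ 0xFF) - b) & 0xFF


negation_ok = all(neg_sbc_eor(n) == (-n) & 0xFF for n in range(256))
identity_ok = all(add_then_invert(a, b) == invert_then_sub(a, b)
                  for a in range(256) for b in range(256))
```

Both hold because x xor $ff equals -x - 1 modulo 256, so inverting a sum is the same as inverting one term and subtracting the other.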
 ====== Running out of registers ======
@@ Line 1421: / Line 1544: @@
         tsx ;fetch value from table again
 </code>
+====== Limiting and masking ======
+Sometimes we want to extract the low nibble of a value and limit it to a given range.
+<code>
+        bpl .positive
+        cmp #$f0
+        bcs +
+        lda #$f0
++
+        and #$0f
+</code>
+As you can see, we limit the value to $f0..$ff first and then clamp off the high nibble to end up with values that range from $00..$0f.
+Observe how this can be done cheaper by just shifting the range and making use of the 8 bit wrap-around and the carry:
+<code>
+        bpl .positive
+        ;clc
+        adc #$10
+        bcs +
+        lda #$00
++
+</code>
+We add $10, so that values within the limit ($f0..$ff) overflow and set the carry. As we wrapped around 8 bits by overflowing, the upper bits are already zero and we can forgo the and #$0f. The low nibble is not affected, as adding $10 only touches the high nibble.
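That both variants give the same result for every negative input can be checked with this small Python model. It is a sketch only: the function names are hypothetical, and the carry is assumed to be clear before the adc.

```python
def limit_with_and(a):
    """cmp #$f0 / bcs + / lda #$f0 / + / and #$0f"""
    if a < 0xF0:             # carry clear after cmp: below the limit
        a = 0xF0
    return a & 0x0F


def limit_with_adc(a):
    """(carry clear) adc #$10 / bcs + / lda #$00 / +"""
    total = a + 0x10
    if total > 0xFF:         # carry set: value was in $f0..$ff
        return total & 0xFF  # the wrap-around already cleared the high nibble
    return 0x00


# only negative values reach this code, the others branch to .positive
same = all(limit_with_and(a) == limit_with_adc(a) for a in range(0x80, 0x100))
```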
 ====== Misc stuff ======
@@ Line 1490: / Line 1641: @@
 <code>
-        lda bmp
+        lda bmp       ;could also use lax bmp, sbx #$08, stx bmp to save more cycles
         sec
         sbc #$08
@@ Line 1563: / Line 1714: @@
 **HAPPY OPTIMIZING!**
-Bitbreaker/Oxyron^Nuance
+Bitbreaker/Performers^Nuance