base:the_secret_of_fast_lzw_crunching
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
— | base:the_secret_of_fast_lzw_crunching [2015-04-17 04:34] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== The Secret of Fast LZW Crunching ====== | ||
+ | * By Antitrack/ | ||
+ | * Converted to ascii by Jazzcat/ | ||
+ | |||
+ | ===== Introduction ===== | ||
+ | |||
+ | Every commodore 64 - scener has most likely used these fiendish, evil, little utilities known as " | ||
+ | |||
+ | In this article, we will take an in-depth look at how an equal-sequence-cruncher works on the c64, and we will find the reason why he is so slow. We will find ways how to speed up things, and what means, hardware- or software-wise, | ||
+ | |||
+ | ===== Normal Operation ===== | ||
+ | |||
+ | In order to improve anything, we better got to get a deep understanding what's going on already. So, what, basically, does a LZW (euqal-sequence) cruncher do? The answer is, ofcourse, he scans for similar strings in a given file, and tries to eliminate them, so they appear only once. Let me try to draw an example: | ||
+ | |||
+ | An input file consists of the following bytes (all values in hexadecimal): | ||
+ | |||
+ | < | ||
+ | ADDRESS | ||
+ | $1000:::: | ||
+ | </ | ||
+ | |||
+ | Yes, this string is packable very well. A LZW cruncher will detect all similar strings, and will only store the largest of the similar string in the output file. Any time the string should appear in the decompressed memory, the LZW cruncher will store a some special codes instead of the string there. These special codes will tell the uncompressor where to look for the bytes , and how many of them. | ||
+ | In the example above, the result might look like this : | ||
+ | |||
+ | < | ||
+ | $00 $01 $03 $03 $0a $0d (<code for get " | ||
+ | (<again code for "get earlier mem"> | ||
+ | (<again code for "get earlier mem"> | ||
+ | </ | ||
+ | |||
+ | The decruncher needs to know atleast 3 things: | ||
+ | - shall I simply put these bytes into memory or shall i get a string from the memory earlier? | ||
+ | - If I get a string, how many bytes do I have to copy, and | ||
+ | - from what memory adress? | ||
+ | |||
+ | There are really a hell lot of methods to encode your information so that the decruncher will know -in an orderly manner- when to simply put data into memory, when to get a string from memory, and when to stop. For example, some compressors ALWAYS think they have to copy data from memory unless explicitly told otherwise! These compressors have " | ||
+ | |||
+ | However, we are not interested in the format of the headers and bytes that are encoded. We are interested in why the encoding process is so slow, notably on large files, and what we can do about it. | ||
+ | |||
+ | To understand this, let us return to the example above: | ||
+ | |||
+ | < | ||
+ | ADDRESS | ||
+ | $1000:::: | ||
+ | </ | ||
+ | |||
+ | What is the cruncher supposed to do? The cruncher has a pointer, say, $0002/$0003 (2 zeropage adresses) to our program at $1000. it looks in the whole memory from $1000 to $100c for a sequence. Here, it is reasonable to assume any sequence consists of at least 2 bytes, so the cruncher will look in $1000 to $100c for the bytes "$00 $01". He will not find them, so he will simply output $00 $01 without any special notice. | ||
+ | Next the cruncher will try to find the next 2 bytes , $01 $03 in the memory, and he will find them (at $1001, $1006) and proceed to crunch them. | ||
+ | Crunchers operation will continue at $1005 (at the $0d byte) , then at $1006 ($01 $03 again), then at $100b (another sequence). | ||
+ | |||
+ | So basically, on the c-64 , the cruncher is stuck up in a double loop. For each 2 byte sequence there is in the memory, the program will scan the whole other memory! This looks (and behaves) much like a basic loop in the following style: | ||
+ | |||
+ | < | ||
+ | For i=1 to 50000 do | ||
+ | for j=1 to 50000 do | ||
+ | .....(crunching algorithm) | ||
+ | next j | ||
+ | next i | ||
+ | </ | ||
+ | |||
+ | A double loop is, unfortunately, | ||
+ | |||
+ | How would such a code look like? | ||
+ | |||
+ | ===== A Simple Idea of Improvement ===== | ||
+ | |||
+ | We remember that the cruncher tends to look for a sequence that is atleast 2 bytes long. Well, then it would be cool to have some HEAVY extra memory and have all adresses of all 2-byte-values in some big arrays. Each array would have to be 64 kb big, ofcourse, since our 64kb input file may consist of only these 2 bytes appearing all over again. Let us, for a (strange) moment imagine that we have had a sort of super-c64 with infinite memory, and try to code some " | ||
+ | |||
+ | < | ||
+ | We have: all adresses of all $0000 values in an array going from $0000-0000 to $0000-$ffff | ||
+ | | ||
+ | | ||
+ | . | ||
+ | . | ||
+ | | ||
+ | </ | ||
+ | Ui, a rather big bunch of 65536 arrays, you will say. Yep. And how would our search routine look like? Answer: The search routine would look very simple, we imagine we have a super-c64 that can adress 32 bit adresses: | ||
+ | |||
+ | < | ||
+ | ldy # | ||
+ | loop lda $ea2d0000, | ||
+ | (akku is a 16 bit accu, miraculously) | ||
+ | beq end_of_crunch | ||
+ | jsr encode | ||
+ | iny | ||
+ | bne loop | ||
+ | end_of_crunch rts | ||
+ | </ | ||
+ | |||
+ | Wow, this would be great! Just some self-modifying code at the " | ||
+ | However, life is not that simple. We *can* code the example above with 8 bit registers (akku and y register, typically by using zeropage) but what's not so easy to get is a commodore-c64 with none less than 2^32 bits ram = 4194 megabytes of ram! | ||
+ | |||
+ | Obviously, we have to think of something less memory-eating. Here is how. | ||
+ | |||
+ | ===== Life is not so simple at all ===== | ||
+ | |||
+ | The basic idea is to use a chained list. A " | ||
+ | |||
+ | a) An easy-to-find start pointer to the chained list | ||
+ | b) the middle of the chained list, consisting of link and data (in whatever order you decide) | ||
+ | c) the end of the list | ||
+ | |||
+ | Case b) is unique : The middle of the chained list consists of two things. The " | ||
+ | |||
+ | Most readers are getting uneasy here, so let me sum up the monumentous task that lies ahead of us: | ||
+ | |||
+ | We are going to split the task of LZW crunching in TWO parts. Part one will be to construct an array of big chained lists, and for programming convenience reasons we are going to waste all extra 512 kb ram of external memory just for this "big array of chained lists" | ||
+ | Part two of our work is dedicated to acually make use of these mess of chained lists, which is rather simplistic anyway. For any sequence that lies ahead of us, we will look into the chained list of this specific sequence in order to determine where the next 2 bytes of this sequence are located inside the c64's memory. This is almost as simplistic as the imaginary 16-bit-supercode program that was mentioned above, so hold your breath, it's worth the trouble! But first lets write some code to actually use the big bunch of " | ||
+ | |||
+ | ===== A tour de Memory ===== | ||
+ | |||
+ | Let us try to draw a memory map: | ||
+ | |||
+ | The 512 kb REU is being split into 8 banks, bank 0 up to bank 7, so lets try to make use of that: | ||
+ | |||
+ | We will use: | ||
+ | |||
+ | Bank 0 to hold the low byte of all start pointers to all chained lists \\ | ||
+ | bank 1 to hold the hi byte of all start pointers to all chained lists \\ | ||
+ | bank 2 to hold the low byte of all END pointers to all chained lists \\ | ||
+ | bank 3 to hold the high byte of all END pointers to all chained lists \\ | ||
+ | |||
+ | bank 4-7 will hold all the chained lists. | ||
+ | |||
+ | "Wait a moment! Where is the middle byte of the pointers?" | ||
+ | |||
+ | Well, this is a valid question. Indeed if we have to have pointers to a big area we need more than 16 bits...DO WE? There is a little trick clause here: We assume that our small pieces of chains are at an even adress only, since they take 4 bytes for each piece of link. So, each piece of link will be at an adress like $xxx0 , $xxx4, $xxx8 or $xxxC. There is no need to store the least significant 2 bits, since they will always be zero! Nice trick, huh? This also means, we will have to shift 18 bit adresses to 16 bit adresses, and when we get them back, we will have to shift 2 zero-bits in again to get the 16 bit adress. Simple, but saving tons of memory! | ||
+ | |||
+ | Anyhow - how are we going to construct this list? | ||
+ | |||
+ | First, we are going to clear the whole 512 kb of extra RAM with zero. You will see that this task will not only clear everything but save us lots of work later. | ||
+ | |||
+ | What we, basically, need, is a single integer-variable, | ||
+ | |||
+ | Now, we will start with a pointer to the start of the data that is to be packed. We will get 2 bytes there. (Let us assume, for readability, | ||
+ | |||
+ | If, however, the start pointer is not $0000, it means the chunk already exists (somewhere in REU memory) so we have to follow a more tricky strategy of memory allocation: First, we will still put our chunk at the REU place the counter tells us, but then we will look at the end pointer (pages 2 and 3 of REU) for the selected linked list, and change the link info from $0000 (end of list) to $xxxx (whereever the counter told us the " | ||
+ | |||
+ | In all cases, at the end of this routine we have to correct the end-of-list pointer from the old value to the new value (our trusty counter tells us where). | ||
+ | This was tricky, wasnt it ? Lets look together at the source code that does it all. We will mainly assume our part 1 code has been properly inserted into any version of Darksqueezer, | ||
+ | |||
+ | ===== The source part 1 ===== | ||
+ | < | ||
+ | COUNTLO | ||
+ | COUNTHI | ||
+ | |||
+ | |||
+ | *= $1000 ; some nifty start adress | ||
+ | |||
+ | SEI | ||
+ | LDA #$2F | ||
+ | STA $00 | ||
+ | LDA #$37 | ||
+ | STA $01 | ||
+ | LDA #$80 ; reu page 4 mem $0000-$0200 is reserved | ||
+ | STA COUNTLO | ||
+ | LDA #$00 ; start before reu page 4 mem $0200 . | ||
+ | STA COUNTHI | ||
+ | LDA #$00 ; the main DSQ code didnt tell us yet the proper | ||
+ | | ||
+ | SBC $AE ; here | ||
+ | STA $AE | ||
+ | STA ENDE+1 | ||
+ | LDA #$00 ; in order to remember $ae/$af data start/end | ||
+ | SBC $AF | ||
+ | STA $AF | ||
+ | STA ENDE+5 | ||
+ | |||
+ | LDA #$00 ; start adress in c64: $0200 (temp variable) | ||
+ | STA $DF02 | ||
+ | LDA #$02 | ||
+ | STA $DF03 | ||
+ | LDA #$01 ; one byte to transfer | ||
+ | STA $DF07 | ||
+ | LDA #$00 ; one byte to transfer | ||
+ | STA $DF08 | ||
+ | LDY #$00 | ||
+ | STY $01 ; $01 = 00 : RAM only on | ||
+ | |||
+ | LDA ($AE), | ||
+ | STA $03 ; into mini ringpuffer at $02/$03 | ||
+ | |||
+ | MAININS | ||
+ | STA $02 | ||
+ | LDY #$00 | ||
+ | STY $01 | ||
+ | INY | ||
+ | LDA ($AE), | ||
+ | STA $03 ; into ringpuffer | ||
+ | LDA #$37 ; enable ram expansion | ||
+ | STA $01 | ||
+ | STA $D020 ; do some stupid $d020-flicker | ||
+ | |||
+ | LDY #$00 ; get start pointer $00xxyy and $01xxyy | ||
+ | STY $D020 ; from REU page 0 and 1 | ||
+ | LDA $02 | ||
+ | LDX $03 | ||
+ | JSR SET456 | ||
+ | |||
+ | JSR READ ; | ||
+ | BNE COMPLINS | ||
+ | INC $DF06 ; get page 1 reu pointer hibyte for this chunk | ||
+ | JSR READ | ||
+ | BEQ SIMPLINS ; hibyte AND lowbyte are zero, this means | ||
+ | ; the link doesnt even exist | ||
+ | ; so choose simple strategy | ||
+ | |||
+ | |||
+ | COMPLINS LDA #$02 ; this is the complex strategy | ||
+ | STA $DF06 ; get existing upper end of linked list | ||
+ | JSR READ ; from $02-xxxx (bank 2 reu) | ||
+ | STA $0100 ; store at $0100 | ||
+ | INC $DF06 ; get hibyte of upper end bank 3 reu | ||
+ | JSR READ ; | ||
+ | STA $0101 ; store at $0101 | ||
+ | JSR MULT4 ; calc 16 to 18 bit, 2 least sig. bits = 0 | ||
+ | |||
+ | INC $DF04 ; add 2 to get to the link | ||
+ | INC $DF04 | ||
+ | LDA COUNTLO | ||
+ | JSR WRITE ; to end of linked list | ||
+ | INC $DF04 | ||
+ | LDA COUNTHI | ||
+ | JSR WRITE | ||
+ | JMP COMPLCON ; continue with complex insert | ||
+ | |||
+ | |||
+ | SIMPLINS | ||
+ | LDX $03 ; write current link adress to $00-xxxx | ||
+ | LDY #$00 | ||
+ | JSR SET456 | ||
+ | LDA COUNTLO | ||
+ | JSR WRITE ; and hibyte to $01-xxxx | ||
+ | INC $DF06 | ||
+ | LDA COUNTHI | ||
+ | JSR WRITE | ||
+ | |||
+ | ; now that all pointer work was done, we do the real work of storing the chunk : | ||
+ | this is, storing the value and storing the link to the next chunk, and, | ||
+ | remembering the proper end of list. | ||
+ | |||
+ | COMPLCON LDA $02 ; we will remember the end of list at $02-xxxx | ||
+ | LDX $03 | ||
+ | LDY #$02 | ||
+ | JSR SET456 | ||
+ | LDA COUNTLO | ||
+ | JSR WRITE ; storing at $02-xxxx | ||
+ | INC $DF06 ; end of list hibyte pointer at $03-xxxx | ||
+ | LDA COUNTHI | ||
+ | JSR WRITE ; storing at $03-xxxx | ||
+ | |||
+ | ; now we have stored all sort of links only, | ||
+ | ; its a good time to store the actual chunk. Thus we convert the 16 bit counter to 18 bit again, | ||
+ | ; then we store the chunk at the 18 bit address. | ||
+ | |||
+ | LDA COUNTLO ; store countlo/hi in $0100 as a temp variable | ||
+ | STA $0100 | ||
+ | LDA COUNTHI | ||
+ | STA $0101 | ||
+ | JSR MULT4 ; shift 16 to 18 bits, 2 0-bits inserted at bit 0 and 1 | ||
+ | LDA $AE ; write data to chunk adress c64 mem lobyte | ||
+ | JSR WRITE | ||
+ | INC $DF04 ; write data to chunk: adress c64 mem hibyte | ||
+ | LDA $AF | ||
+ | JSR WRITE | ||
+ | |||
+ | ; now we would actually have to write $0000 to the new chunk at adress COUNTLO/HI in REU. | ||
+ | However we dont need to do this coz the whole REU was filled , earlier, with $00-bytes. | ||
+ | |||
+ | ; INC $DF04 | ||
+ | ; LDA #$00 ; write "end of list" link value $0000 (not necessary). | ||
+ | ; JSR WRITE | ||
+ | ; INC $DF04 | ||
+ | ; LDA #$00 | ||
+ | ; JSR WRITE | ||
+ | |||
+ | |||
+ | INC COUNTLO ; since we have stored our chunk at address " | ||
+ | BNE IN1 ; in REU, we have to increment COUNT here. | ||
+ | INC COUNTHI | ||
+ | IN1 INC $AE ; we also increment to the next adress inside the c-64 mem. | ||
+ | BNE *+4 | ||
+ | INC $AF | ||
+ | LDA $AF | ||
+ | AND $AE | ||
+ | CMP #$FF | ||
+ | BEQ ENDE ; until end of memory | ||
+ | |||
+ | JMP MAININS ; end of mem not reached : goto main loop. | ||
+ | |||
+ | ENDE LDA #$00 ; end of memory reached: Restore some important | ||
+ | STA $AE ; memory zp locations ($00ae/af ) using self modifying code | ||
+ | LDA #$00 ; attention Self modifying code | ||
+ | STA $AF | ||
+ | LDY #$00 | ||
+ | STY $01 | ||
+ | LDA ($AE),Y ; set an internal flag ($87) for DSQ | ||
+ | EOR #$FF | ||
+ | STA $87 | ||
+ | DEC $01 | ||
+ | RTS | ||
+ | |||
+ | |||
+ | MULT4 LDA #$00 ; mult4: convert 16 bits at $0100/$0101 to 18 bits | ||
+ | STA $0102 ; shift in 2 zero bits at bit 0 and bit 1. | ||
+ | ASL $0100 | ||
+ | ROL $0101 | ||
+ | ROL $0102 | ||
+ | CLC | ||
+ | ASL $0100 | ||
+ | ROL $0101 | ||
+ | ROL $0102 | ||
+ | LDA $0102 | ||
+ | CLC | ||
+ | ADC #$04 ; add 4: select bank 4 as starting bank | ||
+ | STA $DF06 ; and store $0100/1/2 into REU adress select registers: | ||
+ | LDA $0101 ; $df04/ | ||
+ | STA $DF05 | ||
+ | LDA $0100 | ||
+ | STA $DF04 | ||
+ | RTS | ||
+ | |||
+ | SET456 | ||
+ | STX $DF05 | ||
+ | STY $DF06 | ||
+ | RTS | ||
+ | WRITE STA $0200 ; write a byte to reu: store byte at $0200 then write it to | ||
+ | LDA #$FC ; current REU adress | ||
+ | STA $DF01 | ||
+ | RTS | ||
+ | READ LDA #$FD ; read a byte from reu: store byte from reu at $0200 | ||
+ | STA $DF01 ; then get byte into accu. | ||
+ | LDA $0200 | ||
+ | RTS | ||
+ | |||
+ | -------------------- (insert coffee break here) ---------------------- | ||
+ | </ | ||
+ | If you have done all your homework and carefully followed the mazelike array of linked lists, you might have been somewhat uncomfortable. No, not due to the fact that you might be lacking brains to follow all these links, but that so much memory has been wasted. After all, do we really need these none less than 128 kb memory.....just to find the end of the linklist, stored in bank 2 and bank3 ? | ||
+ | |||
+ | Answer: No, we did not really need these extra 128 kb. Instead of having a huge array of end pointers, we might follow the list up to its end everytime we decide to chain another chunk to the end of the list. However this would mean we had to scan the entire list each time we decide to insert another chunk. This is eating valuable crunching time, thus is rather unacceptable. | ||
+ | |||
+ | Phew! We are nearly set! Now let us take a close look at what DSQ really does 99.5% all of the crunching time, and lets write the code to improve that performance by using the carefully constructed maze of lists above. Much to our pleasant surprise, we will find out it's rather simple to code and short too! | ||
+ | |||
+ | But first let us disassemble what DSQ does 99% all of the time. Suck and see: If you crunch a large file using DSQ 2.0 or 2.2 and if you press your freezer cartridge button, you will most likely end up inside a little code fragment that's inside the zeropage. Let's disasm that one (and yes, ofcourse it's some clever self modifying code....): | ||
+ | |||
+ | ===== More sources ===== | ||
+ | < | ||
+ | ; akku contains the 8 bit low byte of the sequence | ||
+ | ; we actively search for the sequence by finding the first 8 bits, first .... | ||
+ | |||
+ | 0030 D9 00 EE CMP EE00, | ||
+ | 0033 F0 0C BEQ 0041 ; similar? good....lets do work | ||
+ | 0035 C8 INY ; no, scan the rest of this page | ||
+ | 0036 D0 F8 BNE 0030 | ||
+ | 0038 E6 32 | ||
+ | 003A F0 04 BEQ 0040 | ||
+ | 003C C6 2F | ||
+ | 003E D0 F0 BNE 0030 | ||
+ | 0040 60 RTS | ||
+ | ---------------------------------- | ||
+ | 0041 AA TAX ; store accu in x for later use | ||
+ | 0042 84 31 STY 31 ; self modifying, clever code | ||
+ | 0044 A0 01 LDY #01 ; try to determine sequence length | ||
+ | 0046 B9 2E EC LDA EC2E, | ||
+ | 0049 D1 31 | ||
+ | 004B D0 05 BNE 0052 | ||
+ | 004D C8 INY | ||
+ | 004E C0 FF CPY #FF | ||
+ | 0050 D0 F4 BNE 0046 | ||
+ | 0052 88 DEY | ||
+ | 0053 C4 02 | ||
+ | 0055 B0 09 BCS 0060 ; bigger? Yes: do real crunching | ||
+ | 0057 8A TXA | ||
+ | 0058 A4 31 | ||
+ | 005A A2 00 LDX #00 | ||
+ | 005C 86 31 | ||
+ | 005E F0 D5 BEQ 0035 ; shit its smaller than profitable, keep scanning | ||
+ | 0060 4C F8 11 JMP 11F8 ; do real crunching | ||
+ | ---------------------------------- | ||
+ | </ | ||
+ | Ok, so here is the summary: This heavy self-modifying beast searches for the next sequence by scanning the whole memory for suitable (equal) strings. $02 seems to hold the current "at least wanted" | ||
+ | |||
+ | What can we do about it? | ||
+ | |||
+ | Well.... $0030 and $31 hold some important pointer, so it would be unwise to modify anything here. The BEQ command at $0033 actually could serve us to do something useful. That is to say, if a sequence seems to be immidately ahead, this very BEQ will be executed. So we will leave it alone aswell. But, at $0035 the INY is part of the dreaded slooow loop that scans the memory. We will mercilessly place a JMP at $0035 to our own code: | ||
+ | |||
+ | < | ||
+ | $0035 JMP $0338 | ||
+ | </ | ||
+ | |||
+ | As you will soon see, we will have enuff memory in the .... TAPE BUFFER ... to do all the scanning that will replace the loop at $0035 up to $003e. Yes, our routine will be longer, but not that considerably longer! | ||
+ | |||
+ | Let's sum up again what our routine has to do: | ||
+ | |||
+ | Our routine has to scan for a sequence, whose LSB we know, (in the accu) and whose hibyte is in $0031. The sequence adress of the same sequence must be bigger or equal than the number we have identified to reside in memory location $0047/48. 47/48 also point to a sequence with the same two bytes as in 0030/0031. We will use pointer 47/48 in our case only. | ||
+ | |||
+ | We will add some special flags that will make our life easier if multiple instances of the same sequence are being needed each " | ||
+ | |||
+ | Simple enuff? Yes it is. Lets look at the code now ! | ||
+ | |||
+ | ===== The source part 2 ===== | ||
+ | < | ||
+ | ACCU = $80 ; zeropage wasting code :) | ||
+ | XREG = $81 | ||
+ | YREG = $82 | ||
+ | H1 = $83 | ||
+ | H2 = $84 | ||
+ | BYTE1 = $85 | ||
+ | BYTE2 = $86 | ||
+ | LASTBYTE1 = $87 | ||
+ | LASTBYTE2 = $88 | ||
+ | GOT1 = $89 | ||
+ | GOT2 = $8A | ||
+ | OLDDF04 | ||
+ | OLDDF05 | ||
+ | OLDDF06 | ||
+ | |||
+ | |||
+ | *= $0338 ; our jmp at $0035 gets us here | ||
+ | |||
+ | MAINPRG | ||
+ | STX XREG | ||
+ | |||
+ | NOFF LDY #$00 | ||
+ | |||
+ | LDA ($47), | ||
+ | STA H1 ; | ||
+ | INY | ||
+ | LDA ($47),Y | ||
+ | STA H2 | ||
+ | |||
+ | LDY #$FF ; switch to normal ram and enable REU registers | ||
+ | STY $01 | ||
+ | |||
+ | LDX BYTE1 ; flag if we have already scanned for this sequence | ||
+ | BEQ NEXTFETCH | ||
+ | |||
+ | CMP LASTBYTE2 | ||
+ | BNE NEXTFETCH | ||
+ | LDA H1 | ||
+ | CMP LASTBYTE1 | ||
+ | BEQ NOCHMAL | ||
+ | |||
+ | NEXTFETCH | ||
+ | LDA H1 ; no shortcut, sniff .... | ||
+ | STA $DF04 ; thus, let us scan the start of the list | ||
+ | LDA H2 ; get start pointer of list into " | ||
+ | STA $DF05 | ||
+ | LDA #$00 ; bank 0 contains lobyte of pointer | ||
+ | STA $DF06 | ||
+ | JSR READ ; get it into variable " | ||
+ | LDA GOT1 ; LOWBYTE | ||
+ | PHA ; notice it (TAX might work also) | ||
+ | INC $DF06 ; get bank 1 hibyte of pointer | ||
+ | JSR READ | ||
+ | LDA GOT1 | ||
+ | STA OLDDF05 | ||
+ | PLA | ||
+ | STA OLDDF04 | ||
+ | JMP CONT | ||
+ | |||
+ | ; if we should get the same link again we didnt need to follow the procedure above. | ||
+ | However we have to do a sanity check first if we are already at the end of the | ||
+ | list......... | ||
+ | |||
+ | NOCHMAL | ||
+ | ORA OLDDF05 | ||
+ | BEQ GIBTSNICHT ; if link = 0000 then end of list, unsucessful search for | ||
+ | ; a sequence. | ||
+ | |||
+ | ; in any given case, we have to convert 16 bit to 18 bit adress to follow the link..... | ||
+ | that's what the ROLROUT subroutine does. Moreover the same subroutine also reads the | ||
+ | bytes into the GOT1/GOT2 zeropage variable. | ||
+ | |||
+ | CONT | ||
+ | |||
+ | GOODMOV | ||
+ | CMP $48 ; really bigger than the adress found in $0047/48. | ||
+ | BEQ TEST2 | ||
+ | BCS NICHTNOCHMAL | ||
+ | BCC NOCHMAL | ||
+ | |||
+ | TEST2 LDA GOT1 | ||
+ | CMP $47 ; adress equal: check again | ||
+ | BEQ NOCHMAL | ||
+ | BCC NOCHMAL | ||
+ | |||
+ | ; the found adress of the sequence is bigger than the adress of 47/48. This means we | ||
+ | found a fine 2 byte adress in the sequence, and can proceed to try to crunch. For this | ||
+ | reason we will simply jump back to the zeropage-code responsible: | ||
+ | |||
+ | NICHTNOCHMAL | ||
+ | |||
+ | LDA GOT2 ; store " | ||
+ | STA $32 | ||
+ | |||
+ | LDA H1 | ||
+ | STA LASTBYTE1 | ||
+ | LDA H2 ; to look for her again next time! | ||
+ | STA LASTBYTE2 | ||
+ | |||
+ | LDX #$00 ; correct $01 | ||
+ | STX $01 | ||
+ | INX | ||
+ | STX BYTE1 ; set flag " | ||
+ | LDY GOT1 ; restore accu and set y properly | ||
+ | LDA ACCU | ||
+ | JMP !$41 ; jump back to original zeropage code | ||
+ | |||
+ | ; this is the code if the sequence was not found at all (end of linked list reached). | ||
+ | |||
+ | GIBTSNICHT ; | ||
+ | LDA #$00 | ||
+ | STA BYTE1 ; set flag " | ||
+ | TAY ; correct registers and quit | ||
+ | STA $01 | ||
+ | RTS | ||
+ | |||
+ | |||
+ | ROLROUT | ||
+ | STA OLDDF06 | ||
+ | ASL OLDDF04 | ||
+ | ROL OLDDF05 | ||
+ | ROL OLDDF06 | ||
+ | ASL OLDDF04 | ||
+ | ROL OLDDF05 | ||
+ | ROL OLDDF06 | ||
+ | LDA OLDDF06 | ||
+ | CLC | ||
+ | ADC #$04 | ||
+ | STA OLDDF06 | ||
+ | STA $DF06 | ||
+ | LDA OLDDF05 | ||
+ | STA $DF05 | ||
+ | LDA OLDDF04 | ||
+ | STA $DF04 | ||
+ | READ | ||
+ | STA $DF01 | ||
+ | RTS | ||
+ | </ | ||
+ | |||
+ | ===== Summary and final analysis ===== | ||
+ | |||
+ | Uh, this was quite some code. Can you sum up yourself what it does? | ||
+ | |||
+ | It looks for a 2 byte sequence similar to the one at $0047/8. If that is the same 2 byte sequence that the one we have been looking for earlier (a very common case!) then we will simply re-use the old registers with the proper set values to follow the link. If not, we scan the chained list fromt the start. In any case, we determine if the c64 adress from the chained list's DATA is smaller than the current $47/48 adress. If so, we proceed to scan the list either until the end of the list or a proper adress was found. If a proper adress was found, we store its value into $30/$31 and jump back to the crunchers routines. Otherwise, the end of the chained list will be reached soon, and the program will exit with code " | ||
+ | Instead of scanning the whole memory, we now scan a rather tiny list (tiny: compared to the whole memory that is). If you press reset in the middle of crunching and if you decide to look at (and follow) these chained lists in REU memory yourself, you will find out they are rather small, compared to a whole 60kb of memory to scan! | ||
+ | |||
+ | So actually, what is the worst case? The worst case would be e.g. a input data file consisting of ONLY 0 - bytes. There would be only ONE sequence in this whole file, and only ONE single chained list would be produced by the program from pass one. These are those rare cases where normal Darksqueezer also tends to go haywire (i.e, very slow). No surprise you are advised to crunch your data first with an equal-character-cruncher to eliminate such cases. | ||
+ | |||
+ | Let us rather look at a typical case. Let s look at a game like " | ||
+ | |||
+ | If , on the other hand, we have input data which is VERY WELL crunchable (i.e. "Dizzy down the Rapids" | ||
+ | |||
+ | But ofcourse, I still hear some people moan that they have only a 256 kb ram expansion (and, arguably, still some more time left to waste for crunching). Lets see what we can do for these folks....but not tonight! I will leave this for another, and hopefully kewler, article! | ||
+ | |||
+ | Yours truly, | ||
+ | |||
+ | Antitrack/ | ||
base/the_secret_of_fast_lzw_crunching.txt · Last modified: 2015-04-17 04:34 by 127.0.0.1