magazines:chacking17
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
— | magazines:chacking17 [2015-04-17 04:34] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | < | ||
+ | ######## | ||
+ | ################## | ||
+ | ###### | ||
+ | ##### | ||
+ | ##### #### #### ## ##### #### | ||
+ | ##### ## ## #### ## ## | ||
+ | ##### | ||
+ | ##### ## ## ######## | ||
+ | ##### #### #### #### #### ##### #### | ||
+ | ##### ## | ||
+ | ###### | ||
+ | ################## | ||
+ | ######## | ||
+ | |||
+ | ............................................................................... | ||
+ | |||
+ | "The words of the wise are as goads, and as nails fastened by | ||
+ | the masters of assemblies." | ||
+ | |||
+ | " | ||
+ | way, if he gets mad he's a mile away and barefoot." | ||
+ | |||
+ | ............................................................................... | ||
+ | |||
+ | BSOUT | ||
+ | |||
+ | For me, fall is a time for reflection. | ||
+ | golden time and then seem to die. And yet, under the surface they are quite | ||
+ | alive, and teeming with activity at a smaller, less-visible scale, waiting to | ||
+ | burst forwards again in full bloom. | ||
+ | in this. But I have no idea what it is. | ||
+ | In fact things are totally hectic around here, and I haven' | ||
+ | more than a few moments thought towards the 64, so this will be a mighty | ||
+ | short editorial. | ||
+ | much 64-time, but with a little here and a little there this issue is finally | ||
+ | squeaking out. Everybody worked hard over the summer, and my goal was | ||
+ | to get it out in September. | ||
+ | the 64 world is only two months late it's doing pretty good, so no big whoop. | ||
+ | C=Hacking ought to appear reguarly after December, though. | ||
+ | On the down side, some things, such as the challenge problem from | ||
+ | last time, will have to wait until the next issue. | ||
+ | low-key while putting this issue together, but future issues will be more | ||
+ | public in soliciting articles (e.g. on comp.sys.cbm). | ||
+ | In other news, a C64 C-compiler has finally appeared! | ||
+ | effort is the work of Ullrich von Bassewitz, uz@musoftware.de, | ||
+ | behind Elite128 among other things. | ||
+ | |||
+ | http:// | ||
+ | |||
+ | so have a look, and tell Ullrich what a stellar guy he is :). Also, as | ||
+ | most of you know, the Chicago Expo took place on October 24, and was a real | ||
+ | hoot! Check out | ||
+ | |||
+ | http:// | ||
+ | |||
+ | for some nice pictures from the Expo, taken by Mark Seelye. | ||
+ | |||
+ | Meanwhile, sit back, relax, and enjoy these latest musings from | ||
+ | these, our masters of assembly. | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | ::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | BSOUT | ||
+ | o Voluminous ruminations from your unfettered editor. | ||
+ | |||
+ | |||
+ | Jiffies | ||
+ | o Is it a bug or a feature? | ||
+ | |||
+ | |||
+ | The C=Hallenge | ||
+ | |||
+ | o To be continued... | ||
+ | |||
+ | |||
+ | Side Hacking | ||
+ | |||
+ | o " | ||
+ | An amateur' | ||
+ | |||
+ | |||
+ | Main Articles | ||
+ | |||
+ | o "An Optimizing Hybrid LZ77 RLE Data Compression Program, aka | ||
+ | Improving Compression Ratio for Low-Resource Decompression", | ||
+ | by Pasi ' | ||
+ | |||
+ | Part two of a two-part article on data compression, | ||
+ | detailed description of the compression algorithms used in | ||
+ | pucrunch, not to mention the decompression code. | ||
+ | |||
+ | o " | ||
+ | < | ||
+ | |||
+ | This is the first in a series of articles which aims to | ||
+ | present a complete, commented disassembly of the VIC-20 | ||
+ | ROMs. | ||
+ | |||
+ | o Masters Class: " | ||
+ | < | ||
+ | |||
+ | Sit up straight and pay attention. | ||
+ | Commodore luminary attempts to instruct a couple of ignorant | ||
+ | plebians in his art. In this case, Robin and I set out to | ||
+ | learn NTSC/PAL fixing from one of the greats, Decomp/ | ||
+ | Our first fix, a demo from the obscure Finnish group Pu-239, | ||
+ | is included, along with detailed descriptions of our experiences. | ||
+ | |||
+ | o "The Herd Mentality", | ||
+ | |||
+ | This is a collection of entertaining musings on Commodore and the | ||
+ | development of the C128, as provided by Bil Herd (and that's no | ||
+ | bull). | ||
+ | SYS 32800, | ||
+ | |||
+ | | ||
+ | .................................. Credits ................................... | ||
+ | |||
+ | Editor, The Big Kahuna, The Car' | ||
+ | C=Hacking logo by.......................... Mark Lawrence | ||
+ | |||
+ | Special thanks to Marko Makela, Olaf Seibert, and the rest of the cbm-hackers | ||
+ | for their many otherwise unacknowledged contributions. | ||
+ | |||
+ | Legal disclaimer: | ||
+ | 1) If you screw it up it's your own damn fault! | ||
+ | 2) If you use someone' | ||
+ | |||
+ | For information on the mailing list, ftp and web sites, send some email | ||
+ | to chacking-info@jbrain.com. | ||
+ | |||
+ | About the authors: | ||
+ | |||
+ | Pasi ' | ||
+ | working at a VLSI design company on a RISC DSP core C compiler. | ||
+ | Around 1984 a friend introduced him to the VIC-20, and a couple | ||
+ | of years later he bought a 64+1541 to replace a broken Spectrum48K. | ||
+ | He began writing his own BBS, using ML routines for speed, and | ||
+ | later wrote a series of demos under the Pu-239 label. | ||
+ | to pucrunch and his many C=Hacking articles, Pasi was written an Amiga | ||
+ | 1581 filesystem, a graphics conversion package, a C64 burstloader, | ||
+ | number of demos. | ||
+ | is contemplating multipart demos for the 64 and the VIC-20, in addition | ||
+ | to future C=Hacking articles. | ||
+ | has a B5 quote page at http:// | ||
+ | |||
+ | Richard Cini is a 31 year old senior loan underwriter who first became | ||
+ | involved with Commodore 8-bits in 1981, when his parents bought him a | ||
+ | VIC-20 as a birthday present. | ||
+ | programming, | ||
+ | the lawn sprinkler system, and for a text-to-speech synthesyzer. | ||
+ | his CBM stuff is packed up right now, along with his other " | ||
+ | computers, including a PDP11/34 and a KIM-1. | ||
+ | old computers Richard enjoys gardening, golf, and recently has gotten | ||
+ | interested in robotics. | ||
+ | is unique in being fiercely loyal without being evangelical, | ||
+ | some other communities, | ||
+ | best use out of the 64. | ||
+ | |||
+ | Robin Harbron is a 26 year old internet tech support at a local | ||
+ | independent phone company. | ||
+ | in 1980, playing with school PETs, and in 1983 his parents convinced | ||
+ | him to spend the extra money on a C64 instead of getting a VIC-20. | ||
+ | Like most of us he played a lot of games, typed in games out of | ||
+ | magazines, and tried to write his own games. | ||
+ | dabbles with Internet stuff, writes C= magazine articles, and, yes, | ||
+ | plays games. | ||
+ | as well as the " | ||
+ | until-inspiration-hits-again Internet stuff" | ||
+ | raising a family, and enjoys music (particularly playing bass and guitar), | ||
+ | church, ice hockey and cricket, and classic video games. | ||
+ | |||
+ | |||
+ | ................................... Jiffies .................................. | ||
+ | |||
+ | |||
+ | 0 REM SHIFT-L | ||
+ | by the cbm.hackers | ||
+ | |||
+ | Everybody knows the old REM shift-L trick in BASIC 2.0, which | ||
+ | generates a syntax error upon listing. | ||
+ | answer turns out to be quite interesting. | ||
+ | Normally, when the BASIC interpreter tokenizes a line it strips | ||
+ | out characters with the high bit set. One exception is characters within | ||
+ | quotes; the other is characters after REM. In those cases, characters | ||
+ | are embedded literally into the program line. | ||
+ | Now, BASIC tokens all have the high bit set. When LIST | ||
+ | encounters a character with the high bit set, it checks whether | ||
+ | it is in quote mode. If it is, the character is outputted as normal. | ||
+ | If not, the character is treated as a token, which is expanded by | ||
+ | QPLOP (located at $A717). | ||
+ | like this: | ||
+ | |||
+ | : | ||
+ | BEQ :PLOOP | ||
+ | : | ||
+ | LDA RESLST,Y | ||
+ | BPL :LOOP2 | ||
+ | BMI :LOOP1 | ||
+ | : | ||
+ | LDA RESLST,Y | ||
+ | BMI LISTENT1 | ||
+ | JSR $AB47 ;Print char, AND #$FF | ||
+ | BNE :PLOOP | ||
+ | |||
+ | The keyword strings in RESLST all stored dextral character inverted | ||
+ | (the last char has the high bit set), and the above code just moves | ||
+ | forward through the table until it has counted out .X keywords. | ||
+ | that point, :PLOOP prints out the next keyword to the screen, and | ||
+ | hops back into LIST. | ||
+ | |||
+ | Shift-L is character code 204, or $CC. When LIST sees this | ||
+ | inside of a REM, it sees the high bit set and de-tokenizes it. | ||
+ | As it turns out, though, the last token is $CB, which is keyword GO | ||
+ | (so that GO TO works). | ||
+ | BASIC keywords, uses 255 characters. | ||
+ | (value zero, not character zero). | ||
+ | So, the above code goes through the list, and then prints | ||
+ | out token $CC, the first character of which is a null. This zero | ||
+ | is sent to $AB47. | ||
+ | with the character), performs an AND #$FF, and exits. | ||
+ | the BNE :PLOOP branch _not_ get taken, and the code erroneously moves | ||
+ | forwards into... | ||
+ | And the first thing FOR does is look for a valid variable. | ||
+ | When you LIST a program, the character immediately following the | ||
+ | LIST is a statement terminator (a colon : or else an end of line null). | ||
+ | LET (which is called by FOR) reads this character, decides it's an | ||
+ | invalid variable name, and generates a ? | ||
+ | Because QPLOP uses absolute addressing (LDA RESLST,Y), .Y | ||
+ | can wrap around through 255 to traverse the list again. | ||
+ | why shift-M shift-N etc. all generate valid keywords. | ||
+ | will force an error, and it is all due to the zero in the keyword | ||
+ | table. | ||
+ | Similar things can happen in other BASICs. | ||
+ | token $DB does the trick. | ||
+ | The problem seems to have been fixed in BASIC 7.0; at least the | ||
+ | trick doesn' | ||
+ | |||
+ | Finally, like most things on the 64, embedding tokens into | ||
+ | REM statements can naturally be put to some creative use. For example, | ||
+ | RUN once ran a contest where readers submitted stories and riddles | ||
+ | and such, which were read by LISTing the program. | ||
+ | and entertaining, | ||
+ | since high school: | ||
+ | |||
+ | 10 REM WHAT'S AN APPLECOSTA? | ||
+ | 20 REM {C=-V}T A {INST CTRL-0}E | ||
+ | |||
+ | |||
+ | ............................... The C=Hallenge ............................... | ||
+ | |||
+ | Wait until next time! | ||
+ | |||
+ | |||
+ | |||
+ | ................................ Side Hacking ................................ | ||
+ | |||
+ | SuperCPU software repair | ||
+ | ---------------------------> | ||
+ | |||
+ | One of the feature articles in this issue deals with NTSC/PAL | ||
+ | fixing. | ||
+ | it goes: you have that program that could really benefit from | ||
+ | the speed boost, but doesn' | ||
+ | little thing. | ||
+ | Well, it really bugs me to have programs not like my nice | ||
+ | hardware for dumb reasons, so I decided I would try my hand at fixing | ||
+ | up some programs. | ||
+ | "Stunt Car Racer" -- I had never played it before, but after getting | ||
+ | ahold of it it was clear that here was a game that would be just great | ||
+ | with a SuperCPU. | ||
+ | seemed a doable problem and so I jumped in head first, and this article | ||
+ | sums up my inexpert experience to date. | ||
+ | |||
+ | By the way, stuntcar-scpu is totally cool :). | ||
+ | |||
+ | To date I have fixed up just three games: Stunt Car Racer, Rescue | ||
+ | on Fractalus, and Stellar 7. My goal was really to " | ||
+ | to make them run off of my FD-2000 as well as my SCPU. Although these are | ||
+ | all games, the techniques should apply equally well to application programs | ||
+ | with a bad attitude. | ||
+ | to discuss a few generalities. | ||
+ | |||
+ | I also note that programmers who don't have a SuperCPU might find | ||
+ | some of this information helpful in designing their programs to work with | ||
+ | SCPUs. | ||
+ | |||
+ | Finally, my fixes are available in the Fridge. | ||
+ | |||
+ | Tools and Process | ||
+ | ----------------- | ||
+ | |||
+ | The tools I used were: | ||
+ | |||
+ | o Action Replay | ||
+ | o Wits | ||
+ | o Paper for taking notes (backs of receipts/ | ||
+ | |||
+ | I think this is all that is necessary, although a good sector editor | ||
+ | can come in handy for certain things. | ||
+ | |||
+ | After trying a number of different approaches to the problem, the process | ||
+ | I've settled on goes roughly like the following: | ||
+ | |||
+ | - Have an idea of what will need fixing | ||
+ | - Familiarize yourself with the program | ||
+ | - Track down the things that need fixing | ||
+ | - Figure out free areas of memory | ||
+ | - Apply patches, and test | ||
+ | |||
+ | Most programs work in more or less the same way: there are | ||
+ | some initialization routines, there' | ||
+ | interrupt routine or series of routines. | ||
+ | find, via the vectors at either $FFFA or at $0314 and friends. | ||
+ | initialization routine can be tougher, but can be deduced from | ||
+ | the loader or decompressor; | ||
+ | the reset code, so that RESTORE restarts the program. | ||
+ | things that need fixing usually involves freezing the program at the | ||
+ | appropriate time, and doing a little disassembly. | ||
+ | things like LDA $DC01 is helpful, too. Figuring out free areas of | ||
+ | memory is easy, by either getting a good feel for the program, or | ||
+ | filling some target memory with a fill byte and then checking it | ||
+ | later, to see if it was overwritten. | ||
+ | all that remains is to test it on the SCPU, and it's all done! | ||
+ | |||
+ | Diagnosis | ||
+ | --------- | ||
+ | |||
+ | It seems to me that, at the fundamental level, the SCPU is different | ||
+ | from a stock machine in three basic ways: it is a 65816, it runs at 20MHz, | ||
+ | and it has hardware registers/ | ||
+ | some strange and mysterious problems that can arise. | ||
+ | |||
+ | All possible opcodes are defined on the 65816, which means that | ||
+ | " | ||
+ | quasi-opcodes aren't real opcodes -- they are like little holes in the cpu, | ||
+ | and things going through those holes fall through different parts of the | ||
+ | normal opcode circuitry. | ||
+ | copy protection schemes make use of them, so sometimes the program works fine | ||
+ | with a SCPU but the copy protection makes it choke -- how very annoying | ||
+ | (example: EA's Lords of Conquest). | ||
+ | mean it won't work on an FD-2000, either. | ||
+ | |||
+ | Running at 20Mhz makes all sorts of problems. | ||
+ | loop will run too fast -- delay loops, countdown loops, input busy-loops, | ||
+ | etc. Also main program loops, so that the game runs unplayably fast | ||
+ | (most 3D games never had to worry about being too fast). | ||
+ | lead to flickering screens, as we shall see later, and the " | ||
+ | games is designed with 1Mhz in mind -- velocities, accelerations, | ||
+ | What looks smooth at the low frame rate might look poor at the high, as we | ||
+ | shall also see later. | ||
+ | like any other code using software-based timing. | ||
+ | |||
+ | The SuperCPU also has a series of configuration registers located | ||
+ | at $D07x and $D0Bx, which determine things like software speed and VIC | ||
+ | optimization modes (which areas of memory are mirrored/ | ||
+ | RAM). Note also that enabling hardware registers rearranges $E000 ROM | ||
+ | routines. | ||
+ | the SCPU, it is awfully unlikely, since the enable register, which switches | ||
+ | the hardware registers in, is sandwiched between disable registers: | ||
+ | |||
+ | $D07D Hardware register disable | ||
+ | $D07E Enable hardware registers | ||
+ | $D07F Hardware register disable | ||
+ | |||
+ | Strangely enough, though, different hardware configurations can sometimes | ||
+ | cause problems. | ||
+ | the stack and zero page, and by default have that mirroring turned OFF. | ||
+ | For some totally unknown reason, this caused major problems with an early | ||
+ | attempt of mine to fix Stunt Car Racer -- I am told that the old version | ||
+ | would slow down to just double-speed, | ||
+ | mirroring back on apparently fixes up the problem. | ||
+ | and hence did not have this problem). | ||
+ | is worthwhile to invest a few minutes in trying different configurations. | ||
+ | |||
+ | Finally, there are other strange problems that can arise. | ||
+ | example, I have two 128s: one is a flat 128, one a 128D. With my 128D, | ||
+ | if $D030 is set then the SCPU sometimes -- but not always -- freaks out | ||
+ | and locks up. The flat 128 does not have this problem. | ||
+ | is important is that many decompressors INC $D030 to enable 2MHz mode. | ||
+ | A simple BIT ($2C) fixes this problem up, but the point is that the SCPU has | ||
+ | to interact with the computer, so perhaps that interaction can lead to | ||
+ | problems in obscure cases. | ||
+ | |||
+ | Now, if the goal is to CMD-fix the program, there may be a few | ||
+ | disk-related things that may need fixing. | ||
+ | (or possibly fixing up) any fastloaders, | ||
+ | drive #8 is the only drive in town. Also, if the program uses a track-based | ||
+ | loader (instead of a file-based loader), then that will need to fixed up | ||
+ | as well, and any disk-based copy protection will have to be removed. | ||
+ | |||
+ | There' | ||
+ | program really busted? | ||
+ | with the SCPU, chances are that you saw no speed improvement. | ||
+ | not? It turns out that most chess programs use a timer-based search | ||
+ | algorithm -- changing the playing strength changes the amount of | ||
+ | time the program spends searching, and not the depth of the search. | ||
+ | (The reason is to make the gameplay flow a little better -- otherwise | ||
+ | you have very slow play at the beginning, when there are many more | ||
+ | moves to consider). | ||
+ | right with the SCPU, it is actually working quite well. | ||
+ | |||
+ | And that pretty much covers the basic ideas. | ||
+ | I fixed up was Stunt Car Racer. | ||
+ | |||
+ | Stunt Car Racer | ||
+ | --------------- | ||
+ | |||
+ | Stunt Car Racer, in case you don't know, is a 3D driving game, | ||
+ | and quite fun. It is also too fast, unplayably fast, at 20MHz. | ||
+ | it needs to be slowed down! | ||
+ | My first plan... well, suffice to say that most of my original | ||
+ | plans were doomed to failure, either from being a bad idea, or from | ||
+ | poor implementation. | ||
+ | needed, though, in the main loop, or perhaps by intercepting the joystick | ||
+ | reading routine. | ||
+ | The program has a main loop and an interrupt loop as well. | ||
+ | The interrupt handles the display and other things, and all of the | ||
+ | game calculations are done in the main loop, which flows like | ||
+ | |||
+ | Do some calculations | ||
+ | Draw buffer 1 | ||
+ | Swap buffers | ||
+ | Do some calculations | ||
+ | Draw buffer 2 | ||
+ | Swap buffers | ||
+ | JMP loop | ||
+ | |||
+ | One of my first thoughts was to intercept the joystick I/O, which is | ||
+ | easy to find by hunting for LDA $DC01 (or DC00, whichever joystick | ||
+ | is used). | ||
+ | the memory was safe, and possibly because it was in the interrupt routine | ||
+ | (I simply don't remember). | ||
+ | Before patching, it is very important to make sure that the | ||
+ | patch will survive, and not interfere with the program, so it is | ||
+ | very important to find an area of memory that is not used by the | ||
+ | program. | ||
+ | unused memory was pretty easy -- I just filled the suspect areas with | ||
+ | a fill byte, ran the program, and checked that memory. | ||
+ | memory areas also aids in saving the file, as un-needed areas don't | ||
+ | need to be saved, or can be cleared out to aid compression. | ||
+ | The first free area of memory I found was at $C000. | ||
+ | out that this is a sprite, though, and so put some garbage on the | ||
+ | screen. | ||
+ | mode but got overwritten in competition mode -- always test your | ||
+ | patches thoroughly! | ||
+ | I found a few little spots in low memory that survived, and placed the | ||
+ | patch there. | ||
+ | nearly all memory. | ||
+ | it only needed to be run once, at the beginning (to turn on mirroring | ||
+ | in v2 SCPUs). | ||
+ | Recall that the main loop has two parts -- one for buffer 1, and | ||
+ | one for buffer 2. The trick is to find some code that is common to both | ||
+ | sections, like a subroutine call: | ||
+ | |||
+ | JSR SomeRoutine | ||
+ | Draw buffer 1 | ||
+ | JSR SomeRoutine | ||
+ | Draw buffer 2 | ||
+ | |||
+ | The patch routine I used was a simple delay loop, redirected from those | ||
+ | two JSRs: | ||
+ | |||
+ | LDX #$FF | ||
+ | CPX $D012 | ||
+ | BNE *-5 | ||
+ | DEX | ||
+ | CPX #$FC | ||
+ | BNE *-10 | ||
+ | JMP SomeRoutine | ||
+ | |||
+ | Of course, this will also slow the program down at 1Mhz; later on I became | ||
+ | smarter about my patches, but this one works pretty well. | ||
+ | To save the game and patches, I simply froze it from AR. Just | ||
+ | saving from the monitor generally failed; the initialization routine | ||
+ | doesn' | ||
+ | involves RLE compression, | ||
+ | fill all unused portions of memory -- temporary areas, bitmaps, etc. | ||
+ | Another thing to do is to set a freeze point at the init routine, | ||
+ | and then JMP there from the monitor. | ||
+ | won't have to look at all the usual freezer ugliness, and at this | ||
+ | point freezing isn't any different than saving from the ML monitor | ||
+ | and RLE-packing the file. Once saved, I tested a few times from the | ||
+ | 64 side, to make sure things worked right. | ||
+ | |||
+ | Whether freezing or saving from the monitor, if the file size | ||
+ | is larger than 202 blocks or so, it can't be loaded on the SCPU without | ||
+ | a special loader -- unless you compress it first. | ||
+ | using pu-crunch for that purpose, but if you want to do it on the 64 | ||
+ | then I recommend using ABCrunch, which works well with the SCPU and | ||
+ | gives about as good compression as you can get without an REU. | ||
+ | |||
+ | The result was stuntcar-scpu, | ||
+ | |||
+ | |||
+ | Rescue on Fractalus | ||
+ | ------------------- | ||
+ | |||
+ | Next on my list was Rescue on Fractalus, an older (and quite cool) | ||
+ | Lucasfilm game that just didn't cut it in the 64 conversion, for a number | ||
+ | of reasons (that perhaps could have been avoided). | ||
+ | versions of the game, one of which doesn' | ||
+ | but I have the older version, which does work. | ||
+ | |||
+ | With a SuperCPU, though, there are a number of problems. | ||
+ | flickers terribly. | ||
+ | it is too slow. Specifically, | ||
+ | not give a convincing illusion of speed or excitement. | ||
+ | protected and uses a track-based fastloader, loaded from disk via B-E, which | ||
+ | also saves the high scores to disk. Clearly, this one is a bigger job: the | ||
+ | display is too fast, the game constants need adjusting, and the highscore code | ||
+ | needs to be replaced by some kernal calls. | ||
+ | |||
+ | The structure of this code is a little different. | ||
+ | handles the (double-buffered) display -- it does all the calculations and | ||
+ | draws to the two buffers. | ||
+ | it swaps buffers, changes the display in different parts of the screen, | ||
+ | reads the joystick, and performs the game updates which change your | ||
+ | position and direction. | ||
+ | doesn' | ||
+ | loop takes care of those). | ||
+ | What does all this mean? First, that the game can be a good ten | ||
+ | steps ahead of the screen, which makes things like targeting very | ||
+ | difficult. | ||
+ | (they go crazy at 20). Third, that things like velocity and turning | ||
+ | rate are rather low, because advancing or turning too quickly would | ||
+ | get the game way out of sync (unfortunately, | ||
+ | for 1MHz, making targeting difficult and movement clunky). | ||
+ | other hand, having the movement in the interrupt is the reason that | ||
+ | the game does not become unplayably fast at 20MHz, and means that | ||
+ | something besides a delay loop is needed. | ||
+ | The interrupt swaps buffers, but the main loop draws them, | ||
+ | and because it draws so quickly it can start clearing and drawing to | ||
+ | the visible buffer. | ||
+ | the buffer swap code in the interrupt, so that the drawing buffer was | ||
+ | always on-screen. | ||
+ | like. | ||
+ | It turned out to be pretty easy to force the main loop to wait | ||
+ | on the interrupt. | ||
+ | intercepting the interrupt loop, the buffer swap code actually | ||
+ | modifies a zero-page variable upon swapping. | ||
+ | has to do is wait on that variable before charging ahead. | ||
+ | made it wait for two frames, because it made the game play a little | ||
+ | better. | ||
+ | |||
+ | Now, how to find the velocity and turn code? Well it takes | ||
+ | a keypress to change the velocity, so by hunting for LDA $DC01, and | ||
+ | tracing back, the routine can be found; at the very least the | ||
+ | affected variables may be found, and hunted for. For example, if | ||
+ | the result is stored in $D0, then you can search for LDA $D0. The | ||
+ | point is to locate the keypress processing code. From there, a little | ||
+ | trial and error (setting freeze points and pressing the velocity key) | ||
+ | locates the piece of code which deals with changing the velocity, and | ||
+ | in particular which variable corresponds to velocity. | ||
+ | there it just takes another hunt for LDA velocity, ADC velocity, etc. | ||
+ | to figure out where the code for updating position and direction is. | ||
+ | In this case, I was pretty sure I had found it, as it went | ||
+ | something like | ||
+ | |||
+ | LDA velocity | ||
+ | LSR | ||
+ | LSR | ||
+ | ADC #$20 | ||
+ | |||
+ | and this was added to the position. | ||
+ | I just changed the ADC, or removed an LSR, to see that the speed changed. | ||
+ | The code for turning left and right and moving up and down was similar, | ||
+ | and again after a little trial and error it was clear what code did | ||
+ | what. Again, it wasn't necessary to totally understand how these | ||
+ | routines worked exactly -- just the general idea of them, in this case | ||
+ | to see that a multiple of the velocity was used to change the position | ||
+ | and orientation of the player. | ||
+ | So, to fix it up, I just changed that multiple -- probably I | ||
+ | NOPed out an LSR above, to basically double the speed, and changed the | ||
+ | turning rates similarly. | ||
+ | not only needed to be playably fast, but also couldn' | ||
+ | high speeds, etc. | ||
+ | |||
+ | But once that was working, all that remained was the highscore | ||
+ | table. | ||
+ | score, and while entering my name froze the program, and figured out | ||
+ | what got stored where. | ||
+ | what was saved to disk. From the original loader, I also knew where | ||
+ | the highscores needed to be loaded to initially (the highscore table | ||
+ | gets copied around a lot -- it doesn' | ||
+ | Figuring out the exact number of bytes to save took a little bit of | ||
+ | effort (saving either too many or too few bytes screws it up), but | ||
+ | from there it was clear what memory needed to be saved. | ||
+ | So all that remained was to add the usual SETLFS etc. kernal | ||
+ | calls, right? | ||
+ | (from $90-$C0) for its own purposes. | ||
+ | program to work with device 9, etc. To get around this, I did two | ||
+ | things. | ||
+ | relevant variables to an unused part of memory -- in particular, I | ||
+ | save the current drive number. | ||
+ | file, I actually copy all zero page variables from $90-$C2 or so | ||
+ | to a temporary location, and then copy them back after saving. | ||
+ | That way there are no worries about altering important locations. | ||
+ | Finding memory for the load/save patch was easy -- I just used | ||
+ | the area which was previously used for the fastload load/save code. | ||
+ | There was enough for the necessary code as well as temporary space | ||
+ | for saving the zero page variables. | ||
+ | |||
+ | Finally, I changed some text from Rescue on Fractalus to | ||
+ | Behind Jaggi Lines, to distinguish it from the original, and that | ||
+ | was that. Works great! | ||
+ | in short, more the game it always should have been. | ||
+ | |||
+ | Stellar 7 | ||
+ | --------- | ||
+ | |||
+ | Finally, I tried my hand at Stellar 7. Stellar 7 had several | ||
+ | problems. | ||
+ | screen after a while, if no keys are pressed. | ||
+ | and so passes very quickly. | ||
+ | of delay is needed. | ||
+ | software delay loops, so that needs fixing. | ||
+ | kernal calls for loading and saving, but is attached to drive #8; | ||
+ | also, my version was split into a bunch of files, and I wanted to | ||
+ | cut the number of files down. | ||
+ | |||
+ | Well, by this time it was all pretty straightforward. | ||
+ | the loader, it was easy to figure out which files went where. | ||
+ | mission and main displays were loaded in when needed, and swapped | ||
+ | into unused parts of memory when not, so I loaded them in and | ||
+ | adjusted the swap variable accordingly -- this left just the highscore | ||
+ | and seven level files. | ||
+ | |||
+ | Finding the delay loops was easy -- I just went to the relevant | ||
+ | sections of code, froze, and took a look at the loops. | ||
+ | basic | ||
+ | |||
+ | :LOOP | ||
+ | LDA $D4 ;Check for keypress | ||
+ | BMI :key | ||
+ | DEX | ||
+ | BNE :LOOP | ||
+ | DEY | ||
+ | BNE :LOOP | ||
+ | DEC counter | ||
+ | BNE :LOOP | ||
+ | :key LDX #$00 | ||
+ | ... | ||
+ | |||
+ | Luckily, all routines were pretty much the same as the above. | ||
+ | interrupt routine is in the $0314 vector, and the same routine is | ||
+ | used during gameplay. | ||
+ | So the patch is very easy at this point. | ||
+ | IRQ code which does a JMP $EA7B to JMP $CE00 | ||
+ | |||
+ | . CE00 $EE INC $CFFF | ||
+ | . CE03 $4C JMP $EA7B | ||
+ | |||
+ | To fix up the keypress routines, the idea is to change the LDA $D0 | ||
+ | into a JSR patch. | ||
+ | trick is to place the LDX #$00 into the patch routine: | ||
+ | |||
+ | . CE06 $20 JSR $CE15 ; | ||
+ | . CE09 $A5 LDA $D4 | ||
+ | . CE0B $10 BPL $CE11 | ||
+ | . CE0D $A2 LDX #$00 ;If key pressed, then LDX #$00 | ||
+ | . CE0F $29 AND #$FF | ||
+ | . CE11 $60 RTS | ||
+ | |||
+ | The actual delay is accomplished by waiting on $CFFF: | ||
+ | |||
+ | . CE15 $AD LDA $CFFF | ||
+ | . CE18 $C9 CMP #$04 | ||
+ | . CE1A $90 BCC $CE15 | ||
+ | . CE1C $A9 LDA #$00 | ||
+ | . CE1E $8D STA $CFFF | ||
+ | . CE21 $60 RTS | ||
+ | |||
+ | As you can see, I waited a (default) of 4 frames. | ||
+ | game/ | ||
+ | the rendering code to basically JSR $CE15. | ||
+ | try something new: let the user be able to change that CMP #$04 | ||
+ | to make things faster or slower, to suit their tastes. | ||
+ | values were pretty easy to figure out, so this just required a little | ||
+ | patch to check for the " | ||
+ | accordingly. | ||
+ | |||
+ | Well, that about sums it up. Perhaps if you do some fixing, | ||
+ | you might send me a little email describing your own experiences? | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | An Optimizing Hybrid LZ77 RLE Data Compression Program, aka | ||
+ | Improving Compression Ratio for Low-Resource Decompression | ||
+ | =========================================================== | ||
+ | by Pasi Ojala < | ||
+ | |||
+ | Short: | ||
+ | |||
+ | Pucrunch is a Hybrid LZ77 and RLE compressor, uses an Elias Gamma Code for | ||
+ | lengths, mixture of Gamma Code and linear for LZ77 offset, and ranked RLE | ||
+ | bytes indexed by the same Gamma Code. Uses no extra memory in decompression. | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Introduction | ||
+ | ------------ | ||
+ | |||
+ | Since I started writing demos for the C64 in 1989 I have always wanted to | ||
+ | program a compression program. | ||
+ | time, urge or knowledge to create one. In retrospect, most of the ideas I | ||
+ | had then were simply bogus (" | ||
+ | puts it). But years passed, I gathered more knowledge and finally got an | ||
+ | irresistible urge to finally realize my dream. | ||
+ | |||
+ | The nice thing about the delay is that I don't need to write the actual | ||
+ | compression program to run on a C64 anymore. | ||
+ | ANSI-C code and just program it to create files that would uncompress | ||
+ | themselves when run on a C64. Running the compression program outside of | ||
+ | the target system provides at least the following advantages. | ||
+ | * I can use portable ANSI-C code. The compression program can be | ||
+ | compiled to run on a Unix box, Amiga, PC etc. And I have all the | ||
+ | tools to debug the program and gather profiling information to see | ||
+ | why it is so slow :-) | ||
+ | * The program runs much faster than on C64. If it is still slow, there | ||
+ | is always multitasking to allow me to do something else while I'm | ||
+ | compressing a file. | ||
+ | * There is ' | ||
+ | possibly need and use every trick possible to increase the compression | ||
+ | ratio as long as the decompression remains possible on a C64. | ||
+ | * Large files can be compressed as easily as shorter files. | ||
+ | compressors can't handle files larger than around 52-54 kilobytes | ||
+ | (210-220 disk blocks). | ||
+ | * Cross-development is easier because you don't have to transfer a | ||
+ | file into C64 just to compress it. | ||
+ | |||
+ | |||
+ | Memory Refresh and Terms for Compression | ||
+ | ---------------------------------------- | ||
+ | |||
+ | Statistical compression | ||
+ | Uses the uneven probability distribution of the source symbols | ||
+ | to shorten the average code length. | ||
+ | code belong to this group. | ||
+ | | ||
+ | the symbols decreases. | ||
+ | | ||
+ | less time to send the message. | ||
+ | |||
+ | Dictionary compression | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | than the string itself, we get compression. | ||
+ | the way BASIC substitutes tokens for keywords: | ||
+ | | ||
+ | | ||
+ | be encoded as VICI(3,3) -- the repeated occurrence of the | ||
+ | | ||
+ | |||
+ | Run-length encoding | ||
+ | | ||
+ | | ||
+ | a .zero keyword or equivalent to fill a number of bytes with | ||
+ | zero without needing to list them all in the source code. | ||
+ | |||
+ | Variable-length code | ||
+ | Any code where the length of the code is not explicitly known | ||
+ | but changes depending on the bit values. | ||
+ | | ||
+ | code (uniquely decodable). | ||
+ | where you know you get the next letter by reading a full byte from | ||
+ | the input. | ||
+ | data to know how many bits to read next. | ||
+ | |||
+ | Universal codes | ||
+ | | ||
+ | need to know the maximum value. | ||
+ | get shorter codes. | ||
+ | | ||
+ | Elias Gamma and Delta codes, Fibonacci code, and Golomb and | ||
+ | Rice codes. | ||
+ | |||
+ | Lossless compression | ||
+ | | ||
+ | the original contents unlike lossy compression, | ||
+ | | ||
+ | | ||
+ | |||
+ | My goal in the pucrunch project was to create a compression system in which | ||
+ | the decompressor would use minimal resources (both memory and processing | ||
+ | power) and still have the best possible compression ratio. | ||
+ | would be if it outperformed every other compression program available. | ||
+ | These understandingly opposite requirements (minimal resources and good | ||
+ | compression ratio) rule out most of the state-of-the-art compression | ||
+ | algorithms and in effect only leave RLE and LZ77 to be considered. | ||
+ | goal was to learn something about data compression and that goal at least | ||
+ | has been satisfied. | ||
+ | |||
+ | I started by developing a byte-aligned LZ77+RLE compressor/ | ||
+ | then added a Huffman backend to it. The Huffman tree took 384 bytes and | ||
+ | the code that decoded the tree into an internal representation took 100 | ||
+ | bytes. | ||
+ | 40-kilobyte test files, the gain was reduced to only about 3% after | ||
+ | accounting the extra code and the Huffman tree. | ||
+ | |||
+ | Then I started a more detailed analysis of the LZ77 offset and length | ||
+ | values and the RLE values and concluded that I would get better compression | ||
+ | by using a variable-length code. I used a simple variable-length code and | ||
+ | scratched the Huffman backend code, as it didn't increase the compression | ||
+ | ratio anymore. | ||
+ | |||
+ | Pucrunch does not use byte-aligned data, and is a bit slower than the | ||
+ | byte-aligned version because of this, but is much faster than the original | ||
+ | version with the Huffman backend attached. | ||
+ | well compression-wise. | ||
+ | LhA, Zip, and GZip in some cases. | ||
+ | ourselves. | ||
+ | |||
+ | To get an improvement to the compression ratio for LZ77, we have only a few | ||
+ | options left. We can improve on the encoding of literal bytes (bytes that | ||
+ | are not compressed), | ||
+ | encode, and shorten the encoding of RLE and LZ77. In the algorithm | ||
+ | presented here all these improvement areas are addressed both collectively | ||
+ | (one change affects more than one area) and one at a time. | ||
+ | |||
+ | 1. By using a variable-length code we can gain compression for even | ||
+ | 2-byte LZ77 matches, which in turn reduces the number of literal | ||
+ | bytes we need to encode. | ||
+ | to get any compression because they use so many bits to identify | ||
+ | the length and offset values, thus making the code longer than the | ||
+ | original bytes would' | ||
+ | 2. By using a new literal byte tagging system which distinguishes | ||
+ | uncompressed and compressed data efficiently we can reduce number | ||
+ | of extra bits needed to make this distinction (the encoding | ||
+ | overhead for literal bytes). | ||
+ | files that do not compress well. | ||
+ | 3. By using RLE in addition to LZ77 we can shorten the encoding for | ||
+ | long byte run sequences and at the same time set a convenient | ||
+ | upper limit to LZ77 match length. | ||
+ | functions: | ||
+ | + we only need to encode integers in a specific range | ||
+ | + we only need to search strings shorter than this limit (if we | ||
+ | find a string long enough, we can stop there) | ||
+ | Short byte runs are compressed either using RLE or LZ77, whichever | ||
+ | gets the best results. | ||
+ | 4. By doing statistical compression (more frequent symbols get | ||
+ | shorter representations) on the RLE bytes (in this case symbol | ||
+ | ranking) we can gain compression for even 2-byte run lengths, | ||
+ | which in turn reduces the number of literal bytes we need to | ||
+ | encode. | ||
+ | 5. By carefully selecting which string matches and/or run lengths to | ||
+ | use we can take advantage of the variable-length code. It may be | ||
+ | advantageous to compress a string as two shorter matches instead | ||
+ | of one long match and a bunch of literal bytes, and it can be | ||
+ | better to compress a string as a literal byte and a long match | ||
+ | instead of two shorter matches. | ||
+ | |||
+ | This document consists of several parts, which are: | ||
+ | * C64 Considerations - Some words about the target system | ||
+ | * Escape codes - A new tagging system for literal bytes | ||
+ | * File format - What are the primaries that are output | ||
+ | * Graph search - How to squeeze every byte out of this method | ||
+ | * String match - An evolution of how to speed up the LZ77 search | ||
+ | * Some results on the target system files | ||
+ | * Results on the Calgary Corpus Test Suite | ||
+ | * The Decompression routine - 6510 code with commentary | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Commodore 64 Considerations | ||
+ | --------------------------- | ||
+ | |||
+ | Our target environment (Commodore 64) imposes some restrictions which we | ||
+ | have to take into consideration when designing the ideal compression | ||
+ | system. | ||
+ | of memory certainly imposes a great challenge, and thus also a great sense | ||
+ | of achievement for good results. | ||
+ | |||
+ | First, we would like it to be able to decompress as big a program as | ||
+ | possible. | ||
+ | low memory (most programs that we want to compress start at address 2049) | ||
+ | and is as short as possible. | ||
+ | extra memory or only very small amounts of it. Extra care must be taken to | ||
+ | make certain that the compressed data is not overwritten during the | ||
+ | decompression before it has been read. | ||
+ | |||
+ | Secondly, my number one personal requirement is that the basic end address | ||
+ | must be correctly set by the decompressor so that the program can be | ||
+ | optionally saved in uncompressed form after decompression (although the | ||
+ | current decompression code requires that you say " | ||
+ | This also requires that the decompression code is system-friendly, | ||
+ | does not change KERNAL or BASIC variables or other structures. | ||
+ | decompressor shouldn' | ||
+ | because these may be corrupted by e.g. X-modem file transfer protocol | ||
+ | (padding bytes may be added). | ||
+ | |||
+ | When these requirements are combined, there is not much selection in where | ||
+ | in the memory we can put the decompression code. There are some locations | ||
+ | among the first 256 addresses (zeropage) that can be used, the (currently) | ||
+ | unused part of the processor stack (0x100..0x1ff), | ||
+ | (0x200..0x258) and the tape I/O buffer plus some unused bytes (0x334-0x3ff). | ||
+ | The screen memory (0x400..0x7ff) can also be used if necessary. | ||
+ | do without the screen memory and the tape buffer, we can potentially | ||
+ | decompress files that are located from 0x258 to 0xffff. | ||
+ | |||
+ | The third major requirement is that the decompression should be relatively | ||
+ | fast. After 10 seconds the user begins to wonder if the program has crashed | ||
+ | or if it is doing anything, even if there is some feedback like border color | ||
+ | flashing. | ||
+ | (instead of full 16 bits) and there should be very little of it per each | ||
+ | decompressed byte. Processor- and memory-intensive algorithms like | ||
+ | arithmetic coding and prediction by partial matching (PPM) are pretty much | ||
+ | out of the question, and that is saying it mildly. | ||
+ | practical alternative. | ||
+ | better than LZ77 and can have a bigger length limit. | ||
+ | incorporate RLE and LZ77 into the same algorithm, we should get the best | ||
+ | features from both. | ||
+ | |||
+ | A part of the decompressor efficiency depends on the format of the | ||
+ | compressed data. Byte-aligned codes, where everything is aligned into byte | ||
+ | boundaries, can be accessed very quickly; non-byte-aligned variable length | ||
+ | codes are much slower to handle, but provide better compression. | ||
+ | byte-aligned codes can still have other data sizes than 8. For example you | ||
+ | can use 4 bits for LZ77 length and 12 bits for LZ77 offset, which preserves | ||
+ | the byte alignment. | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | The New Tagging System | ||
+ | ---------------------- | ||
+ | |||
+ | I call the different types of information my compression algorithm outputs | ||
+ | primaries. | ||
+ | * literal (uncompressed) bytes and escape sequences, | ||
+ | * LZ77 (length, | ||
+ | * RLE (length, | ||
+ | * EOF (end of file marker). | ||
+ | |||
+ | Literal bytes are those bytes that cannot be represented by shorter codes, | ||
+ | unlike a part of previously seen data (LZ77), or a part of a longer | ||
+ | sequence of the same byte (RLE). | ||
+ | |||
+ | Most compression programs handle the selection between compressed data and | ||
+ | literal bytes in a straightforward way by using a prefix bit. If the bit | ||
+ | is 0, the following data is a literal byte (uncompressed). | ||
+ | 1, the following data is compressed. | ||
+ | that non-compressible data will be expanded from the original 8 bits to 9 | ||
+ | bits per byte, i.e. by 12.5 %. If the data isn't very compressible, | ||
+ | overhead consumes all the little savings you may have had using LZ77 or | ||
+ | RLE. | ||
+ | |||
+ | Some other data compression algorithms use a value (using variable-length | ||
+ | code) that indicates the number of literal bytes that follow, but this is | ||
+ | really analogous to a prefix bit, because 1-byte uncompressed data is very | ||
+ | common for modestly compressible files. | ||
+ | like a good idea, but we may be able to do even better. | ||
+ | can come up with. My idea was to somehow use the data itself to mark | ||
+ | compressed and uncompressed data and thus not need any prefix bits. | ||
+ | |||
+ | Let's assume that 75% of the symbols generated are literal bytes. | ||
+ | case it seems viable to allocate shorter codes for literal bytes, because | ||
+ | they are more common than compressed data. This distribution (75% are | ||
+ | literal bytes) suggests that we should use 2 bits to determine whether the | ||
+ | data is compressed or a literal byte. One of the combinations indicates | ||
+ | compressed data, and three of the combinations indicate a literal byte. At | ||
+ | the same time those three values divide the literal bytes into three | ||
+ | distinct groups. | ||
+ | three bit patters we have and what are the literal byte values? | ||
+ | |||
+ | The simplest way is to use a direct mapping. | ||
+ | the two most-significant bits) _from the literal bytes themselves_ to | ||
+ | indicate compressed data. This way no actual prefix bits are needed. | ||
+ | maintain an escape code (which doesn' | ||
+ | compared to the bits, and if they match, compressed data follows. | ||
+ | bits do not match, the rest of the literal byte follows. | ||
+ | literal bytes do not expand at all if their most significant bits do not | ||
+ | match the escape code, and fewer bits are needed to represent the literal | ||
+ | bytes. | ||
+ | |||
+ | Whenever those bits in a literal byte would match the escape code, an | ||
+ | escape sequence is generated. | ||
+ | literal bytes which actually start like the escape code (the top bits | ||
+ | match). | ||
+ | code. This escape sequence looks like | ||
+ | |||
+ | # of escape bits (escape code) | ||
+ | | ||
+ | # of escape bits (new escape bits) | ||
+ | 8-# of escape bits (rest of the byte) | ||
+ | = 8 + 3 + # of escape bits | ||
+ | = 13 for 2-bit escapes, i.e. expands the literal byte by 5 bits. | ||
+ | |||
+ | Read further to see how we can take advantage of the changing escape code. | ||
+ | |||
+ | You may also remember that in the run-length encoding presented in the | ||
+ | previous article two successive equal bytes are used to indicate compressed | ||
+ | data (escape condition) and all other bytes are literal bytes. | ||
+ | technique is used in some C64 packers (RLE) and crunchers (LZ77), the only | ||
+ | difference is that the escape condition is indicated by a fixed byte value. | ||
+ | My tag system is in fact an extension to this. Instead of a full byte, I | ||
+ | use only a few bits. | ||
+ | |||
+ | We assumed an even distribution of the values and two escape bits, so 1/4 | ||
+ | of the values have the same two most significant bits as the escape code. | ||
+ | I call this probability that a literal byte has to be escaped the "hit rate". | ||
+ | Thus, literal bytes expand in average 25% of the time by 5 bits, making the | ||
+ | average length 25% * 13 + 75% * 8 = 9.25. Not surprising, this is longer | ||
+ | than using one bit to tag the literal bytes. | ||
+ | we haven' | ||
+ | change the escape code. Using this feature to its optimum (escape | ||
+ | optimization), | ||
+ | |||
+ | Also, because the distribution of the literal byte values is seldom flat | ||
+ | (some values are more common than others) and there is locality (different | ||
+ | parts of the file only contain some of the possible values), from which we | ||
+ | can also benefit, the actual hit rate is always much smaller than that. | ||
+ | Empirical studies on some test files show that for 2-bit escape codes the | ||
+ | actual realized hit rate is only 1.8-6.4%, while the theoretical maximum | ||
+ | is the already mentioned 25%. | ||
+ | |||
+ | Previously we assumed the distribution of 75% of literal bytes and 25% of | ||
+ | compressed data (other primaries). | ||
+ | bits. For other distributions (differently compressible files, not | ||
+ | necessarily better or worse) some other number of escape bits may be more | ||
+ | suitable. | ||
+ | the value which gives the best overall results. | ||
+ | summarizes the hit rates on the test files for different number of escape | ||
+ | bits. | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | 9.06 8.32 8.15 | ||
+ | |||
+ | As can be seen from the table, the realized hit rates are dramatically | ||
+ | smaller than the theoretical maximum values. | ||
+ | should always select 4-bit (or longer) escapes, because it reduces the hit | ||
+ | rate and presents the minimum overhead for literal bytes. | ||
+ | increasing the number of escape bits also increases the code length of the | ||
+ | compressed data. So, it is a matter of finding the optimum setting. | ||
+ | |||
+ | If there are very few literal bytes compared to other primaries, 1-bit | ||
+ | escape or no escape at all gives very short codes to compressed data, but | ||
+ | causes more literal bytes to be escaped, which means 4 bits extra for each | ||
+ | escaped byte (with 1-bit escapes). | ||
+ | literal bytes, for example a 6-bit escape code causes most of the literal | ||
+ | bytes to be output as 8-bit codes (no expansion), but makes the other | ||
+ | primaries 6 bits longer. | ||
+ | the best number of escape bits, but this can be overridden by the user with | ||
+ | the -e option. | ||
+ | |||
+ | The cases in the example with 1-bit escape code validates the original | ||
+ | suggestion: | ||
+ | results on three of the previous test files (although only slightly). | ||
+ | delenn.bin (1 vs 0.828) the escape system works better. | ||
+ | 1-bit escape code is not selected for any of the files, because 2-bit | ||
+ | escape gives better overall results. | ||
+ | |||
+ | -Note:- for 7-bit ASCII text files, where the top bit is always 0 (like | ||
+ | most of the Calgary Corpus files), the hit rate is 0% for even 1-bit | ||
+ | escapes. | ||
+ | using a prefix bit and 7-bit literals, but does not need separate algorithm | ||
+ | to detect and handle 7-bit literals. | ||
+ | |||
+ | For Calgary Corpus files the number of tag bits per primary (counting the | ||
+ | escape sequences and other overhead) ranges from as low as 0.46 (book1) to | ||
+ | 1.07 (geo) and 1.09 (pic). | ||
+ | in the suite where a simple prefix bit would be better than the escape | ||
+ | system. | ||
+ | |||
+ | In Canterbury Corpus the tag bits per primary ranges from 0.44 | ||
+ | (plrabn12.txt) to 1.09 (ptt5), which is the only one above 0.85 (sum). | ||
+ | The average is 0.61 tag bits per primary. | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Primaries Used for Compression | ||
+ | ------------------------------ | ||
+ | |||
+ | The compressor uses the previously described escape-bit system while | ||
+ | generating its output. | ||
+ | generated primaries, whether it is the correct term or not. You are | ||
+ | welcome to suggest a better term for them. The primaries in this | ||
+ | compression algorithm are: literal byte (and escape sequence), LZ77 | ||
+ | (length, | ||
+ | marker). | ||
+ | |||
+ | If the top bits of a literal byte do not match the escape code, the byte is | ||
+ | output as-is. | ||
+ | new escape code. Other primaries start with the escape code. | ||
+ | |||
+ | The Elias Gamma Code is used extensively. | ||
+ | a unary code (a one-bit preceded by zero-bits) and a binary code part. The | ||
+ | first part tells the decoder how many bits are used for the binary code | ||
+ | part. Being a universal code, it produces shorter codes for small integers | ||
+ | and longer codes for larger integers. | ||
+ | a lot of small integers (there are more short string matches and shorter | ||
+ | equal byte runs than long ones), this reduces the total number of bits | ||
+ | needed. | ||
+ | statistical compression and universal codes. | ||
+ | you only need to keep in mind that small integer value equals short code. | ||
+ | The following discusses the encoding of the primaries. | ||
+ | |||
+ | The most frequent compressed data is LZ77. The length of the match is | ||
+ | output in Elias Gamma code, with " | ||
+ | of 3, " | ||
+ | value follows. | ||
+ | next bit defines whether this is LZ77 or RLE/ | ||
+ | 8-bit LZ77 offset value follows. | ||
+ | 2-byte matches to 1..256.) If the bit is 1, the next bit decides between | ||
+ | escape (0) and RLE (1). | ||
+ | |||
+ | The code for an escape sequence is thus e..e010n..ne....e, | ||
+ | byte, and N is the new escape code. Example: | ||
+ | * We are using 2-bit escapes | ||
+ | * The current escape code is " | ||
+ | * We need to encode a byte 0xca == 0b11001010 | ||
+ | * The escape code and the byte high bits match (both are " | ||
+ | * We output the current escape code " | ||
+ | * We output the escaped identification " | ||
+ | * We output the new escape bits, for example " | ||
+ | optimization) | ||
+ | * We output the rest of the escaped byte " | ||
+ | * So, we have output the string " | ||
+ | |||
+ | When the decompressor receives this string of bits, it finds that the first | ||
+ | two bits match with the escape code, it finds the escape identification | ||
+ | (" | ||
+ | combines it with the old escape code to get a whole byte. | ||
+ | |||
+ | The end of file condition is encoded to the LZ77 offset and the RLE is | ||
+ | subdivided into long and short versions. | ||
+ | better idea about why this kind of encoding is selected. | ||
+ | |||
+ | When I studied the distribution of the length values (LZ77 and short RLE | ||
+ | lengths), I noticed that the smaller the value, the more occurrences. | ||
+ | The following table shows an example of length value distribution. | ||
+ | |||
+ | | ||
+ | 2 | ||
+ | 3-4 | ||
+ | 5-8 492 166 | ||
+ | 9-16 | ||
+ | 17-32 | ||
+ | 33-64 8 15 | ||
+ | |||
+ | The first column gives a range of values. | ||
+ | value (2), the second two values (3 and 4), and so on. The second column | ||
+ | shows how many times the different LZ77 match lengths are used, the last | ||
+ | column shows how many times short RLE lengths are used. The distribution | ||
+ | of the values gives a hint of how to most efficiently encode the values. | ||
+ | We can see from the table for example that values 2-4 are used 3455 times, | ||
+ | while values 5-64 are used only 656 times. | ||
+ | get shorter codes, while the less-used ones can be longer. | ||
+ | |||
+ | Because in each " | ||
+ | than in the preceding one, it almost immediately occurred to me that the | ||
+ | optimal way to encode the length values (decremented by one) is: | ||
+ | |||
+ | Value Encoding | ||
+ | 0000000 | ||
+ | 0000001 | ||
+ | 000001x | ||
+ | 00001xx | ||
+ | 0001xxx | ||
+ | 001xxxx | ||
+ | 01xxxxx | ||
+ | 1xxxxxx | ||
+ | |||
+ | The first column gives the binary code of the original value (with x | ||
+ | denoting 0 or 1, xx 0..3, xxx 0..7 and so on), the second column gives the | ||
+ | encoding of the value(s). | ||
+ | in decimal notation. | ||
+ | |||
+ | The last column summarizes the difference between this code and a 7-bit | ||
+ | binary code. Using the previous encoding for the length distribution | ||
+ | presented reduces the number of bits used compared to a direct binary | ||
+ | representation considerably. | ||
+ | is Elias Gamma Code, only the assignment of 0- and 1-bits in the prefix is | ||
+ | reversed, and in this version the length is limited. | ||
+ | value is selectable between 64 and 256. | ||
+ | |||
+ | So, to recap, this version of the Gamma code can encode numbers from 1 to | ||
+ | 255 (1 to 127 in the example). | ||
+ | from 2, because that is the shortest length that gives us any compression. | ||
+ | These length values are first decremented by one, thus length 2 becomes | ||
+ | " | ||
+ | |||
+ | The distribution of the LZ77 offset values (pointer to a previous | ||
+ | occurrence of a string) is not at all similar to the length distribution. | ||
+ | Admittedly, the distribution isn't exactly flat, but it also isn't as | ||
+ | radical as the length value distribution either. | ||
+ | lower 8 bits (automatically selected or user-selectable between 8 and 12 | ||
+ | bits in the current version) of the offset as-is (i.e. binary code) and | ||
+ | the upper part with my version of the Elias Gamma Code. However, 2-byte | ||
+ | matches always have an 8-bit offset value. | ||
+ | discussed shortly. | ||
+ | |||
+ | Because the upper part can contain the value 0 (so that we can represent | ||
+ | offsets from 0 to 255 with a 8-bit lower part), and the code can't directly | ||
+ | represent zero, the upper part of the LZ77 offset is incremented by one | ||
+ | before encoding (unlike the length values which are decremented by one). | ||
+ | Also, one code is reserved for an end of file (EOF) symbol. | ||
+ | the offset value somewhat, but the loss in compression is negligible. | ||
+ | |||
+ | With the previous encoding 2-byte LZ77 matches would only gain 4 bits (with | ||
+ | 2-bit escapes) for each offset from 1 to 256, and 2 bits for each offset | ||
+ | from 257 to 768. In the first case 9 bits would be used to represent the | ||
+ | offset (one bit for gamma code representing the high part 0, and 8 bits for | ||
+ | the low part of the offset), in the latter case 11 bits are used, because | ||
+ | each " | ||
+ | the previous one. | ||
+ | |||
+ | The first case (offset 1..256) is much more frequent than the second case, | ||
+ | because it saves more bits, and also because the symbol source statistics | ||
+ | (whatever they are) guarantee 2-byte matches in recent history (much better | ||
+ | chance than for 3-byte matches, for example). | ||
+ | for a 2-byte LZ77 match to 8 bits (1..256), we don't lose so much | ||
+ | compression at all, but instead we could shorten the code by one bit. This | ||
+ | one bit comes from the fact that before we had to use one bit to make the | ||
+ | selection "8-bit or longer" | ||
+ | need that select bit anymore. | ||
+ | |||
+ | Or, we can use that select bit to a new purpose to select whether this code | ||
+ | really is LZ77 or something else. Compared to the older encoding (which | ||
+ | I'm not detailing here, for clarity' | ||
+ | complicated to follow, and only slightly easier to describe) the codes for | ||
+ | escape sequence, RLE and End of File are still the same length, but the | ||
+ | code for LZ77 has been shortened by one bit. Because LZ77 is the most | ||
+ | frequently used primary, this presents a saving that more than compensates | ||
+ | for the loss of 2-byte LZ77 matches with offsets 257..768 (which we can no | ||
+ | longer represent, because we fixed the offset for 2-byte matches to use | ||
+ | exactly 8 bits). | ||
+ | |||
+ | Run length encoding is also a bit revised. | ||
+ | can be gained by using the same length encoding for RLE as for LZ77. On | ||
+ | the other hand, we still should be able to represent long repeat counts as | ||
+ | that's where RLE is most efficient. | ||
+ | * short RLE for short (e.g. 2..128) equal byte strings | ||
+ | * long RLE for long equal byte strings | ||
+ | |||
+ | The Long RLE selection is encoded into the Short RLE code. Short RLE only | ||
+ | uses half of its coding space, i.e. if the maximum value for the gamma | ||
+ | code is 127, short RLE uses only values 1..63. | ||
+ | decoder into Long RLE mode and more bits are read to complete the run | ||
+ | length value. | ||
+ | |||
+ | For further compression in RLE we rank all the used RLE bytes (the values | ||
+ | that are repeated in RLE) in the decreasing probability order. | ||
+ | are put into a table, and only the table indices are output. | ||
+ | are also encoded using a variable length code (the same gamma code, | ||
+ | surprise..), | ||
+ | more RLE's with smaller indices, the average code length decreases. | ||
+ | decompression we simply get the gamma code value and then use the value as | ||
+ | an index into the table to get the value to repeat. | ||
+ | |||
+ | Instead of reserving full 256 bytes for the table we only put the top 31 | ||
+ | RLE bytes into the table. | ||
+ | a byte run with a value not in the table we use a similar technique as for | ||
+ | the short/long RLE selection. | ||
+ | means we don't have the value in the table. | ||
+ | select the ' | ||
+ | bits of the value (there are 32 distinct values in the range 32..63). | ||
+ | rest 3 bits of the byte are sent separately. | ||
+ | |||
+ | If you are more than confused, forget everything I said in this chapter and | ||
+ | look at the decompression pseudo-code later in this article. | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Graph Search - Selecting Primaries | ||
+ | ---------------------------------- | ||
+ | |||
+ | In free-parse methods there are several ways to divide the file into parts, | ||
+ | each of which is equally valid but not necessary equally efficient in terms | ||
+ | of compression ratio. | ||
+ | |||
+ | " | ||
+ | |||
+ | " | ||
+ | " | ||
+ | |||
+ | The latter two lines show how the sentence could be encoded using literal | ||
+ | bytes and (offset, length) pairs. | ||
+ | encodings for a single string and they are both valid, i.e. they will | ||
+ | produce the same string after decompression. | ||
+ | there are several possible ways to divide the input into parts. | ||
+ | clever, we will of course select the encoding that produces the shortest | ||
+ | compressed version. | ||
+ | the data compressor decide which primary to generate in each step? | ||
+ | |||
+ | The most efficient way the file can be divided is determined by a sort of a | ||
+ | graph-search algorithm, which finds the shortest possible route from the | ||
+ | start of the file to the end of the file. Well, actually the algorithm | ||
+ | proceeds from the end of the file to the beginning for efficiency reasons, | ||
+ | but the result is the same anyway: | ||
+ | emitted is determined and remembered. | ||
+ | bits or the variable length codes or their parameters) are changed, the | ||
+ | graph search must be re-executed. | ||
+ | |||
+ | " | ||
+ | | ||
+ | 13 | ||
+ | \____/ | ||
+ | 15 | ||
+ | |||
+ | Think of the string as separate characters. | ||
+ | character by paying 8 bits to do so (not shown in the figure), unless the | ||
+ | top bits of the character match with the escape code (in which case you | ||
+ | need more bits to send the character " | ||
+ | contains a string that matches a string starting at the current character | ||
+ | you can jump over the string by paying as many bits as representing the | ||
+ | LZ77 (offset, | ||
+ | from 11 to 15 bits. And the same applies if you have RLE starting at the | ||
+ | character. | ||
+ | start of the file to the end and you have found the optimal encoding. | ||
+ | this case the last characters " sting" would be optimally encoded with | ||
+ | 8(literal " ") + 15(" | ||
+ | bits. | ||
+ | |||
+ | The algorithm can be written either cleverly or not-so. | ||
+ | short-cut compared to a full-blown graph search because we can/need to only | ||
+ | go forwards in the file: we can simply start from the end! Our accounting | ||
+ | information which is updated when we pass each location in the data | ||
+ | consists of three values: | ||
+ | 1. the minimum bits from this location to the end of file. | ||
+ | 2. the mode (literal, LZ77 or RLE) to use to get that minimum | ||
+ | 3. the " | ||
+ | |||
+ | For each location we try to jump forward (to a location we already | ||
+ | processed) one location, LZ77 match length locations (if a match exists), | ||
+ | or RLE length locations (if equal bytes follow) and select the shortest | ||
+ | route, update the tables accordingly. | ||
+ | RLE length of for example 18, we also check jumps 17, 16, 15, ... This | ||
+ | gives a little extra compression. | ||
+ | starting from the " | ||
+ | once. Nothing located after the current location can't change, so there is | ||
+ | never any need to update a location. | ||
+ | |||
+ | To be able to find the minimal path, the algorithm needs the length of the | ||
+ | RLE (the number of the identical bytes following) and the maximum LZ77 | ||
+ | length/ | ||
+ | byte/ | ||
+ | memory-consuming part of the compression. | ||
+ | make the search faster. | ||
+ | article. | ||
+ | optimization can use the cached values. | ||
+ | |||
+ | Then what is the rationale behind this optimization? | ||
+ | are not forced to take every compression opportunity, | ||
+ | ones. The compression community calls this "lazy coding" | ||
+ | selection. | ||
+ | LZ77 match, because in the next position in the file you may have a longer | ||
+ | match. | ||
+ | for it that there is a difference. | ||
+ | significant for variable-length code, but it is there and I was after every | ||
+ | last bit of compression, | ||
+ | |||
+ | Note that the decision-making between primaries is quite simple if a | ||
+ | fixed-length code is used. A one-step lookahead is enough to guarantee | ||
+ | optimal parsing. | ||
+ | location, we output a literal byte and that longer match instead of the | ||
+ | shorter match. | ||
+ | but the main reason is that in fixed-length code it doesn' | ||
+ | you represent a part of data as two matches of lengths 2 and 8 or as | ||
+ | matches of lengths 3 and 7 or as any other possible combination (if matches | ||
+ | of those lengths exist). | ||
+ | and/or a statistical compression backend. | ||
+ | offsets no longer generate equal-length codes. | ||
+ | |||
+ | Note also that most LZ77 compression algorithms need at least 3-byte match | ||
+ | to break even, i.e. not expanding the data. This is not surprising when | ||
+ | you stop to think about it. To gain something from 2-byte matches you need | ||
+ | to encode the LZ77 match into 15 bits. This is very little. | ||
+ | LZ77 compressor would use one bit to select between a literal and LZ77, 12 | ||
+ | bits for moderate offset, and you have 2 bits left for match length. | ||
+ | imagine the rationale to exclude 2-byte matches also include "the potential | ||
+ | savings percentage for 2-byte matches is insignificant" | ||
+ | around this by using the tag system and Elias Gamma Code, and does indeed | ||
+ | gain bits from even 2-byte matches. | ||
+ | |||
+ | After we have decided on what primaries to output, we still have to make | ||
+ | sure we get the best results from the literal tag system. | ||
+ | optimization handles this. In this stage we know which parts of the data | ||
+ | are emitted as literal bytes and we can select the minimal path from the | ||
+ | first literal byte to the last in the same way we optimized the primaries. | ||
+ | Literal bytes that match the escape code generate an escape sequence, thus | ||
+ | using more bits than unescaped literal bytes, and we need to minimize these | ||
+ | occurrences. | ||
+ | |||
+ | For each literal byte there is a corresponding new escape code which | ||
+ | minimizes the path to the end of the file. If the literal byte's high bits | ||
+ | match the current escape code, this new escape code is used next. The | ||
+ | escape optimization routine proceeds from the end of the file to the | ||
+ | beginning like the graph search, but it proceeds linearly and is thus much | ||
+ | faster. | ||
+ | |||
+ | I already noted that the new literal byte tagging system exploits the | ||
+ | locality in the literal byte values. | ||
+ | the bytes, the tagging system does not do well at all. Most of the time, | ||
+ | however, the system works very well, performing 50% better than the | ||
+ | prefix-bit approach. | ||
+ | |||
+ | The escape optimization routine is currently very fast. A little | ||
+ | algorithmic magic removed a lot of code from the original version. | ||
+ | escape optimization routine is quite advantageous, | ||
+ | escape bits can now vary from 0 (uncompressed bytes always escaped) to 8 | ||
+ | and we need to run the routine again if we change the number of escape bits | ||
+ | used to select the optimal escape code changes. | ||
+ | |||
+ | Because escaped literal bytes actually expand the data, we need a safety | ||
+ | area, or otherwise the compressed data may get overwritten by the | ||
+ | decompressed data before we have used it. Some extra bytes need to be | ||
+ | reserved for the end of file marker. | ||
+ | many bytes we need for safety buffer by keeping track of the difference | ||
+ | between input and output sizes while creating the compressed file. | ||
+ | |||
+ | $1000 .. $2000 | ||
+ | |OOOOOOOO| | ||
+ | |||
+ | $801 .. | ||
+ | |D|CCCCC| | ||
+ | |||
+ | | ||
+ | | ||
+ | ^ ^ | ||
+ | W R | ||
+ | |||
+ | If the original file is located at $1000-$1fff, | ||
+ | area is 16 bytes, the compressed version will be copied by the | ||
+ | decompression routine higher in memory so that the last byte is at $200f. | ||
+ | In this way, the minimum amount of other memory is overwritten by the | ||
+ | decompression. | ||
+ | wrap buffer. | ||
+ | pointer wraps from the end of memory to the wrap buffer, allowing the | ||
+ | original file to extend up to the end of the memory, all the way to $ffff. | ||
+ | You can get the compression program to tell you which memory areas it uses | ||
+ | by specifying the " | ||
+ | than a dozen bytes. | ||
+ | |||
+ | To sum things up, Pucrunch operates in several steps: | ||
+ | 1. Find RLE and LZ77 data, pre-select RLE byte table | ||
+ | 2. Graph search, i.e. which primaries to use | ||
+ | 3. Primaries/ | ||
+ | 4. Escape optimization, | ||
+ | 5. Update RLE ranks and the RLE byte table | ||
+ | 6. Determine the safety area size and output the file. | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | String Match Speedup | ||
+ | -------------------- | ||
+ | |||
+ | To be able to select the most efficient combination of primaries we of | ||
+ | course first need to find out what kind of primaries are available for | ||
+ | selection. | ||
+ | the file doesn' | ||
+ | string matching is the most time-consuming operation in LZ77 compression | ||
+ | simply because of the amount of the comparison operations needed. | ||
+ | improvement in the match algorithm can decrease the compression time | ||
+ | considerably. | ||
+ | |||
+ | The RLE search is straightforward and fast: loop from the current position | ||
+ | (P) forwards in the file counting each step until a different-valued byte | ||
+ | is found or the end of the file is reached. | ||
+ | the RLE byte count (if the graph search decides to use RLE). The code can | ||
+ | also be optimized to initialize counts for all locations that belonged to | ||
+ | the RLE, because by definition there are only one-valued bytes in each one. | ||
+ | Let us mark the current file position by P. | ||
+ | |||
+ | | ||
+ | int top = inlen - P; | ||
+ | int rlelen = 1; | ||
+ | |||
+ | /* Loop for the whole RLE */ | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | |||
+ | With LZ77 we can't use the same technique as for RLE (i.e. using the | ||
+ | information about current match to skip subsequent file locations to speed | ||
+ | up the search). | ||
+ | -nearest- possible, string that matches the bytes starting from the current | ||
+ | location. | ||
+ | offset from the current position. | ||
+ | |||
+ | Naively, we could start comparing the strings starting from P-1 and P, | ||
+ | remembering the length of the matching part and then doing the same at P-2 | ||
+ | and P, P-3 and P, .. P-j and P (j is the maximum search offset). | ||
+ | longest match and its location (offset from the current position) are then | ||
+ | remembered and initialized. | ||
+ | maximum length we can actually use, we can stop the search there. | ||
+ | code used to represent the length values may have an upper limit.) | ||
+ | |||
+ | This may be the first implementation that comes to your (and my) mind, and | ||
+ | might not seem so bad at first. | ||
+ | the search: the -Brute Force- method. | ||
+ | byte compares to process a file of the length n (a mathematically inclined | ||
+ | person would probably give a better estimate). | ||
+ | determined RLE value to our advantage permits us to rule out the worst-case | ||
+ | projection, which happens when all bytes are the same value. | ||
+ | search LZ77 matches if the current file position has shorter RLE sequence | ||
+ | than the maximum LZ77 copy length. | ||
+ | |||
+ | The first thing I did to improve the speed is to remember the position | ||
+ | where each byte has last been seen. A simple 256-entry table handles that. | ||
+ | Using this table, the search can directly start from the first potential | ||
+ | match, and we don't need to search for it byte-by-byte anymore. | ||
+ | is continually updated when we move toward to the end of the file. | ||
+ | |||
+ | That didn't give much of an improvement, | ||
+ | 256*256 entries, making it possible to locate the latest occurrence of any | ||
+ | byte -pair- instead. | ||
+ | contents directly gives the position in file where these two bytes were | ||
+ | last seen. Because the shortest possible string that would offer any | ||
+ | compression (for my encoding of LZ77) is two bytes long, this byte-pair | ||
+ | history is very suitable indeed. | ||
+ | 2-byte) match is found directly from the byte-pair history. | ||
+ | moderate 30% decrease in compression time for one of my test files (from 28 | ||
+ | minutes to 17 minutes on a 25 MHz 68030). | ||
+ | |||
+ | The second idea was to quickly discard the strings that had no chance of | ||
+ | being longer matches than the one already found. | ||
+ | (sort of a checksum here, it is never used to index a hash table in this | ||
+ | algorithm, but I rather use "hash value" than " | ||
+ | from each three bytes of data. The values are calculated once and put into | ||
+ | a table, so we only need two memory fetches to know if two 3-byte strings | ||
+ | are different. | ||
+ | bytes differ. | ||
+ | original bytes. | ||
+ | to compare are then .. compared. | ||
+ | two bytes earlier than the longest match so far. If the hash values | ||
+ | differ, there is no chance that the match is longer than the current one. | ||
+ | It may be not even be as long, because one of the two earlier bytes may be | ||
+ | different. | ||
+ | compare has to be done. However, the hash value check already discards a | ||
+ | huge number of candidates and more than generously pays back its own memory | ||
+ | references. | ||
+ | (from 17 minutes to 8 minutes). | ||
+ | |||
+ | Okay, the byte-pair table tells us where the latest occurrence of any byte | ||
+ | pair is located. | ||
+ | have to do a brute force search. | ||
+ | byte-pair table to generate a linked list of the byte pairs with the same | ||
+ | value. | ||
+ | using the same indexing as the file positions. | ||
+ | occurrence of a 2-byte string starting at location P, look at backSkip[P]. | ||
+ | |||
+ | /* Update the two-byte history & backSkip */ | ||
+ | | ||
+ | { | ||
+ | int index = (indata[P]<< | ||
+ | |||
+ | | ||
+ | | ||
+ | } | ||
+ | |||
+ | Actually the values in the table are one bigger than the real table | ||
+ | indices. | ||
+ | represent non-negative values), and I wanted zero to mean "not occurred" | ||
+ | |||
+ | This table makes the search of the next (previous) location to consider | ||
+ | much faster, because it is a single table reference. | ||
+ | was reduced from 6 minutes to 1 minute 10 seconds. | ||
+ | from the original 28 minutes! | ||
+ | |||
+ | | ||
+ | | ||
+ | \/ | ||
+ | | ||
+ | ^ ^ ^ | ||
+ | | | | | ||
+ | next | ||
+ | | ||
+ | C B A | ||
+ | |||
+ | In this example we are looking at the string " | ||
+ | the lastPair[] table (with the index " | ||
+ | location A) we can jump directly to the latest match at B, which is " | ||
+ | bytes long. The hash values for the string at B (" | ||
+ | are compared. | ||
+ | bytes), and the strings " | ||
+ | length 3 is found (the 4th byte differs). | ||
+ | index B gives the previous location where the 2-byte string " | ||
+ | found, i.e. C. The hash value for the strategic position of the string in | ||
+ | the current position A (" | ||
+ | corresponding position in the next potential match starting at C (" | ||
+ | They don't match, so the string starting at C (" | ||
+ | longer match than the current longest match at B. | ||
+ | |||
+ | There is also another trick that takes advantage of the already determined | ||
+ | RLE lengths. | ||
+ | we can directly skip to the next potential match. | ||
+ | (the data bytes) are the same, and need not be compared, because the first | ||
+ | byte (two bytes) are always equal on both positions (our backSkip[] table | ||
+ | guarantees that). | ||
+ | of the strings when comparing them. | ||
+ | |||
+ | Another improvement to the search code made it dramatically faster than | ||
+ | before on highly redundant files (such as pic from the Calgary Corpus | ||
+ | Suite, which was the Achilles' | ||
+ | method just skips over the RLE part (if any) in the search position and then | ||
+ | checks if the located position has equal number (and value) of RLE bytes | ||
+ | before it. | ||
+ | |||
+ | backSkip[] | ||
+ | | ||
+ | \/ \ | ||
+ | | ||
+ | ^ ^ ^ | ||
+ | | | | | ||
+ | i p p+rle[p]-1 | ||
+ | |||
+ | The algorithm searches for a two-byte string which starts at p + rle[p]-1, | ||
+ | i.e. the last rle byte (' | ||
+ | finds such location (simple lastPair[] or backSkip[] lookup), it checks if | ||
+ | the rle in the compare position (i-(rle[p]-1)) is long enough (i.e. the | ||
+ | same number of A's before the B in both places). | ||
+ | hash value check is performed on the strings and if it succeeds, the | ||
+ | brute-force byte-compare is done. | ||
+ | |||
+ | The rationale behind this idea is quite simple. | ||
+ | less matches for " | ||
+ | approach. | ||
+ | there is. | ||
+ | |||
+ | Note that a compression method similar to RLE can be realized using just | ||
+ | LZ77. You just emit the first byte as a literal byte, and output a LZ77 | ||
+ | code with offset 1 and the original RLE length minus 1. You can thus | ||
+ | consider RLE as a special case, which offers tighter encoding of the | ||
+ | necessary information. | ||
+ | bytes, a RLE version providing lengths up to 32 kilobytes is a big | ||
+ | improvement, | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | The Decompression Routine | ||
+ | ------------------------- | ||
+ | |||
+ | Any lossless compression program is totally useless unless there exists a | ||
+ | decompression program which takes in the compressed file and -- using only | ||
+ | that information -- generates the original file exactly. | ||
+ | decompression program must run on C64's 6510 microprocessor, | ||
+ | impact on the algorithm development also. Regardless of the algorithm, | ||
+ | there are several requirements that the decompression code must satisfy: | ||
+ | 1. Correctness - the decompression must behave accurately to | ||
+ | guarantee lossless decompression | ||
+ | 2. Memory usage - the less memory is used the better | ||
+ | 3. Speed - fast decompression is preferred to slower one | ||
+ | |||
+ | The latter two requirements can be and are complementary. | ||
+ | faster decompression for the same algorithm is possible if more memory can | ||
+ | be used (although in this case the difference is quite small). | ||
+ | the correctness of the result is the most important thing. | ||
+ | |||
+ | A short pseudo-code of the decompression algorithm follows before I go to | ||
+ | the actual C64 decompression code. | ||
+ | |||
+ | copy the decompression code to low memory | ||
+ | copy the compressed data forward in memory so that it isn't | ||
+ | | ||
+ | setup source and destination pointers | ||
+ | | ||
+ | set initial escape code | ||
+ | do forever | ||
+ | get the number of escape bits " | ||
+ | if " | ||
+ | read more bits to complete a byte and output it | ||
+ | else | ||
+ | get Elias Gamma Code " | ||
+ | if " | ||
+ | get 1 bit | ||
+ | if bit is 0 | ||
+ | it is 2-byte LZ77 | ||
+ | get 8 bits for " | ||
+ | copy 2 bytes from " | ||
+ | | ||
+ | else | ||
+ | get 1 bit | ||
+ | if bit is 0 | ||
+ | it is an escaped literal byte | ||
+ | get new escape code | ||
+ | get more bits to complete a byte with the | ||
+ | | ||
+ | use the new escape code | ||
+ | else | ||
+ | it is RLE | ||
+ | get Elias Gamma Code " | ||
+ | if " | ||
+ | it is long RLE | ||
+ | get more bits to complete a byte " | ||
+ | get Elias Gamma Code " | ||
+ | | ||
+ | endif | ||
+ | get Elias Gamma Code " | ||
+ | if " | ||
+ | get 3 more bits to complete " | ||
+ | else | ||
+ | get " | ||
+ | index " | ||
+ | endif | ||
+ | copy " | ||
+ | endif | ||
+ | endif | ||
+ | else | ||
+ | it is LZ77 | ||
+ | get Elias Gamma Code " | ||
+ | if " | ||
+ | end decompression and start program | ||
+ | endif | ||
+ | get 8..12 bits " | ||
+ | | ||
+ | copy " | ||
+ | | ||
+ | endif | ||
+ | endif | ||
+ | end do | ||
+ | |||
+ | The following routine is the pucrunch decompression code. The code runs on | ||
+ | the C64 or C128's C64-mode and a modified version is used for Vic20 and | ||
+ | C16/ | ||
+ | compressor automatically attaches this code to the packet and sets the | ||
+ | different parameters to match the compressed data. I will insert | ||
+ | additional commentary between strategic code points in addition to the | ||
+ | comments that are already in the code. | ||
+ | |||
+ | Note that at this point it is only possible to make the decompression code | ||
+ | shorter by removing features. | ||
+ | wrong, feel free to point it out to me. Tim Rogers < | ||
+ | did manage to snip off 2 bytes, thanks! | ||
+ | you may consider unnecessary. | ||
+ | * No basic end address set: 8 bytes | ||
+ | * No 2 MHz mode set/reset: 6 bytes | ||
+ | * No wrap option: 12 bytes | ||
+ | |||
+ | Actually, if the wrap option is not used, the compressor automatically | ||
+ | selects the shorter decompression code (only for the C64 version). | ||
+ | |||
+ | processor 6502 | ||
+ | |||
+ | BASEND | ||
+ | LZPOS EQU $2d ; temporary, BASEND *MUST* *BE* *UPDATED* at EOF | ||
+ | |||
+ | bitstr | ||
+ | xstore | ||
+ | |||
+ | WRAPBUF EQU $004b ; ' | ||
+ | |||
+ | ORG $0801 | ||
+ | DC.B $0b, | ||
+ | DC.B $9e, | ||
+ | DC.B $31,0,0,0 | ||
+ | |||
+ | sei ; disable interrupts | ||
+ | inc $d030 ; or "bit $d030" if 2MHz mode is not enabled | ||
+ | inc 1 ; Select ALL-RAM configuration | ||
+ | |||
+ | ldx #0 ;** parameter - # of overlap bytes-1 off $ffff | ||
+ | overlap lda $aaaa, | ||
+ | sta WRAPBUF, | ||
+ | dex | ||
+ | bpl overlap | ||
+ | |||
+ | ldx # | ||
+ | packlp | ||
+ | sta block200--1, | ||
+ | dex | ||
+ | bne packlp | ||
+ | |||
+ | ldx # | ||
+ | packlp2 lda block-stack-1, | ||
+ | dc.b $9d ; sta $nnnn,x | ||
+ | dc.w block-stack--1 | ||
+ | dex | ||
+ | bne packlp2 | ||
+ | |||
+ | ldy #$aa ;** parameter SIZE high + 1 (max 255 extra bytes) | ||
+ | cploop | ||
+ | lda $aaaa, | ||
+ | sta $ff00, | ||
+ | txa ;cpx #0 | ||
+ | bne cploop | ||
+ | dec cploop+6 | ||
+ | dec cploop+3 | ||
+ | dey | ||
+ | bne cploop | ||
+ | jmp main | ||
+ | |||
+ | The first part of the code contains a sys command for the basic | ||
+ | interpreter, | ||
+ | ($f7-$1aa) and to the system input buffer ($200-$253). | ||
+ | segment contains byte, bit and Gamma Code input routines and the RLE byte | ||
+ | code table, the former code segment contains the rest. | ||
+ | |||
+ | This code also copies the compressed data forward in memory so that it | ||
+ | won't be overwritten by the decompressed data before we have had a change | ||
+ | to read it. The decompression starts at the beginning and proceeds upwards | ||
+ | in both the compressed and decompressed data. A safety area is calculated | ||
+ | by the compression routine. | ||
+ | temporary data expansion, i.e. for escaped bytes. | ||
+ | for files that extend upto the end of memory, and would otherwise overwrite | ||
+ | the compressed data with decompressed data before it has been read. | ||
+ | |||
+ | This code fragment is not used during the decompression itself. | ||
+ | the code will normally be overwritten when the actual decompression starts. | ||
+ | |||
+ | The very start of the next code block is located inside the zero page and | ||
+ | the rest fills the lowest portion of the microprocessor stack. | ||
+ | page is used to make the references to different variables shorter and | ||
+ | faster. | ||
+ | they are copied with the same copy loop as the rest of the code. | ||
+ | |||
+ | |||
+ | block-stack | ||
+ | #rorg $f7 ; $f7 - ~$1e0 | ||
+ | block-stack- | ||
+ | |||
+ | bitstr | ||
+ | esc dc.b $00 ; ** parameter (saves a byte when here) | ||
+ | |||
+ | OUTPOS = *+1 ; ZP | ||
+ | putch sta $aaaa ; ** parameter | ||
+ | inc OUTPOS | ||
+ | bne 0$ ; Note: beq 0$; rts; 0$: inc OUTPOS+1; | ||
+ | ; $0100 ; | ||
+ | inc OUTPOS+1 | ||
+ | 0$ rts | ||
+ | |||
+ | putch is the subroutine that is used to output the decompressed bytes. | ||
+ | this case the bytes are written to memory. | ||
+ | itself takes 12 cycles (6 for jsr and another 6 for rts), and the routine | ||
+ | is called a lot of times during the decompression, | ||
+ | should be as fast as possible. | ||
+ | save any registers. | ||
+ | instead of indirect indexed or absolute indexed addressing (sta $aaaa | ||
+ | instead of sta ($zz),y or sta $aa00, | ||
+ | need to save+clear+restore the index register value in the routine. | ||
+ | |||
+ | Further improvement in code size and execution speed is done by storing the | ||
+ | instruction that does the absolute addressing to zero page. When the | ||
+ | memory address is incremented we can use zero-page addressing for it too. | ||
+ | On the other hand, the most time is spent in the bit input routine so | ||
+ | further optimization of this routine is not feasible. | ||
+ | |||
+ | |||
+ | newesc | ||
+ | ldx #2 ; ** PARAMETER | ||
+ | jsr getchkf | ||
+ | sta esc | ||
+ | tya ; pre-set the bits | ||
+ | ; Fall through and get the rest of the bits. | ||
+ | noesc ldx #6 ; ** PARAMETER | ||
+ | jsr getchkf | ||
+ | jsr putch ; output the escaped/ | ||
+ | ; Fall through and check the escape bits again | ||
+ | main ldy #0 ; Reset to a defined state | ||
+ | tya ; A = 0 | ||
+ | ldx #2 ; ** PARAMETER | ||
+ | jsr getchkf | ||
+ | cmp esc | ||
+ | bne noesc ; Not the escape code -> get the rest of the byte | ||
+ | ; Fall through to packed code | ||
+ | |||
+ | The decompression code is first entered in main. It first clears the | ||
+ | accumulator and the Y register and then gets the escape bits (if any are | ||
+ | used) from the input stream. | ||
+ | code, we get more bits to complete a byte and then output the result. | ||
+ | the escape bits match, we have to do further checks to see what to do. | ||
+ | |||
+ | jsr getval | ||
+ | sta xstore | ||
+ | cmp #1 ; LEN == 2 ? | ||
+ | bne lz77 ; LEN != 2 -> LZ77 | ||
+ | tya ; A = 0 | ||
+ | jsr get1bit | ||
+ | lsr ; bit -> C, A = 0 | ||
+ | bcc lz77-2 | ||
+ | ;***FALL THRU*** | ||
+ | |||
+ | We first get the Elias Gamma Code value (or actually my independently | ||
+ | developed version). | ||
+ | means a LZ77 code and we jump to the proper routine. | ||
+ | lengths are decremented before encoding, so the code value 1 means the | ||
+ | length is 2. If the length is two, we get a bit to decide if we have LZ77 | ||
+ | or something else. We have to clear the accumulator, | ||
+ | not do that automatically. | ||
+ | |||
+ | If the bit we got (shifted to carry to clear the accumulator) was zero, it | ||
+ | is LZ77 with an 8-bit offset. | ||
+ | decides between RLE and an escaped byte. A zero-bit means an escaped byte | ||
+ | and the routine that is called also changes the escape bits to a new value. | ||
+ | A one-bit means either a short or long RLE. | ||
+ | |||
+ | ; e..e01 | ||
+ | jsr get1bit | ||
+ | lsr ; bit -> C, A = 0 | ||
+ | bcc newesc | ||
+ | ;***FALL THRU*** | ||
+ | |||
+ | ; e..e011 | ||
+ | srle iny ; Y is 1 bigger than MSB loops | ||
+ | jsr getval | ||
+ | sta xstore | ||
+ | cmp #64 ; ** PARAMETER 63-64 -> C clear, 64-64 -> C set.. | ||
+ | bcc chrcode | ||
+ | |||
+ | longrle ldx #2 ; ** PARAMETER | ||
+ | jsr getbits | ||
+ | sta xstore | ||
+ | |||
+ | jsr getval | ||
+ | tay ; Y is 1 bigger than MSB loops | ||
+ | |||
+ | The short RLE only uses half (or actually 1 value less than a half) of the | ||
+ | gamma code range. | ||
+ | there are several values, we already know some bits of the length value. | ||
+ | Depending on the gamma code maximum value we need to get from one to three | ||
+ | bits more to assemble a full byte, which is then used as the less | ||
+ | significant part for the run length count. | ||
+ | the same gamma code we are using everywhere. | ||
+ | 16 kilobytes for the smallest maximum value (-m5) and to the full 64 | ||
+ | kilobytes for the largest value (-m7). | ||
+ | |||
+ | Additional compression for RLE is gained using a table for the 31 | ||
+ | top-ranking RLE bytes. | ||
+ | 31, we use it to index the table. | ||
+ | of the value gives us the 5 most significant bits of the byte to repeat. | ||
+ | In this case we read 3 additional bits to complete the byte. | ||
+ | |||
+ | |||
+ | chrcode jsr getval | ||
+ | tax ; this is executed most of the time anyway | ||
+ | lda table-1, | ||
+ | |||
+ | cpx #32 ; 31-32 -> C clear, 32-32 -> C set.. | ||
+ | bcc 1$ ; 1..31 -> the byte to repeat is in A | ||
+ | |||
+ | ; Not ranks 1..31, -> 111110xxxxx (32..64), get byte.. | ||
+ | txa ; get back the value (5 valid bits) | ||
+ | jsr get3bit | ||
+ | |||
+ | 1$ ldx xstore | ||
+ | inx ; adjust for cpx#$ff;bne -> bne | ||
+ | dorle jsr putch | ||
+ | dex | ||
+ | bne dorle ; xstore 0..255 -> 1..256 | ||
+ | deym | ||
+ | bne dorle ; Y was 1 bigger than wanted originally | ||
+ | mainbeq beq main ; reverse condition -> jump always | ||
+ | |||
+ | After deciding the repeat count and decoding the value to repeat we simply | ||
+ | have to output the value enough times. | ||
+ | and the Y register holds the upper part of the count. | ||
+ | is first incremented by one to change the code sequence dex ; cpx #$ff ; | ||
+ | bne dorle into simply dex ; bne dorle. | ||
+ | one byte in the decompression code and two clock cycles for each byte that | ||
+ | is outputted. | ||
+ | |||
+ | The next code fragment is the LZ77 decode routine and it is used in the | ||
+ | file parts that do not have equal byte runs (and even in some that have). | ||
+ | The routine simply gets an offset value and copies a sequence of bytes from | ||
+ | the already decompressed portion to the current output position. | ||
+ | |||
+ | |||
+ | lz77 jsr getval | ||
+ | cmp #127 ; ** PARAMETER | ||
+ | beq eof ; EOF | ||
+ | |||
+ | sbc #0 ; C is clear -> subtract 1 (1..126 -> 0..125) | ||
+ | ldx #0 ; ** PARAMETER (more bits to get) | ||
+ | jsr getchkf | ||
+ | |||
+ | lz77-2 | ||
+ | ldx #8 | ||
+ | jsr getbits | ||
+ | ; Note: Already eor:ed in the compressor.. | ||
+ | ;eor #255 ; offset LSB 2's complement -1 (i.e. -X = ~X+1) | ||
+ | adc OUTPOS | ||
+ | sta LZPOS | ||
+ | |||
+ | lda OUTPOS+1 | ||
+ | sbc LZPOS+1 | ||
+ | sta LZPOS+1 | ||
+ | ;ldy #0 ; Y was 0 originally, we don't change it | ||
+ | |||
+ | ldx xstore | ||
+ | inx ; adjust for cpx#$ff;bne -> bne | ||
+ | lzloop | ||
+ | jsr putch | ||
+ | iny ; Y does not wrap because X=0..255 and Y initially 0 | ||
+ | dex | ||
+ | bne lzloop | ||
+ | beq mainbeq | ||
+ | |||
+ | There are two entry-points to the LZ77 decode routine. | ||
+ | (lz77) is for copy lengths bigger than 2. The second entry point (lz77-2) | ||
+ | is for the length of 2 (8-bit offset value). | ||
+ | |||
+ | |||
+ | ; EOF | ||
+ | eof lda #$37 ; ** could be a PARAMETER | ||
+ | sta 1 | ||
+ | dec $d030 ; or "bit $d030" if 2MHz mode is not enabled | ||
+ | lda OUTPOS | ||
+ | sta BASEND | ||
+ | lda OUTPOS+1 | ||
+ | sta BASEND+1 | ||
+ | cli ; ** could be a PARAMETER | ||
+ | jmp $aaaa ; ** PARAMETER | ||
+ | #rend | ||
+ | block-stack-end | ||
+ | |||
+ | Some kind of a end of file marker is necessary for all variable-length | ||
+ | codes. | ||
+ | the byte count of the original file is used instead, but here a special EOF | ||
+ | condition is more convenient. | ||
+ | maximum gamma code value, we have reached the end of file and must stop | ||
+ | decoding. | ||
+ | mode (for C128) and updates the basic end addresses before allowing | ||
+ | interrupts and jumping to the program start address. | ||
+ | |||
+ | The next code fragment is put into the system input buffer. | ||
+ | are for getting bits from the encoded message (getbits) and decoding the | ||
+ | Elias Gamma Code (getval). | ||
+ | bytes. | ||
+ | the values are used. | ||
+ | |||
+ | |||
+ | block200 | ||
+ | #rorg | ||
+ | block200- | ||
+ | |||
+ | getnew | ||
+ | INPOS = *+1 | ||
+ | lda $aaaa ;** parameter | ||
+ | rol ; Shift in C=1 (last bit marker) | ||
+ | sta bitstr | ||
+ | inc INPOS ; Does not change C! | ||
+ | bne 0$ | ||
+ | inc INPOS+1 | ||
+ | bne 0$ | ||
+ | ; This code does not change C! | ||
+ | lda # | ||
+ | sta INPOS | ||
+ | 0$ pla ; 1 Byte/4 cycles | ||
+ | rts | ||
+ | |||
+ | |||
+ | ; getval : Gets a ' | ||
+ | ; ** Scratches X, returns the value in A ** | ||
+ | getval | ||
+ | txa ; set the top bit (value is 1..255) | ||
+ | 0$ asl bitstr | ||
+ | bne 1$ | ||
+ | jsr getnew | ||
+ | 1$ bcc getchk | ||
+ | inx | ||
+ | cpx #7 ; ** parameter | ||
+ | bne 0$ | ||
+ | beq getchk | ||
+ | |||
+ | ; getbits: Gets X bits from the stream | ||
+ | ; ** Scratches X, returns the value in A ** | ||
+ | get1bit inx ;2 | ||
+ | getbits asl bitstr | ||
+ | bne 1$ | ||
+ | jsr getnew | ||
+ | 1$ rol ;2 | ||
+ | getchk | ||
+ | getchkf bne getbits | ||
+ | clc ; | ||
+ | rts ;6+6 | ||
+ | |||
+ | |||
+ | table dc.b 0, | ||
+ | dc.b 0, | ||
+ | dc.b 0, | ||
+ | dc.b 0, | ||
+ | |||
+ | #rend | ||
+ | block200-end | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Target Application Compression Tests | ||
+ | ------------------------------------ | ||
+ | |||
+ | The following data compression tests are made on my four C64 test files: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | The AB Cruncher | ||
+ | | ||
+ | RLE + ByteBoiler (REU) 26654 64.2% | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | The AB Cruncher | ||
+ | | ||
+ | | ||
+ | | ||
+ | RLE + ByteBoiler (REU) 19838 42.1% | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | The AB Cruncher | ||
+ | | ||
+ | | ||
+ | RLE + ByteBoiler (REU) 12478 26.5% | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | The AB Cruncher | ||
+ | | ||
+ | | ||
+ | RLE + ByteBoiler (REU) 9813 20.8% | ||
+ | | ||
+ | gzip -9 | ||
+ | | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Calgary Corpus Suite | ||
+ | -------------------- | ||
+ | |||
+ | The original compressor only allows files upto 63 kB. To be able to | ||
+ | compare my algorithm to others I modified the compressor to allow bigger | ||
+ | files. | ||
+ | suite. | ||
+ | |||
+ | Note that the decompression code is included in the compressed files, | ||
+ | although it is not valid for files over 63k (compressed or uncompressed | ||
+ | size). | ||
+ | bytes) is 6510 machine language. | ||
+ | |||
+ | To tell you the truth, the results surprised me, because the compression | ||
+ | algorithm -IS- developed for a very special case in mind. It only has a | ||
+ | fixed code for LZ77/RLE lengths, not even a static one (fixed != static != | ||
+ | adaptive)! | ||
+ | the literal bytes. | ||
+ | somewhat handicaps my compressor, although the new tagging system is very | ||
+ | happy with 7-bit ASCII input. | ||
+ | uses no extra memory. | ||
+ | |||
+ | I'm getting relatively near LhA, and shorter than LhA for 8 files (300-byte | ||
+ | decompressor included!), and relatively near or shorter than LhA in other | ||
+ | cases if the decompressor is removed. | ||
+ | |||
+ | The table contains the file name (file), compression options (options), the | ||
+ | original file size (in) and the compressed file size (out) in bytes, | ||
+ | average number of bits used to encode one byte (b/B), remaining size | ||
+ | (ratio) and the reduction (gained), and the time used for compression. | ||
+ | comparison, the last three columns show the compressed sizes for LhA, Zip | ||
+ | and GZip (with the -9 option), respectively. | ||
+ | |||
+ | FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz | ||
+ | Estimated decompression on a C64 (1MHz 6510) 6:47 | ||
+ | file | ||
+ | ========================================================================= | ||
+ | bib -p4 111261 | ||
+ | book1 -p4 768771 | ||
+ | book2 -p4 610856 | ||
+ | geo -p2 102400 | ||
+ | news | ||
+ | obj1 | ||
+ | obj2 246814 | ||
+ | paper1 -p2 | ||
+ | paper2 -p3 | ||
+ | paper3 -p2 | ||
+ | paper4 -p1 -m5 13286 6095 3.68 45.88% 54.12% | ||
+ | paper5 -p1 -m5 11954 5494 3.68 45.96% 54.04% | ||
+ | paper6 -p2 | ||
+ | pic -p1 513216 | ||
+ | progc -p1 | ||
+ | progl -p1 | ||
+ | progp 49379 11820 1.92 23.94% 76.06% | ||
+ | trans -p2 | ||
+ | ------------------------------------------------------------------------- | ||
+ | total 3251493 1089796 2.68 33.52% 66.48% 3:18 | ||
+ | |||
+ | |||
+ | Canterbury Corpus Suite | ||
+ | ----------------------- | ||
+ | |||
+ | The following shows the results on the Canterbury corpus. | ||
+ | quite pleased with the results. | ||
+ | lcet10.txt if you remove the decompression code. | ||
+ | |||
+ | FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz | ||
+ | Estimated decompression on a C64 (1MHz 6510) 6:00 LhA Zip GZip-9 | ||
+ | file opt | ||
+ | ========================================================================== | ||
+ | alice29.txt -p4 152089 | ||
+ | ptt5 -p1 513216 | ||
+ | fields.c | ||
+ | kennedy.xls | ||
+ | sum 38240 13334 2.79 34.87% 65.13% | ||
+ | lcet10.txt | ||
+ | plrabn12.txt-p4 481861 199134 3.31 41.33% 58.67% 43.6 210132 195299 194277 | ||
+ | cp.html | ||
+ | grammar.lsp -m5 | ||
+ | xargs.1 | ||
+ | asyoulik.txt-p4 125179 | ||
+ | -------------------------------------------------------------------------- | ||
+ | total 2810784 802364 2.28 28.55% 71.45% 11:29 | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Conclusions | ||
+ | ----------- | ||
+ | |||
+ | In this article I have presented a compression program which creates | ||
+ | compressed executable files for C64, VIC20 and Plus4/ | ||
+ | can be performed on Amiga, MS-DOS/Win machine or any other machine with a | ||
+ | C-compiler. | ||
+ | resources can be used to compress the data while needing minimal resources | ||
+ | for decompression. This was one of the design requirements. | ||
+ | |||
+ | Two original ideas were presented: | ||
+ | an algorithm using hybrid RLE and LZ77. Also, a detailed explanation of | ||
+ | the LZ77 string match routine and the optima parsing scheme was presented. | ||
+ | |||
+ | The compression ratio and decompression speed is comparable to other | ||
+ | compression programs for Commodore 8-bit computers. | ||
+ | |||
+ | But what are then the real advantages of pucrunch compared to traditional | ||
+ | C64 compression programs in addition to that you can now compress VIC20 and | ||
+ | Plus4/C16 programs? | ||
+ | see some actual user comments. | ||
+ | but I hope he doesn' | ||
+ | Maybe Steve has something to add also? | ||
+ | |||
+ | |||
+ | ---8< | ||
+ | |||
+ | A big advantage is that pucrunch does RLE and LZ in one pass. For demos I | ||
+ | only used a cruncher and did my own RLE routines as it is somewhat annoying | ||
+ | to use an external program for this. These programs require some memory | ||
+ | and ZP-addresses like the cruncher does. So it can easily happen that the | ||
+ | decruncher or depacker interfere with your demo-part, if you didn't know | ||
+ | what memory is used by the depacker. | ||
+ | to care about. | ||
+ | of these restrictions. | ||
+ | |||
+ | * Right, and because pucrunch is designed that way from the start, it can | ||
+ | * get better results with one-pass RLE and LZ than doing them separately. | ||
+ | * On the other hand it more or less requires that you _don' | ||
+ | * file first.. | ||
+ | |||
+ | This is true, we also found that out. We did a part for our demo which had | ||
+ | some tables using only the low-nybble. | ||
+ | with a specific pattern. | ||
+ | but as we tried pucrunch, this became obsolete. | ||
+ | or 50 blocks, with our own RLE and a different cruncher we got 60 blks! | ||
+ | Not bad at all ;) | ||
+ | |||
+ | Not to mention that you have the complete and commented source-code for the | ||
+ | decruncher, so that you can easily change it to your own needs. | ||
+ | not only very flexible, it is also very powerful. | ||
+ | a better job than ByteBoiler+Sledgehammer. | ||
+ | |||
+ | In addition to that pucrunch is of course much faster than crunchers on my | ||
+ | C64, this has not only to do with my 486/66 and the use of an HDD. See, I | ||
+ | use a cross-assembler-system, | ||
+ | the assembled code to my 64, crunch it, and transfer it back to my pc. | ||
+ | Now, it's just a simple command-line and here we go... And not only I can | ||
+ | do this, my friend who has an amiga uses pucrunch as well. This is the | ||
+ | first time we use the same cruncher, since I used to take ByteBoiler, but | ||
+ | my friend didn't have a REU so he had to try another cruncher. | ||
+ | |||
+ | So, if I try to make a conclusion: | ||
+ | flexible (thanks to the source-code). | ||
+ | |||
+ | ---8< | ||
+ | |||
+ | Just for your info... | ||
+ | |||
+ | We won the demo-competition at the Interjam' | ||
+ | crunched ran through the hands of pucrunch... | ||
+ | mentioned in the credits. | ||
+ | KNOOPS/ | ||
+ | So, again, THANKS! | ||
+ | |||
+ | | ||
+ | |||
+ | ---8< | ||
+ | |||
+ | |||
+ | So, what can I possibly hope to add to that, right?:-) | ||
+ | |||
+ | If you have any comments, questions, article suggestions or just a general | ||
+ | hello brewing in your mind, send me mail or visit my homepage. | ||
+ | |||
+ | See you all again in the next issue! | ||
+ | |||
+ | -Pasi | ||
+ | |||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | |||
+ | Appendix: The Log Book | ||
+ | ---------------------- | ||
+ | |||
+ | 5.3.1997 | ||
+ | Tried reverse LZ, i.e. mirrored history buffer. | ||
+ | | ||
+ | | ||
+ | |||
+ | 6.3.1997 | ||
+ | Tried to have a code to use the last LZ copy position (offset | ||
+ | added to the lastly used LZ copy position). | ||
+ | 57 bytes, but in fact the net gain was only 2 bytes | ||
+ | | ||
+ | the long rle codes takes away the rest 30). | ||
+ | |||
+ | 10.3.1997 | ||
+ | | ||
+ | an Elias Gamma Code. Tried Fibonacci code instead, but it was | ||
+ | much worse (~500 bytes on bs.run, ~300 bytes on delenn.run) | ||
+ | | ||
+ | |||
+ | 12.3.1997 | ||
+ | ' | ||
+ | bytes used are ranked, and top 15 are put into a table, which | ||
+ | is indexed by a Elias Gamma Code. Other RLE bytes get a prefix | ||
+ | " | ||
+ | |||
+ | 15.3.1997 | ||
+ | The number of escape bits used is again selectable. | ||
+ | one escape bit for delenn.run gains ~150 bytes. | ||
+ | not selected, automatically selects the number of escape bits | ||
+ | (is a bit slow). | ||
+ | |||
+ | 16.3.1997 | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | 31.3.1997 | ||
+ | Tried to use BASIC ROM as a codebook, but the results were not | ||
+ | so good. For mostly-graphics files there are no long matches -> | ||
+ | no net gain, for mostly-code files the file itself gives a | ||
+ | | ||
+ | | ||
+ | |||
+ | 1.4.1997 | ||
+ | Tried maxlen 128, but it only gained 17 bytes on ivanova.run, | ||
+ | and lost ~15 byte on bs.run. | ||
+ | | ||
+ | |||
+ | 2.4.1997 | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | 5.4.1997 | ||
+ | Tried histogram normalization on LZLEN, but it really did not | ||
+ | gain much of anything, not even counting the mapping table from | ||
+ | index to value that is needed. | ||
+ | |||
+ | 11.4.1997 | ||
+ | 8..14 bit LZPOS base part. Automatic selection. | ||
+ | are gained if the proper selection is done before the LZ/RLELEN | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | 22.4.1997 | ||
+ | Found a way to speed up the almost pathological cases by using | ||
+ | the RLE table to skip the matching beginnings. | ||
+ | |||
+ | 2.5.1997 | ||
+ | | ||
+ | | ||
+ | |||
+ | 25.5.1997 | ||
+ | Made the maximum length adjustable. | ||
+ | 64, 128 and 256 respectively. | ||
+ | | ||
+ | |||
+ | 1.6.1997 | ||
+ | | ||
+ | no time at all. It used a whole lot of time on large escape bit | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | 3.6.1997 | ||
+ | | ||
+ | | ||
+ | All 64k files are compressed under one minute on my 25 MHz | ||
+ | | ||
+ | 19 seconds instead of 7 minutes (200 MHz Pentium w/ FreeBSD). | ||
+ | | ||
+ | | ||
+ | | ||
+ | 24 seconds. | ||
+ | | ||
+ | |||
+ | 6.6.1997 | ||
+ | | ||
+ | |||
+ | 11.6.1997 | ||
+ | Now determines the number of bytes needed for temporary data | ||
+ | | ||
+ | | ||
+ | |||
+ | Also, now it's possible to decompress the files compressed with | ||
+ | the program (must be the same version). | ||
+ | |||
+ | 17.6.1997 | ||
+ | Only checks the lengths that are power of two's in | ||
+ | | ||
+ | worse than checking every length. | ||
+ | | ||
+ | This version (compiled with optimizations on) only spends 27 | ||
+ | | ||
+ | |||
+ | 19.6.1997 | ||
+ | | ||
+ | tight now unless some features are removed) and simultaneously | ||
+ | | ||
+ | |||
+ | 23.6.1997 | ||
+ | | ||
+ | byte (conditional probabilities) to set the probabilities for | ||
+ | | ||
+ | the selection is from 0.0 to 1.58, but even using arithmetic | ||
+ | code to encode it, the original escape system is only 82 bits | ||
+ | worse (ivanova.run), | ||
+ | | ||
+ | | ||
+ | |||
+ | 18.7.1997 | ||
+ | In LZ77 match we now check if a longer match (further away) | ||
+ | | ||
+ | code 2 bits longer. | ||
+ | even longer (2 bits for each magnitude). | ||
+ | part is longer than 8, the extra bits make the code longer if | ||
+ | the length becomes longer than two. | ||
+ | |||
+ | | ||
+ | |||
+ | When generating the output rescans the LZ77 matches. | ||
+ | | ||
+ | match may be found much nearer than the original longer match. | ||
+ | | ||
+ | we get some bits off for each match of this kind. Actually, the | ||
+ | | ||
+ | of it, but it is too much work right now (and would make the | ||
+ | | ||
+ | |||
+ | 29.8.1997 | ||
+ | 4 bytes removed from the decrunch code. I have to thank Tim | ||
+ | | ||
+ | |||
+ | 12.9.1997 | ||
+ | | ||
+ | made the 2 MHz user-selectable and off by default. | ||
+ | |||
+ | 13.9.1997 | ||
+ | Today I found out that most of my fast string matching | ||
+ | | ||
+ | | ||
+ | after all and someone else has had the same idea. :-) However, | ||
+ | using the RLE table to help still seems to be an original idea, | ||
+ | which helps immensely on the worst cases. | ||
+ | their paper on this, so I'll just have to get it and see.. | ||
+ | |||
+ | * [Fenwick and Gutmann, 1994]. | ||
+ | " | ||
+ | | ||
+ | |||
+ | 14.9.1997 | ||
+ | The new decompression code can decompress files from $258 to | ||
+ | $ffff (or actually all the way upto $1002d :-). The drawback | ||
+ | | ||
+ | old decompression code is used if the wrap option is not | ||
+ | | ||
+ | |||
+ | 16.9.1997 | ||
+ | The backSkip table can now be fixed size (64 kWord) instead of | ||
+ | | ||
+ | | ||
+ | just a little slow, as we would need to recreate the backSkip | ||
+ | table again). | ||
+ | bytes in the first place (percentage). | ||
+ | | ||
+ | files than 64kB (the primary target files) the default consumes | ||
+ | less memory. | ||
+ | |||
+ | The hash value compare that is used to discard impossible | ||
+ | | ||
+ | | ||
+ | | ||
+ | found very quickly when the strings are compared starting from | ||
+ | the third charater (the two first characters are equal, because | ||
+ | we have a full hash table). | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | After removing the hash compare my algorithm quite closely | ||
+ | | ||
+ | | ||
+ | read it.) | ||
+ | |||
+ | 14 x inlen + 256 kB of memory is used (with no HASH-COMPARE and | ||
+ | | ||
+ | |||
+ | 18.9.1997 | ||
+ | One byte removed from the decompression code (both versions). | ||
+ | |||
+ | 30.12.1997 | ||
+ | Only records longer matches if they compress better than | ||
+ | | ||
+ | | ||
+ | was " | ||
+ | gives better results on all Calgary Corpus files except " | ||
+ | which loses 101 bytes (0.14% of the compressed size). | ||
+ | |||
+ | An extra check/ | ||
+ | | ||
+ | the original and better. | ||
+ | | ||
+ | table (BACKSKIP-FULL defined). | ||
+ | |||
+ | 21.2.1998 | ||
+ | | ||
+ | into the same program. | ||
+ | |||
+ | 16.3.1998 | ||
+ | | ||
+ | |||
+ | 17.8.1998 | ||
+ | There was a small bug in pucrunch which caused the location | ||
+ | $2c30 to be decremented (dec $2c30 instead of bit $d030) when | ||
+ | run without the -f option. | ||
+ | are now updated. | ||
+ | |||
+ | -------------------------------------------------------------------------- | ||
+ | |||
+ | References | ||
+ | |||
+ | 1. http:// | ||
+ | 2. http:// | ||
+ | 2. http:// | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | VIC-20 Kernal ROM Disassembly Project | ||
+ | Richard Cini | ||
+ | |||
+ | Introduction | ||
+ | In order to put this project into perspective, | ||
+ | history is needed. I received my first Commodore as a gift from my parents | ||
+ | back in 1982. I used Commodore PETs in the school' | ||
+ | Radio Shack Model I's in the local R/S store. It was nice to have one | ||
+ | of my own to hack on, though. | ||
+ | built-in BASIC interpreter. My claim to fame (to my family, at least) was | ||
+ | a BASIC/ | ||
+ | speech synthesizer hardware-software hack. Both worked well, which surprised | ||
+ | my mother, who claims that nothing I ever built worked right. | ||
+ | my computer-of-choice changed, but I never lost my love for the VIC. It is | ||
+ | small, easy to program, has a very capable processor (by 1980's standards) | ||
+ | and decent I/O capability. It's peripherals were varied, if not quirky | ||
+ | (take the 1515 printer, for example), but at least everything worked well | ||
+ | together. | ||
+ | Now, fast forward to 1994. Commodore International failed, | ||
+ | crippled by years of a weak product strategy, squandered opportunities | ||
+ | and the market' | ||
+ | machines as productivity tools. After Commodore' | ||
+ | I wanted to try to purchase the Commodore 8-bit intellectual property, | ||
+ | including the rights to the Kernal and BASIC source code, primarily for | ||
+ | preservation purposes. One of my hobbies is collecting and preserving | ||
+ | obsolete and unsupported computers, accessories, | ||
+ | Since Commodore' | ||
+ | (no surprise there), I embarked on decompiling the Kernal. This project, | ||
+ | although time consuming and rewarding from an informational perspective | ||
+ | was not truly trail-blazing. Many before me probably decompiled parts of | ||
+ | the Kernal in order to gain some understanding as it related to another | ||
+ | project. However, in my research, I don't recall ever seeing complete | ||
+ | recompileable source code. Memory and ROM maps, yes; source code, no. | ||
+ | Marko Makela manages a great Commodore web site that contains lots | ||
+ | of useful information, | ||
+ | the starting point for my work. See http:// | ||
+ | this and much more information. | ||
+ | series of articles is to explain the process and to discuss specific | ||
+ | Kernal routines that may be of interest to C=Hacking readers. Also, where | ||
+ | appropriate, | ||
+ | that were the contemporaries of the VIC, the C64 and the PET, specifically. | ||
+ | One might ask, "Why is this project different from all of the other | ||
+ | resources already available?" | ||
+ | |||
+ | 1. The end result is a fully modifiable and compilable source file. | ||
+ | 2. Only the VIC memory map and ROM location map are available | ||
+ | *because* to date everyone has focused on the C64 as it is the | ||
+ | more functional (and hence, more popular) Commodore of the era. | ||
+ | The same goes for the PET, too. | ||
+ | 3. A far as I know, there was no " | ||
+ | of #2, above. | ||
+ | 4. Because of #3, the fact that I had some spare time, and that I | ||
+ | love my VIC (although I don't use it as much these days), I felt | ||
+ | that I had to do it. | ||
+ | |||
+ | Brief VIC-20 History | ||
+ | -------------------- | ||
+ | Although I'd like to provide a complete history lesson on the VIC-20, many | ||
+ | others before me have done a better job. The March 1985 issue of the | ||
+ | IEEE Spectrum has an article, as does Marko' | ||
+ | http:// | ||
+ | Chuck Peddle, the well-known creator of the 6502 microprocessor. Nonetheless, | ||
+ | I will provide a Readers' | ||
+ | In the late-70s, engineers in the Advanced Systems Design Group | ||
+ | (ASDG) at Commodore created a multi-function video/sound interface chip, | ||
+ | the 6560 (a.k.a., the VIC). Al Charpentier ran the LSI section of the ASDG, | ||
+ | and was the lead in designing the VIC-I, and later, the VIC-II. The ASDG was | ||
+ | the old MOS Semiconductor operation that Commodore bought in the mid-70s. | ||
+ | The 6560 supported complete composite color video, 3-voice plus white noise | ||
+ | sound, a volume control, two analog-to-digital converters that supported | ||
+ | the use of a game paddle or joystick, and a light pen interface. | ||
+ | Feature-rich as it was, no manufacturer wanted to commit a product line to | ||
+ | it. Since Commodore could not find anyone to buy the chip, they decided to | ||
+ | build a computer that featured the chip. Hence, the VIC-20 was born. Al' | ||
+ | buddy, Bob Yannes, was a senior systems designer at Commodore who developed | ||
+ | the VIC-20 (and later, the C64) prototype. The VIC was Commodore' | ||
+ | color computer, and the first designed for home use. | ||
+ | |||
+ | Relationships with other Commodore Products | ||
+ | ------------------------------------------- | ||
+ | The structure of the VIC-20 ROM parallels those of Commodore' | ||
+ | other machines of that era, the PET and the C64. All three machines share | ||
+ | the concept of a standard, public, API accessed through a jump table | ||
+ | located in the last page of the system ROM. | ||
+ | There' | ||
+ | a surprise to this writer, but intuitively makes sense. The C64 and VIC | ||
+ | Kernal ROMs are nearly identically laid-out and contain a lot of common | ||
+ | code. The C64 ROM of course contains certain modifications relating to | ||
+ | its unique hardware and capabilities, | ||
+ | the machines are so similar in design (the same engineer designed them), | ||
+ | the similarity of the code isn't surprising. This recycling appears to | ||
+ | have been a cost-effective way to develop a new computer in record time. | ||
+ | |||
+ | The Process | ||
+ | ----------- | ||
+ | I believe that the reverse engineering process for the purpose of creating | ||
+ | source code that produces binary-identical object code, is fairly standard: | ||
+ | know thy hardware, get the object code, turn object code into assembly code, | ||
+ | give everything names, test compile, fix errors, re-compile and call it done. | ||
+ | However, I'm certain that C=Hacking readers have reverse engineering methods | ||
+ | which differ from mine. I'd be interested in hearing them, as I'm always | ||
+ | looking for a better way to do things. | ||
+ | from my VIC (although a ROM image from funet would work) and cranked it | ||
+ | through a disassembler (I used SuperMon). Since I wasn't too up on | ||
+ | transferring data from the VIC to the PC, I used a brute-force method: | ||
+ | scanning disassembler listings into TIFF files and running them through | ||
+ | an OCR program. This produced plain-text files for me to work with. Then, | ||
+ | I used various available information on the Web and in books (such as " | ||
+ | Revealed", | ||
+ | to break the code into subroutines. I inserted meaningful memory and | ||
+ | program location labels, taking names from all of the above sources. | ||
+ | My assembler allowed me to create conventional data and code " | ||
+ | so I took the time to create a " | ||
+ | VIC memory map. These segments are not to be confused with the segments | ||
+ | supported under the various PeeCee memory models. I used TASM 3.1, a | ||
+ | shareware 8-bit table-based assembler by Squak Valley Software of | ||
+ | Issaquah, WA. TASM supports the 6502, 6800/05/11, 8048/ | ||
+ | TMS7000/ | ||
+ | |||
+ | Next, I read the code and added comments as I went along. I do this on a | ||
+ | routine-by-routine basis, as time permits, but always beginning at the | ||
+ | first instruction after POR. Finally, I did a recompilation and compared | ||
+ | the recompiled output with the ROM image to make sure that no errors were | ||
+ | introduced in the source code creation process. I wasn't so lucky | ||
+ | the first time around :-). I like to make gross checks first: ROM image | ||
+ | size, location of well-known routines (such as the jump table at $FF85, | ||
+ | the POR vector at $FD22, and the individual jump table routines). | ||
+ | This helps to narrow down the location of any errors. | ||
+ | |||
+ | Once I was satisfied that the disassembly was right, I created ROM image to | ||
+ | be burned into a test EPROM. The ROMs used in the VIC are 2364 | ||
+ | mask-programmed ROMs. 2364s are a 24-pin 8k x 8bit ROM device, and the | ||
+ | closest commonly available EPROM is the 2764 EPROM, a 28-pin 8k x 8bit | ||
+ | device. Since the 2364s have four fewer pins (reflecting its non- | ||
+ | programmability), | ||
+ | and two 28-pin wire-wrap DIP socket should do the trick. | ||
+ | If all checks-out, the source is then ready for modification. | ||
+ | |||
+ | Issues and Considerations | ||
+ | ------------------------- | ||
+ | There is one important thing that I discovered while hacking the | ||
+ | Kernal -- there is no wasted space in the Kernal ROM as it presently exists. | ||
+ | Actually, the Kernal occupies 1,279 bytes less than 8k, beginning at | ||
+ | $E500 (and consequently, | ||
+ | 1,279 bytes). The Kernal developers maximized the 8k space, so any Kernal | ||
+ | hack will have to use jumps to a patch area elsewhere in the processor | ||
+ | address space. For example, let's say that I wanted to add BASIC 4.0 disk | ||
+ | commands to the VIC Kernal by hacking it (recognizing that I could have | ||
+ | used the easier wedge method). I could place jumps in the Kernal ROM to | ||
+ | locations within my own non-autostart ROM located at $A000. | ||
+ | important consideration is backward compatibility. Certain programs may rely | ||
+ | on the specific location of code within the Kernal ROM. For example, a | ||
+ | game ROM may make direct calls to the internal Plot routine, as opposed | ||
+ | to using the jump table at the end of the Kernal ROM (saving a few | ||
+ | processor ticks in the process). Shifting code around would relocate that | ||
+ | code, breaking that program. | ||
+ | |||
+ | General Hardware Information | ||
+ | ---------------------------- | ||
+ | The Microprocessor | ||
+ | Manufactured by MOS and second-sourced from Rockwell Semiconductors, | ||
+ | 6502 is an 8-bit, 1MHz NMOS-process processor. It supports 56 instructions | ||
+ | in 13 addressing modes (although six are combinations of the seven basic | ||
+ | modes), three processor registers (.A, .X, and .Y), stack pointer, and a | ||
+ | condition code (flags) register. The 6502 supports both maskable and | ||
+ | non-maskable interrupts with fixed vectors at the top of its 64k address | ||
+ | space. | ||
+ | |||
+ | I/O Processors | ||
+ | VIC-20 I/O is managed by two memory-mapped I/O processors, the | ||
+ | 6522 versatile interface adapter (VIA). The I/O region occupies the | ||
+ | 4k-address space beginning at $9000. | ||
+ | joystick, light pen, cassette deck, the IEEE serial interface and the | ||
+ | user port. Each 6522 has 16 bi-directional I/O lines, four handshaking | ||
+ | lines, two 8-bit shift registers and two clock generators (capable of | ||
+ | generating free-running or triggered pulses). One of these clocks is | ||
+ | responsible for RTC, RS-232, IEEE, and cassette tape timing. Of the eight | ||
+ | handshaking lines, two are free for general use, while the other six are | ||
+ | used for the IEEE serial port, the RESTORE key, and cassette control. | ||
+ | 24-bits of the total 32-bits of I/O are used for keyboard scanning and | ||
+ | joystick, light pen, and serial inputs. Truly available to the user is four | ||
+ | handshaking lines and 8-bits of I/O. | ||
+ | |||
+ | Video Interface | ||
+ | The VIC-20 video interface is managed by the 6560/6561 VIC chip. The | ||
+ | VIC screen is organized in 22 columns by 24 rows in text mode and | ||
+ | 176 by 192 pixels in " | ||
+ | mapping the character generator ROM to RAM and modifying the character | ||
+ | glyphs. | ||
+ | in NTSC (6560) or PAL (6561) formats in 16 colors. The VIC also contains | ||
+ | three programmable tone generators, a white noise source, volume control, | ||
+ | two A/D converters (for game paddle interfacing), | ||
+ | centering, and independent control over background, foreground, and border | ||
+ | colors. | ||
+ | and 8-bit character generator ROM. The address buss is a private 2MHz buss, | ||
+ | which is not shared with the 1MHz microprocessor buss. Shared ROM access | ||
+ | is performed during the processor Phase 1 clock. | ||
+ | |||
+ | Memory Map | ||
+ | Space limitations prohibit listing the complete VIC-20 memory map, | ||
+ | but an abridged version may be helpful: | ||
+ | |||
+ | HEX Offset DESCRIPTION | ||
+ | |||
+ | 0000-00FF Zero page: Kernal and BASIC system areas | ||
+ | 0100-01FF | ||
+ | 0200-02FF | ||
+ | keyboard and screen vars, RS232 vars | ||
+ | 0300-03FF | ||
+ | vectors cassette buffer area | ||
+ | 0400-0FFF Pages 4-15: 3k expansion area | ||
+ | 1000-1DFF User Basic area (unexpanded VIC) | ||
+ | 1E00-1FFF Screen memory (unexpanded VIC) | ||
+ | 2000-3FFF 8K expansion RAM/ROM block 1 | ||
+ | 4000-5FFF 8K expansion RAM/ROM block 2 | ||
+ | 6000-7FFF 8K expansion RAM/ROM block 3 | ||
+ | NOTE: When additional memory is added to block 1, 2 or 3, | ||
+ | the Kernal relocates the following things for BASIC: | ||
+ | 1000-11FF Screen memory | ||
+ | 1200-? | ||
+ | 9400-95FF Color RAM | ||
+ | 8000-8FFF 4K Character generator ROM | ||
+ | 8000-83FF Upper case and graphics | ||
+ | 8400-87FF Reversed upper case and graphics | ||
+ | 8800-8BFF Upper and lower case | ||
+ | 8C00-8FFF Reversed upper and lower case | ||
+ | 9000-93FF I/ | ||
+ | 9000-900F VIC chip registers | ||
+ | 9110-911F 6522 VIA#1 registers | ||
+ | 9120-912F 6522 VIA#2 registers | ||
+ | 9400-95FF location of COLOR RAM with additional RAM at blk 1 | ||
+ | 9600-97FF Normal location of COLOR RAM | ||
+ | 9800-9BFF I/ | ||
+ | 9C00-9FFF I/ | ||
+ | A000-BFFF 8K block for expansion ROM (autostart ROM) | ||
+ | C000-DFFF 8K BASIC ROM | ||
+ | E000-FFFF 8K Kernal ROM | ||
+ | |||
+ | Kernal Functions | ||
+ | ---------------- | ||
+ | |||
+ | System Startup | ||
+ | Let's first take a look at how a VIC-20 boots. The process is | ||
+ | substantially similar for the C64 and the PET (through the commonality | ||
+ | of the microprocessor upon which each machine is based), although the | ||
+ | locations of various routines differs, as does the memory and I/O map. | ||
+ | When power is first applied to the microprocessor, | ||
+ | low by a 555 timer for a period long enough for the power supply and clock | ||
+ | generator to stabilize. The 6502 utilizes the last six bytes of the address | ||
+ | space to store three critical vectors: the NMI, RESET, and IRQ vectors, | ||
+ | respectively. On power-up, the PC (program counter) is initialized to the | ||
+ | address stored at location $FFFC and execution begins at that location | ||
+ | (the first column in the source code represents the program line number): | ||
+ | |||
+ | 6495 FFFA ; | ||
+ | 6496 FFFA ; - Power-on and hardware vectors | ||
+ | 6497 FFFA ; | ||
+ | 6498 FFFA A9 FE .dw NMI ; | ||
+ | 6499 FFFC 22 FD .dw RESET ;POR | ||
+ | 6500 FFFE 72 FF .dw IRQ ;IRQ processor | ||
+ | |||
+ | Execution begins at $FD22, the POR (power-on reset) vector: | ||
+ | |||
+ | 5971 | ||
+ | 5972 | ||
+ | 5973 | ||
+ | 5974 | ||
+ | 5975 FD22 A2 FF LDX #$FF | ||
+ | 5976 FD24 78 SEI ;kill interrupts | ||
+ | 5977 FD25 9A TXS ;set stack top | ||
+ | 5978 FD26 D8 CLD | ||
+ | 5979 FD27 20 3F FD JSR SCNROM ; | ||
+ | 5980 FD2A D0 03 BNE SKIPA0 ;not there, skip ROM init | ||
+ | 5981 | ||
+ | 5982 FD2C 6C 00 A0 JMP (A0BASE) ; | ||
+ | 5983 | ||
+ | 5984 | ||
+ | 5985 FD2F 20 8D FD JSR RAMTAS ; | ||
+ | 5986 FD32 20 52 FD JSR IRESTR ; | ||
+ | 5987 FD35 20 F9 FD JSR IOINIT ; | ||
+ | 5988 FD38 20 18 E5 JSR CINT1 ;init video | ||
+ | 5989 FD3B 58 CLI ; | ||
+ | 5990 FD3C 6C 00 C0 JMP (BENTER) ; | ||
+ | |||
+ | The startup routines setup the processor stack, check for the | ||
+ | existence of an autostart ROM. Autostart ROMs are located in the $A000 | ||
+ | block and have a five-byte signature (A0CBM) at offset $04. If the | ||
+ | signature is found, the Kernal jumps to the A0ROM initialization routine | ||
+ | pointed to by offset $00 of the autostart ROM. | ||
+ | If no signature is found, the Kernal initialization continues by | ||
+ | testing the RAM, initializing the system variables, system hardware, and | ||
+ | the screen. Finally, the Kernal transfers control to the BASIC | ||
+ | initialization entry point at $C000. | ||
+ | |||
+ | Routine SCNROM | ||
+ | The first routine called from the POR code is the SCNROM routine. | ||
+ | This routine looks for the special 5-byte signature that indicates the | ||
+ | presence of an autostart ROM located in the $A segment. | ||
+ | |||
+ | 5992 | ||
+ | 5993 | ||
+ | 5994 | ||
+ | 5995 | ||
+ | 5996 FD3F A2 05 LDX #$05 ;5 chars to compare | ||
+ | 5997 | ||
+ | 5998 | ||
+ | 5999 FD41 BD 4C FD LDA SCANEX, | ||
+ | 6000 FD44 DD 03 A0 CMP A0BASE+3, | ||
+ | 6001 FD47 D0 03 BNE SCANEX ;no match, exit loop | ||
+ | 6002 | ||
+ | 6003 FD49 CA DEX ; | ||
+ | 6004 FD4A D0 F5 BNE SCNLOOP ; | ||
+ | 6005 | ||
+ | 6006 | ||
+ | 6007 FD4C 60 RTS ; | ||
+ | 6008 | ||
+ | 6009 | ||
+ | 6010 | ||
+ | 6011 | ||
+ | 6012 FD4D 4130C3C2CD | ||
+ | |||
+ | Routine RAMTAS | ||
+ | The RAMTAS routine is the second subroutine in the initialization | ||
+ | process. It clears the first three pages of RAM, then searches for expansion | ||
+ | memory. If any is found, the screen memory, color memory, and start of | ||
+ | BASIC RAM pointers are adjusted to their documented alternates. | ||
+ | |||
+ | 6058 | ||
+ | 6059 | ||
+ | 6060 | ||
+ | 6061 | ||
+ | 6062 FD8D A9 00 LDA #$00 ;zero regs .A and .X | ||
+ | 6063 FD8F AA TAX | ||
+ | 6064 | ||
+ | 6065 | ||
+ | 6066 FD90 95 00 STA USRPOK, | ||
+ | 6067 FD92 9D 00 02 STA BUF, | ||
+ | 6068 FD95 9D 00 03 STA ERRVPT, | ||
+ | 6069 FD98 E8 INX | ||
+ | 6070 FD99 D0 F5 BNE RAMTSLP1 ; | ||
+ | 6071 | ||
+ | 6072 FD9B A2 3C LDX #$3C ;setup cassette buffer | ||
+ | 6073 FD9D A0 03 LDY #$03 ;area to $033c | ||
+ | 6074 FD9F 86 B2 STX TAPE1 | ||
+ | 6075 FDA1 84 B3 STY TAPE1+1 | ||
+ | 6076 FDA3 85 C1 STA STAL ;clear I/O start address… | ||
+ | 6077 FDA5 85 97 STA REGSAV ; | ||
+ | 6078 FDA7 8D 81 02 STA OSSTAR ; | ||
+ | 6079 FDAA A8 TAY ; .Y=0 | ||
+ | 6080 FDAB A9 04 LDA #$04 ;check RAM from $0400 | ||
+ | 6081 FDAD 85 C2 STA STAL+1 ;set I/O start to page 3 | ||
+ | 6082 | ||
+ | 6083 | ||
+ | 6084 FDAF E6 C1 INC STAL ; | ||
+ | 6085 FDB1 D0 02 BNE RAMTAS1 ; | ||
+ | 6086 | ||
+ | 6087 FDB3 E6 C2 INC STAL+1 ; | ||
+ | 6088 | ||
+ | 6089 | ||
+ | 6090 FDB5 20 91 FE JSR MEMTST ; | ||
+ | 6091 FDB8 A5 97 LDA REGSAV | ||
+ | 6092 FDBA F0 22 BEQ RAMTAS3 | ||
+ | 6093 FDBC B0 F1 BCS RAMTASLP2 ; | ||
+ | 6094 | ||
+ | 6095 FDBE A4 C2 LDY STAL+1 ; | ||
+ | 6096 FDC0 A6 C1 LDX STAL ; | ||
+ | 6097 FDC2 C0 20 CPY #$20 ; top at $2000 | ||
+ | 6098 FDC4 90 25 BCC I6561LP ; | ||
+ | 6099 | ||
+ | 6100 FDC6 C0 21 CPY #$21 ;RAM at $2000? | ||
+ | 6101 FDC8 B0 08 BCS RAMTAS2 ; | ||
+ | 6102 | ||
+ | 6103 FDCA A0 1E LDY #$1E ;$1E00 | ||
+ | 6104 FDCC 8C 88 02 STY HIPAGE | ||
+ | 6105 | ||
+ | 6106 | ||
+ | 6107 FDCF 4C 7B FE JMP STOTOP ;CLC and set RAM top | ||
+ | 6108 | ||
+ | 6109 | ||
+ | 6110 FDD2 A9 12 LDA #$12 ;With exp. RAM, BASIC starts | ||
+ | 6111 FDD4 8D 82 02 STA OSSTAR+1 ; | ||
+ | 6112 FDD7 A9 10 LDA # | ||
+ | 6113 FDD9 8D 88 02 STA HIPAGE | ||
+ | 6114 FDDC D0 F1 BNE RAMTAS1A ; | ||
+ | 6115 | ||
+ | 6116 | ||
+ | 6117 FDDE 90 CF BCC RAMTASLP2 ; | ||
+ | 6118 | ||
+ | 6119 FDE0 A5 C2 LDA STAL+1 ;get MSB of I/O start | ||
+ | 6120 FDE2 8D 82 02 STA OSSTAR+1 ; | ||
+ | 6121 FDE5 85 97 STA REGSAV ; | ||
+ | 6122 FDE7 C9 11 CMP #$11 ;page $11 | ||
+ | 6123 | ||
+ | 6124 | ||
+ | 6125 FDE9 90 C4 BCC RAMTASLP2 | ||
+ | 6126 | ||
+ | 6127 | ||
+ | 6128 FDEB 20 C3 E5 JSR V6561I-2 ; | ||
+ | 6129 FDEE 4C EB FD JMP I6561LP | ||
+ | |||
+ | This routine actually tests the RAM, and is called during the memory | ||
+ | search loop at $FDB5. It uses a simple walking-bit pattern to test for | ||
+ | memory defects: | ||
+ | |||
+ | 6271 | ||
+ | 6272 | ||
+ | 6273 | ||
+ | 6274 | ||
+ | 6275 FE91 B1 C1 LDA (STAL), | ||
+ | 6276 FE93 AA TAX ;save .A | ||
+ | 6277 FE94 A9 55 LDA # | ||
+ | 6278 FE96 91 C1 STA (STAL), | ||
+ | 6279 FE98 D1 C1 CMP (STAL), | ||
+ | 6280 FE9A D0 08 BNE MEMTS1 ;not equal | ||
+ | 6281 | ||
+ | 6282 FE9C 6A ROR A ; | ||
+ | 6283 FE9D 91 C1 STA (STAL), | ||
+ | 6284 FE9F D1 C1 CMP (STAL), | ||
+ | 6285 FEA1 D0 01 BNE MEMTS1 ;not equal | ||
+ | 6286 FEA3 A9 .db $A9 ;LDA #$18 for OK, $55 or $AA | ||
+ | 6287 | ||
+ | 6288 | ||
+ | 6289 FEA4 18 CLC ;CLC only on error | ||
+ | 6290 FEA5 8A TXA ; | ||
+ | 6291 FEA6 91 C1 STA (STAL), Y ;save it | ||
+ | 6292 FEA8 60 RTS | ||
+ | |||
+ | The RAMTAS routine also calls STOTOP to save the top of RAM pointer: | ||
+ | |||
+ | 6241 | ||
+ | 6242 | ||
+ | 6243 | ||
+ | 6244 | ||
+ | 6245 | ||
+ | 6246 | ||
+ | 6247 FE73 90 06 BCC STOTOP ;set or read? | ||
+ | 6248 FE75 AE 83 02 LDX OSTOP ;get top of memory | ||
+ | 6249 FE78 AC 84 02 LDY OSTOP+1 | ||
+ | 6250 | ||
+ | 6251 | ||
+ | 6252 FE7B 8E 83 02 STX OSTOP ;set top of memory | ||
+ | 6253 FE7E 8C 84 02 STY OSTOP+1 | ||
+ | 6254 FE81 60 RTS | ||
+ | |||
+ | Routine IRESTR | ||
+ | This routine loads (or re-loads) the default Kernal vectors upon | ||
+ | POR or Run-Stop/ | ||
+ | include the links to IRQ, NMI, Open, Close, Channel In, Channel Out, | ||
+ | Clear Channels, Character In, Character Out, Scan Stop Key, Get Keyboard | ||
+ | Character, Close All, Load and Save routines within the Kernal ROM. | ||
+ | |||
+ | 6014 | ||
+ | 6015 | ||
+ | 6016 | ||
+ | 6017 | ||
+ | 6018 | ||
+ | 6019 FD52 A2 EA LDX #$EA ; | ||
+ | 6020 FD54 A0 EA LDY #$EA ; | ||
+ | 6021 FD56 18 CLC | ||
+ | 6022 | ||
+ | 6023 | ||
+ | 6024 | ||
+ | 6025 | ||
+ | 6026 | ||
+ | 6027 | ||
+ | 6028 FD57 86 C3 STX MEMUSS ; | ||
+ | 6029 FD59 84 C4 STY MEMUSS+1 | ||
+ | 6030 FD5B A0 1F LDY #$1F ;# of bytes to move | ||
+ | 6031 | ||
+ | 6032 | ||
+ | 6033 FD5D B9 B6 02 LDA IRQVP, | ||
+ | 6034 FD60 B0 02 BCS VECSK ; | ||
+ | 6035 | ||
+ | 6036 FD62 B1 C3 LDA (MEMUSS), | ||
+ | 6037 | ||
+ | 6038 | ||
+ | 6039 FD64 91 C3 STA (MEMUSS), | ||
+ | 6040 FD66 99 B6 02 STA IRQVP, | ||
+ | 6041 FD69 88 DEY ;go to next one | ||
+ | 6042 FD6A 10 F1 BPL VECLOOP ; | ||
+ | 6043 FD6C 60 RTS | ||
+ | 6044 | ||
+ | 6045 | ||
+ | 6046 | ||
+ | 6047 | ||
+ | 6048 | ||
+ | 6053 FD6D BFEAD2FEADFE .dw IRQVEC, WARMST, LNKNMI, IOPEN | ||
+ | 6053 FD73 0AF4 | ||
+ | 6054 FD75 4AF3C7F209F3 .dw ICLOSE, ICHKIN, ICHKOT, ICLRCH | ||
+ | 6054 FD7B F3F3 | ||
+ | 6055 FD7D 0EF27AF270F7 .dw ICHRIN, ICHROT, ISTOP, | ||
+ | 6055 FD83 F5F1 | ||
+ | 6056 FD85 EFF3D2FE49F5 .dw ICLALL, WARMST, LNKLOD, LNKSAV | ||
+ | 6056 FD8B 85F6 | ||
+ | |||
+ | Routine IOINIT | ||
+ | This routine initializes the VIAs. Lots of bit twiddling goes on | ||
+ | here to set-up the various ports. This routine also starts the system | ||
+ | IRQ timer. | ||
+ | |||
+ | 6138 | ||
+ | 6139 | ||
+ | 6140 | ||
+ | 6141 | ||
+ | 6142 FDF9 A9 7F LDA # | ||
+ | 6143 FDFB 8D 1E 91 STA D1IER ; | ||
+ | 6144 FDFE 8D 2E 91 STA D2IER ; | ||
+ | |||
+ | 6145 FE01 A9 40 LDA # | ||
+ | ; | ||
+ | 6146 FE03 8D 2B 91 STA D2ACR ;VIA2 aux ctrl reg | ||
+ | |||
+ | 6147 FE06 A9 40 LDA # | ||
+ | ; RS-232 timing | ||
+ | 6148 FE08 8D 1B 91 STA D1ACR ;VIA1 aux ctrl reg | ||
+ | |||
+ | 6149 FE0B A9 FE LDA # | ||
+ | ; CA2/CB2 manual H, CB1 pos | ||
+ | ; trig. CA1 negative trig. | ||
+ | ; | ||
+ | ; | ||
+ | ; | ||
+ | ; | ||
+ | 6150 FE0D 8D 1C 91 STA D1PCR ;VIA1 periph ctrl reg | ||
+ | |||
+ | 6151 FE10 A9 DE LDA # | ||
+ | ; CB2 manual L, CB1 pos | ||
+ | ; trig. CA1 negative trig. | ||
+ | ; CA2 manual H | ||
+ | ; | ||
+ | ; | ||
+ | ; | ||
+ | ; | ||
+ | 6152 FE12 8D 2C 91 STA D2PCR ;VIA2 periph ctrl reg | ||
+ | |||
+ | 6153 FE15 A2 00 LDX #$00 ;DDR all bits IN | ||
+ | 6154 FE17 8E 12 91 STX D1DDRB ; | ||
+ | |||
+ | 6155 FE1A A2 FF LDX # | ||
+ | 6156 FE1C 8E 22 91 STX D2DDRB ; | ||
+ | |||
+ | 6157 FE1F A2 00 LDX #$00 ;DDR all bits IN | ||
+ | 6158 FE21 8E 23 91 STX D2DDRA ; | ||
+ | |||
+ | 6159 FE24 A2 80 LDX # | ||
+ | 6160 FE26 8E 13 91 STX D1DDRA ; | ||
+ | |||
+ | 6161 FE29 A2 00 LDX #$00 | ||
+ | 6162 FE2B 8E 1F 91 STX D1ORAH ; | ||
+ | |||
+ | 6163 FE2E 20 84 EF JSR SCLK1 ;set IEEE clock line=1 | ||
+ | |||
+ | 6164 FE31 A9 82 LDA # | ||
+ | 6165 FE33 8D 1E 91 STA D1IER ;VIA1 IER BIT7=1 | ||
+ | |||
+ | 6166 FE36 20 8D EF JSR SCLK0 ;set IEEE clock line=0 | ||
+ | 6167 | ||
+ | 6168 | ||
+ | 6169 | ||
+ | 6170 | ||
+ | 6171 FE39 A9 C0 LDA # | ||
+ | 6172 FE3B 8D 2E 91 STA D2IER ;VIA2 IER BIT7-6=1 | ||
+ | |||
+ | 6173 FE3E A9 89 LDA # | ||
+ | 6174 FE40 8D 24 91 STA D2TM1L ; | ||
+ | |||
+ | 6175 FE43 A9 42 LDA # | ||
+ | 6176 FE45 8D 25 91 STA D2TM1L+1 ; | ||
+ | 6177 FE48 60 RTS | ||
+ | |||
+ | Routine CINT1 | ||
+ | This final routine initializes the character generator, sets the | ||
+ | initial screen colors, clears the screen and " | ||
+ | updates screen and cursor pointers. | ||
+ | |||
+ | 1349 | ||
+ | 1350 | ||
+ | 1351 | ||
+ | 1352 | ||
+ | 1353 | ||
+ | 1354 | ||
+ | 1355 | ||
+ | 1356 | ||
+ | 1357 E518 20 BB E5 JSR IODEF1 ;set deflt I/O and init VIC | ||
+ | 1358 E51B AD 88 02 LDA HIPAGE ;get screen memory page | ||
+ | 1359 E51E 29 FD AND # | ||
+ | 1360 E520 0A ASL A ;LS nibble is ChrRAM | ||
+ | 1361 E521 0A ASL A | ||
+ | 1362 E522 09 80 ORA # | ||
+ | 1363 E524 8D 05 90 STA VRSTRT ;set chargen ROM to $8000 | ||
+ | 1364 E527 AD 88 02 LDA HIPAGE ;get screen mem page | ||
+ | 1365 E52A 29 02 AND # | ||
+ | 1366 E52C F0 08 BEQ CINT1A ; | ||
+ | 1367 | ||
+ | 1368 E52E A9 80 LDA # | ||
+ | 1369 E530 0D 02 90 ORA VRCOLS ;set Bit7 | ||
+ | 1370 E533 8D 02 90 STA VRCOLS | ||
+ | 1371 | ||
+ | 1372 | ||
+ | 1373 E536 A9 00 LDA #$00 | ||
+ | 1374 E538 8D 91 02 STA SHMODE ; | ||
+ | 1375 E53B 85 CF STA BLNON ; | ||
+ | 1376 | ||
+ | 1377 E53D A9 EA LDA #$EA ; | ||
+ | 1378 E53F 8D 8F 02 STA FCEVAL | ||
+ | 1379 E542 A9 EA LDA #$EA ; | ||
+ | 1380 E544 8D 90 02 STA FCEVAL+1 ; | ||
+ | 1381 | ||
+ | 1382 E547 A9 0A LDA #$0A | ||
+ | 1383 E549 8D 89 02 STA KBMAXL ;key buffer=16 | ||
+ | 1384 E54C 8D 8C 02 STA KRPTDL ; | ||
+ | 1385 E54F A9 06 LDA #$06 | ||
+ | 1386 E551 8D 86 02 STA CLCODE ; | ||
+ | 1387 E554 A9 04 LDA #$04 | ||
+ | 1388 E556 8D 8B 02 STA KRPTSP ; | ||
+ | 1389 E559 A9 0C LDA #$0C | ||
+ | 1390 E55B 85 CD STA BLNCT ; | ||
+ | 1391 E55D 85 CC STA BLNSW ;set for solid cursor | ||
+ | 1392 | ||
+ | 1393 | ||
+ | 1394 | ||
+ | 1395 | ||
+ | 1396 E55F AD 88 02 LDA HIPAGE ;mem page for screen RAM | ||
+ | 1397 E562 09 80 ORA # | ||
+ | 1398 E564 A8 TAY | ||
+ | 1399 E565 A9 00 LDA #$00 | ||
+ | 1400 E567 AA TAX | ||
+ | 1401 | ||
+ | 1402 | ||
+ | 1403 E568 94 D9 STY SLLTBL, | ||
+ | 1404 E56A 18 CLC | ||
+ | 1405 E56B 69 16 ADC #$16 ;add 22 | ||
+ | 1406 E56D 90 01 BCC CLRSC1 | ||
+ | 1407 | ||
+ | 1408 E56F C8 INY | ||
+ | 1409 | ||
+ | 1410 | ||
+ | 1411 E570 E8 INX | ||
+ | 1412 E571 E0 18 CPX #$18 ;all rows done? | ||
+ | 1413 E573 D0 F3 BNE CLRLP1 | ||
+ | 1414 | ||
+ | 1415 E575 A9 FF LDA #$FF | ||
+ | 1416 E577 95 D9 STA SLLTBL,X | ||
+ | 1417 E579 A2 16 LDX #$16 | ||
+ | 1418 | ||
+ | 1419 | ||
+ | 1420 E57B 20 8D EA JSR CLRLIN ; | ||
+ | 1421 E57E CA DEX | ||
+ | 1422 E57F 10 FA BPL CLRLP2 | ||
+ | 1423 | ||
+ | 1424 | ||
+ | 1425 | ||
+ | 1426 | ||
+ | 1427 E581 A0 00 LDY #$00 | ||
+ | 1428 E583 84 D3 STY CSRIDX ;set column to 0 | ||
+ | 1429 E585 84 D6 STY CURROW ;and row to 0, too | ||
+ | 1430 | ||
+ | 1431 | ||
+ | 1432 | ||
+ | 1433 | ||
+ | 1434 E587 A6 D6 LDX CURROW | ||
+ | 1435 E589 A5 D3 LDA CSRIDX | ||
+ | 1436 | ||
+ | 1437 | ||
+ | 1438 E58B B4 D9 LDY SLLTBL,X | ||
+ | 1439 E58D 30 08 BMI SCNPT1 | ||
+ | 1440 | ||
+ | 1441 E58F 18 CLC | ||
+ | 1442 E590 69 16 ADC #$16 | ||
+ | 1443 E592 85 D3 STA CSRIDX | ||
+ | 1444 E594 CA DEX | ||
+ | 1445 E595 10 F4 BPL SCNPLP | ||
+ | 1446 | ||
+ | 1447 | ||
+ | 1448 E597 B5 D9 LDA SLLTBL,X | ||
+ | 1449 E599 29 03 AND #$03 | ||
+ | 1450 E59B 0D 88 02 ORA HIPAGE | ||
+ | 1451 E59E 85 D2 STA LINPTR+1 | ||
+ | 1452 E5A0 BD FD ED LDA LBSCAD,X | ||
+ | 1453 E5A3 85 D1 STA LINPTR | ||
+ | 1454 E5A5 A9 15 LDA #$15 | ||
+ | 1455 E5A7 E8 INX | ||
+ | 1456 | ||
+ | 1457 | ||
+ | 1458 E5A8 B4 D9 LDY SLLTBL,X | ||
+ | 1459 E5AA 30 06 BMI SCNEXIT | ||
+ | 1460 | ||
+ | 1461 E5AC 18 CLC | ||
+ | 1462 E5AD 69 16 ADC #$16 | ||
+ | 1463 E5AF E8 INX | ||
+ | 1464 E5B0 10 F6 BPL SCNLP1 | ||
+ | 1465 | ||
+ | 1466 | ||
+ | 1467 E5B2 85 D5 STA LINLEN | ||
+ | 1468 E5B4 60 RTS ; return to init routine | ||
+ | |||
+ | CINT1 calls one routine, IODEF1, which resets the default input | ||
+ | and output devices to the keyboard and screen, respectively, | ||
+ | resets the VIC chip's registers to their default. The one interesting | ||
+ | thing to note is the PANIC entry at $E5B5. The Kernal does not call the | ||
+ | PANIC entry! The Kernal bypasses the extra JSR by calling the IODEF1 code | ||
+ | directly. Not that there is any magic in the PANIC entry; it only calls | ||
+ | the IODEF1 code and homes the cursor. | ||
+ | BASIC ROM, so it is possible that BASIC calls or vectors the PANIC entry. | ||
+ | |||
+ | 1404 | ||
+ | 1405 | ||
+ | 1406 | ||
+ | 1407 | ||
+ | 1408 E5B5 20 BB E5 JSR IODEF1 ; | ||
+ | 1409 E5B8 4C 81 E5 JMP HOME ;home cursor | ||
+ | 1410 | ||
+ | 1411 | ||
+ | 1412 | ||
+ | 1413 | ||
+ | 1414 E5BB A9 03 LDA #$03 | ||
+ | 1415 E5BD 85 9A STA OUTDEV ; | ||
+ | 1416 E5BF A9 00 LDA #$00 | ||
+ | 1417 E5C1 85 99 STA INDEV ; | ||
+ | 1418 | ||
+ | 1419 | ||
+ | 1420 | ||
+ | 1421 E5C3 A2 10 LDX #$10 ;move 16 VIC registers | ||
+ | 1422 | ||
+ | 1423 | ||
+ | 1424 | ||
+ | 1425 E5C5 BD E3 ED LDA VICSUP-1, | ||
+ | 1426 E5C8 9D FF 8F STA $8FFF, | ||
+ | 1427 E5CB CA DEX ; | ||
+ | 1428 E5CC D0 F7 BNE V6561I ;do next register | ||
+ | 1429 | ||
+ | 1430 E5CE 60 RTS | ||
+ | |||
+ | 2789 | ||
+ | 2790 | ||
+ | 2791 | ||
+ | 2792 | ||
+ | 2793 | ||
+ | .db $19 ; | ||
+ | .db $16 ; | ||
+ | .db $2E ; | ||
+ | .db $00 ; | ||
+ | .db $C0 ;bit0-3 start of char memory | ||
+ | ;bit4-7 is rest of video address | ||
+ | ;BITS 3,2,1,0 CM starting address | ||
+ | ; | ||
+ | ;0000 | ||
+ | ;0001 8400 33792 | ||
+ | ;0010 8800 34816 | ||
+ | ;0011 8C00 35840 | ||
+ | ;1000 | ||
+ | ;1001 xxxx } | ||
+ | ;1010 xxxx }unavail. | ||
+ | ;1011 xxxx } | ||
+ | ;1100 1000 4096 | ||
+ | ;1101 1400 5120 | ||
+ | ;1110 1800 6144 | ||
+ | ;1111 1C00 7168 | ||
+ | .db $00 ;Hpos of light pen | ||
+ | .db $00 ;Vpos of light pen | ||
+ | 2794 | ||
+ | .db $00 ;Digitized value of paddle Y | ||
+ | .db $00 ;Frequency for oscillator 1 (low) | ||
+ | .db $00 ;Frequency for oscillator 2 (medium) | ||
+ | .db $00 ;Frequency for oscillator 3 (high) | ||
+ | .db $00 ;Frequency of noise source | ||
+ | .db $00 ; bit0-3 sets volume of all sound | ||
+ | ; bit4-7 are auxiliary color information | ||
+ | .db $1B ;Screen and border color register | ||
+ | ; bits 4-7 select background color | ||
+ | ; bits 0-2 select border color | ||
+ | ; bit 3 selects inverted or normal mode | ||
+ | |||
+ | Conclusion | ||
+ | Next time, we'll examine more routines and answer any questions | ||
+ | that you may have. | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | NTSC-PAL fixing, part 1 | ||
+ | =============== | ||
+ | Russel Reed < | ||
+ | |||
+ | Introduction | ||
+ | ============ | ||
+ | |||
+ | Just about everyone knows that there are differences between | ||
+ | NTSC and PAL machines. | ||
+ | few of the technical issues, such as extra raster lines, and know that | ||
+ | these differences can cause certain types of programs to fail, in particular | ||
+ | games and demos. | ||
+ | these programs? | ||
+ | concerned. | ||
+ | " | ||
+ | in order to enjoy programs written in another country. | ||
+ | game companies imported software from other countries and sometimes had | ||
+ | to reprogram the software so it would run correctly. | ||
+ | always a simple task, but there are some techniques and strategies which | ||
+ | are useful in most cases. | ||
+ | problems overall. | ||
+ | |||
+ | We decided to approach this problem in a novel way: a pair of | ||
+ | yay-hoos (Robin and Steve) would place themselves under the tutelage | ||
+ | of an experienced fixer (Russ, a.k.a. Decomp/ | ||
+ | and write up the results and experiences. | ||
+ | on which program (or programs) to fix. It had to first of all be | ||
+ | fixable! | ||
+ | complicated project. | ||
+ | feet wet, the actual fixing job needed to be fairly straightforward (so | ||
+ | the big custom track-loading demo will have to wait for a future article). | ||
+ | But demos are the natural fixing candidate, and after viewing | ||
+ | several different ones, and examining the code to see what would need | ||
+ | fixing, we finally decided on the demo "Slow Ideas", | ||
+ | of crazy Finns way back in 1989. It's a cool demo, was challenging and | ||
+ | fun to fix, and turned out to be pedagogically a good choice. | ||
+ | two pages, and each page had a number of effects in need of fixing. | ||
+ | These pages, and the practical side of fixing, will be discussed in | ||
+ | some detail below, but first we need to review the differences between | ||
+ | PAL and NTSC, and discuss the implications of those differences, | ||
+ | how they manifest themselves in programs. | ||
+ | |||
+ | |||
+ | NTSC/PAL Differences | ||
+ | ==================== | ||
+ | |||
+ | There is really just one primary difference between NTSC and PAL | ||
+ | machines: the graphics, which means VIC. But since VIC generates the | ||
+ | machine clock cycles, that means the computers run at different speeds. | ||
+ | And since they run at different speeds, the CIA timers run at different | ||
+ | speeds, the SIDs run at different speeds, and the CPUs run at different | ||
+ | speeds. | ||
+ | nearly every aspect of the machine' | ||
+ | |||
+ | VIC and graphics | ||
+ | ---------------- | ||
+ | |||
+ | The PAL television standard is different from the NTSC standard. | ||
+ | The primary difference is actually the way color is encoded, but | ||
+ | the main issue for the 64 is the frame rate, the number of raster | ||
+ | lines, and the number of cycles per raster line: | ||
+ | |||
+ | VIC chip Frame Rate Raster lines Cycles per line | ||
+ | 6567R56A (NTSC): | ||
+ | 6567R8+ | ||
+ | 6569 (PAL) : 50Hz 312 63 | ||
+ | |||
+ | As the video raster beam sweeps across the television tube, VIC tells | ||
+ | it what to display, one pixel at a time. The CPU clock is exactly | ||
+ | 1/8th of the "pixel clock", | ||
+ | eight pixels on the screen. | ||
+ | that a PAL machine has 63*8 = 504 pixels per raster line. 320 of | ||
+ | those pixels comprise the visible display, while the other 184 make | ||
+ | up the left and right borders. | ||
+ | |||
+ | What is important here are the three numbers in the table, though. | ||
+ | First consider the number of cycles per line. If a program is exactly | ||
+ | synchronized with the raster, then it can make precise changes to the | ||
+ | screen merely by letting a certain number of cycles elapse. | ||
+ | example is making raster bars; a more involved example is opening the side | ||
+ | borders, or generating an FLI display (which you can read about in previous | ||
+ | issues of C=Hacking). | ||
+ | timings will fail when run on a machine with a different number of cycles | ||
+ | per line. (Note that on older computers, like the old Atari 2600, the CPU | ||
+ | actually built the screen display, and so _all_ the screen code had to | ||
+ | be exactly timed). | ||
+ | Next observe the different number of raster lines. | ||
+ | display begins on raster line 50, and there are 200 visible raster lines | ||
+ | (320x200, you know). | ||
+ | 62 raster lines for the PAL border. | ||
+ | waits for a raster line greater than 263 will have problems on an NTSC | ||
+ | machine. | ||
+ | |||
+ | LDA #$10 ;Wait for line 266 | ||
+ | CMP $D012 | ||
+ | BNE *-5 | ||
+ | LDA $D011 | ||
+ | BPL *-12 | ||
+ | |||
+ | will never exit, while a raster IRQ will never occur. | ||
+ | adjusted to use a different raster line or another method of timing. | ||
+ | Sprite graphics on lines greater than 263 will wrap around on an NTSC | ||
+ | screen and in some cases the image will be displayed twice. | ||
+ | must either be moved or their images truncated to correct this. | ||
+ | Moreover, the extra raster lines mean extra cycles per PAL frame. | ||
+ | Using the table above, an NTSC machine has 263*65 = 17095 cycles per frame, | ||
+ | and a PAL machine has 312*63 = 19656 cycles per frame. | ||
+ | games those extra 2500 cycles are used to perform needed calculations and | ||
+ | operations before the next frame begins, which leads to all kinds of | ||
+ | trouble on an NTSC machine. | ||
+ | available in the _visible_ display, while PAL machines have many more | ||
+ | cycles available in the borders. | ||
+ | |||
+ | Finally, consider the frame rate. On an NTSC machine, there are | ||
+ | 17095 cycles per frame, and 60 frames per second, giving 17095*60 = 1025700 | ||
+ | cycles per second, or 1.02 MHz. On a PAL machine, there are 19656*50 = 982800 | ||
+ | cycles per second, or 0.98 MHz. So although a PAL machine has more cycles | ||
+ | per frame, the CPU runs slightly slower than on an NTSC machine. | ||
+ | game like Elite, which involves raw computation, | ||
+ | on an NTSC machine. | ||
+ | frame which is important -- as long as all game calculations can get done | ||
+ | before the next frame, the game can run at the full frame rate. Also | ||
+ | note that most tunes are synced to the screen, so when a PAL tune, | ||
+ | designed to play at 50 calls per second, is suddenly called 60 times | ||
+ | each second, it will play noticeably faster. | ||
+ | By the way, there actually aren't _exactly_ 50 or 60 frames per | ||
+ | second. | ||
+ | clock rate, not the other way around! | ||
+ | 14318181 / 14 = 1022727Hz for NTSC and 17734472 / 18 = 985248Hz for PAL. | ||
+ | Dividing by 17095 (PAL=19656) cycles per frame gives 59.826 frames/ | ||
+ | NTSC and 50.124 fps PAL. | ||
+ | The important thing to remember here is that PAL machines run | ||
+ | slightly slower than NTSC machines, but have many more cycles per video | ||
+ | frame. | ||
+ | |||
+ | Just as a side node, the above calculation should indicate to you | ||
+ | that although the AC electricity lines are 60Hz in the US and 50Hz in Europe, | ||
+ | that has nothing to do with the 50/60Hz PAL/NTSC frame rates. | ||
+ | an NTSC monitor easily display a 50Hz signal (let alone 59.826Hz), but the | ||
+ | actual power frequency fluctuates around that 50/60Hz anyways, so that | ||
+ | the AC line frequency is only 50/60Hz on _average_ -- good enough to | ||
+ | run a clock, but not nearly precise enough to generate a video signal. | ||
+ | |||
+ | Also note that there are two different NTSC VIC chips in the table. | ||
+ | The 64 cycles/line VIC was present in the earliest 64s shipped. | ||
+ | actually a bug in the chip, though, and 65 cycles/line is the " | ||
+ | not to mention most common, NTSC VIC chip, and the one which we will | ||
+ | refer to in this article. | ||
+ | March 1985 IEEE Spectrum article: | ||
+ | |||
+ | In addition to the difficuly with the ROM [sparklies], | ||
+ | a logic error," | ||
+ | corrected sometime after Charpentier left Commodore, caused | ||
+ | the early C-64s to generate the wrong number of clock cycles | ||
+ | on each horizontal video line. "It was off by one," he said. | ||
+ | " | ||
+ | |||
+ | As a result, the 180-degree phase shift between the black-and-white | ||
+ | and color information, | ||
+ | problems, didn't occur. | ||
+ | the background, the edges of some objects on the screen would appear | ||
+ | slightly out of line... | ||
+ | |||
+ | There ya go! | ||
+ | |||
+ | Machine cycles | ||
+ | -------------- | ||
+ | |||
+ | The tiny CPU speed difference has a number of important ramifications. | ||
+ | It means that the CIA timers on a PAL machine run slightly slower than on | ||
+ | an NTSC machine, so timer values may have to be recalibrated; | ||
+ | system CIA interrupt has a different setting for PAL and NTSC. Moreover, | ||
+ | a disk drive runs at exactly 1MHz -- there are no "PAL disk drives" | ||
+ | means cycle-exact fastloaders will not synchronize correctly on | ||
+ | different-speed machines. | ||
+ | Not only will the tempo of any interrupt-based tune (raster _or_ CIA) | ||
+ | change, but the actual pitch will change as well. SID generates its waveforms | ||
+ | by simply updating an internal counter every cycle, so an NTSC SID is | ||
+ | essentially playing a digital sample at 1.02 MHz. When that sample is | ||
+ | played at 0.98 MHz, it's like slowing a record player down a little -- the | ||
+ | pitch decreases. | ||
+ | _relative_ pitch between notes does not; the tune plays the same, but | ||
+ | the pitches are all a little over a quarter-step lower. | ||
+ | Practically speaking, this is totally irrelevant to fixing. | ||
+ | Only the tune speed is of significant interest. | ||
+ | |||
+ | Fixing the Problems | ||
+ | =================== | ||
+ | |||
+ | The previous section described different classes of problems | ||
+ | which can occur from the different cycles per line, lines per frame, | ||
+ | cycles per frame, and cycles per second. | ||
+ | syncs, bad interrupts and infinite busy-loops, too many cycles per frame, | ||
+ | mis-timed timers and fastloaders, | ||
+ | |||
+ | Fixing tunes is easy. For programs in which the interrupt frequency | ||
+ | is otherwise unimportant, | ||
+ | the correct frequency. | ||
+ | cannot be changed, the music speed may be adjusted by calling the play | ||
+ | routine twice on every sixth interrupt or not calling it on the sixth | ||
+ | interrupt. | ||
+ | the music routine rewritten to work at the different frequency. | ||
+ | the time none of these are necessary, as the music sounds fine at the | ||
+ | different speed anyway. | ||
+ | |||
+ | Next consider too many cycles per frame. | ||
+ | class of problems, we must improve the efficiency of the code. There are | ||
+ | three cases here. Sometimes busy waits are used to synchronize the code | ||
+ | with a raster line on the screen. | ||
+ | restrictions which result in additional wasted cycles. | ||
+ | may sometimes be replaced by raster interrupts which make better use of | ||
+ | the available cycles. | ||
+ | At other times, programmers will set up all graphic updates so that | ||
+ | the code executes in the vertical borders. | ||
+ | disadvantage here, as we've seen earlier. | ||
+ | rearranged to take advantage of the available cycles during the screen | ||
+ | display. | ||
+ | at the same time it is being displayed, but by splitting updates up it can | ||
+ | usually be pulled off. | ||
+ | Finally there are some cases in which neither of these techniques | ||
+ | is of use. For a perfect fix, the code must be optimized, often by | ||
+ | sacrificing memory. | ||
+ | effects can be truncated or updates slowed down so that less is updated | ||
+ | each frame. | ||
+ | |||
+ | Next consider the different cycles per raster line. In most | ||
+ | cases, this difference is not significant. | ||
+ | significant are raster routines, where synchronization with the video | ||
+ | chip is established by using exactly the right number of processor | ||
+ | cycles. | ||
+ | will have flicker and look crooked; VSP effects will often be shifted | ||
+ | the wrong amount; FLI routines usually either repeat the top row of | ||
+ | graphics all the way down the screen or else don't display at all. | ||
+ | These problems can all be corrected by adding or subtracting the correct | ||
+ | number of cycles per raster line. In most cases, this class of problem is | ||
+ | actually the easiest to fix. | ||
+ | There are several approaches to putting the right number of cycles | ||
+ | in for the fix. If the source is available, inserting a NOP instruction | ||
+ | may be all that is required for an NTSC fix. Without the source, a | ||
+ | modified routine may be inserted into some empty memory and the original | ||
+ | routine bypassed. | ||
+ | insert a NOP with a machine language monitor. | ||
+ | the opcodes used to use a different number of cycles. | ||
+ | delays that can be added with just two bytes: | ||
+ | |||
+ | CMP #$EA ;2 cycles | ||
+ | BIT $EA ;3 cycles | ||
+ | NOP NOP ;4 cycles | ||
+ | INC $EA ;5 cycles | ||
+ | CMP ($EA,X) ;6 cycles | ||
+ | |||
+ | This gives lots of room to work with, assuming the flags aren't important. | ||
+ | If .X or .Y is unused, then | ||
+ | |||
+ | STA $1234 | ||
+ | |||
+ | can be changed to | ||
+ | |||
+ | STA $1234,X or STA $1234, | ||
+ | |||
+ | to gain an extra cycle (if .Y=0). | ||
+ | value like $FF, the target can be adjusted so that STA $1234 becomes | ||
+ | STA $1135, | ||
+ | and STA $D016 together. | ||
+ | to add the two extra cycles needed for an NTSC fix. | ||
+ | Also note that when sprites are active, a different amount | ||
+ | of cycles may get stolen depending on which instruction is executing | ||
+ | when the sprite data is read. C=Hacking #3 discusses this in detail, | ||
+ | and explains how it may be used to synchronize code with the raster beam. | ||
+ | From a fixing standpoint, it means that you don't always want to add | ||
+ | two instruction cycles to convert a piece of PAL code to NTSC; sometimes, | ||
+ | as in the demo below, only _one_ cycle must be added to fix certain routines, | ||
+ | with the other cycle being eaten by VIC. | ||
+ | |||
+ | Sprites have a few differences between PAL and NTSC as well; like | ||
+ | the other differences they are not evident in many programs. | ||
+ | sprite positions start at zero on the left side of the screen and | ||
+ | increase as you move to the right. | ||
+ | positions wrap back around to the left side of the screen. | ||
+ | high horizontal positions, the sprites are 8 pixels farther right on a | ||
+ | PAL screen than they would be on an NTSC screen. | ||
+ | positions are used to create sprites that extend all the way to the left | ||
+ | edge of the visible screen, problems show up. A PAL routine will have | ||
+ | an eight pixel wide gap in the sprites on an NTSC machine while an NTSC | ||
+ | routine will have eight pixels of overlap on a PAL machine. | ||
+ | positions are stored in a table or loaded into a register with immediate | ||
+ | addressing. | ||
+ | the appropriate direction. | ||
+ | behavior, which is seen even less often. | ||
+ | |||
+ | As was stated earlier, the 65xx processors in the 64 and 1541 run | ||
+ | at close to the same frequency, but the ratio of the 64's processor | ||
+ | frequency to the 1541's processor frequency is not exactly 1.0, and the | ||
+ | ratio is different on PAL and NTSC systems. | ||
+ | 6510 in the 64 and the 6502 in the 1541 may be synchronized at the start of | ||
+ | a section of code, after executing the same number of cycles they will no | ||
+ | longer be synchronized. | ||
+ | processor to bring them back into sync, but this adjustment varies | ||
+ | depending on either a PAL or NTSC system. | ||
+ | fastloaders, | ||
+ | in order to transmit 8 bits one at a time or four pairs of two bits | ||
+ | without the need for handshaking. | ||
+ | need to be added or subtracted for a fix. Finding the right place to | ||
+ | insert or remove these cycles can be challenging. | ||
+ | remove cycles from the 64 routine, which may be impossible, you can instead | ||
+ | e.g. add cycles to the complementary drive routine. | ||
+ | |||
+ | Finally, understanding what problems may arise and what should be | ||
+ | done to correct them is only part of the skillset needed by a good fixer. | ||
+ | In the best case, you'll be working with source code which you've written | ||
+ | and understand. | ||
+ | in which case you must study and understand it before you can start | ||
+ | trying to fix any NTSC/PAL problems. | ||
+ | available, leaving you to work in a machine language monitor. | ||
+ | imposes some new restraints. | ||
+ | blocks of code around. | ||
+ | this. Code can easily be left out by replacing it with NOP instructions | ||
+ | or inserting JMPs to branch around sections which aren't needed. | ||
+ | code can be patched in with a JMP instruction to the new code and another | ||
+ | JMP instruction at its end to return to the old routines. | ||
+ | need to be added, opcodes may be changed; for example, a LDA $ABCD might | ||
+ | be replaced with LDA $ABCD,Y to add an extra cycle. | ||
+ | simply have to resort to a symbolic disassembly (i.e. disassemble into | ||
+ | source code). | ||
+ | |||
+ | Then there are the cases where the object code isn't even readily | ||
+ | accessible. | ||
+ | protection are valuable. | ||
+ | compressed to make the files shorter, adding a single BASIC SYS command | ||
+ | line as well. These routines rarely try to be deceptive and are fairly | ||
+ | short, so they can be undone, often just by modifying the existing | ||
+ | decompression code a bit. It is common practice to have a sequence | ||
+ | cruncher on top of an RLE (or equal-byte) packer or linker. | ||
+ | " | ||
+ | will often have disk or tape copy protection and purposefully obtuse code. | ||
+ | |||
+ | Just remember that fixing is often about creativity and diligence. | ||
+ | There is a common set of problems encountered and many creative solutions | ||
+ | to correct them. With practice, it becomes easier, just like anything else. | ||
+ | You'll have an easier time of it if you pick your battles carefully. | ||
+ | Remember that some problems just can't be fixed perfectly. | ||
+ | stuck, try a different challenge and perhaps come back to the problem | ||
+ | later. | ||
+ | |||
+ | With all this in mind, let's have a look at that demo! | ||
+ | |||
+ | |||
+ | Slow Ideas, page 1 | ||
+ | ---------- | ||
+ | |||
+ | Reconnaisance: | ||
+ | |||
+ | Page 1 is divided into basically three parts. | ||
+ | of the screen contains a stretching sprite tech-tech extending into | ||
+ | the borders, overlayed on top of raster bars. Then the rest of the | ||
+ | screen features a large Pu-239 picture/ | ||
+ | sprite scroll takes place in the lower portion of the screen, extending | ||
+ | into the lower (but not side) borders. | ||
+ | The screen is built using two interrupt routines, one located | ||
+ | at $1240 and the other located at $1280. | ||
+ | and occurs at the bottom of the screen (raster $F8 or so). It removes | ||
+ | the upper/lower borders, performs some calculations, | ||
+ | music. | ||
+ | at raster line $xx, at the top of the screen. | ||
+ | sideborder display, and performs most of the calculations. | ||
+ | are vectored through $0314, not $FFFE. | ||
+ | Fortunately, | ||
+ | that is, there is a lot of empty memory between routines -- probably it was | ||
+ | coded in an ML monitor. | ||
+ | routines around, is much easier. | ||
+ | Pasi was, to realize that it would need to be fixed by C=Hacking one | ||
+ | day. | ||
+ | |||
+ | When run, the demo is a mess. The most prominent defects are | ||
+ | that the music plays very slowly, the screen flashes, with the main | ||
+ | picture flickering between the top of the screen and the lower portion | ||
+ | of the screen, the side borders are not open in the tech-tech, and | ||
+ | sometimes the scrolling sprites are off the screen. | ||
+ | |||
+ | Too many cycles: | ||
+ | |||
+ | The half-speed music and flickering screen indicates that interrupts | ||
+ | are getting skipped, which points a finger straight at too many cycles -- if | ||
+ | the next interrupt is set *after* it is supposed to occur, then a whole | ||
+ | frame will of course pass by before it actually occurs. | ||
+ | The first question is, where are the cycles getting eaten? | ||
+ | of the two interrupt routines, which is using too many cycles. | ||
+ | is immediately obvious: any interrupts that take place totally on the screen | ||
+ | have *extra* cycles available -- 2 cycles per raster line. The loss of cycles | ||
+ | comes in the lower border, which means the $1240 interrupt: | ||
+ | |||
+ | jsr tune | ||
+ | jsr blah1 | ||
+ | jsr blah2 | ||
+ | LDA $D012 | ||
+ | CMP #$1E | ||
+ | BCC *-7 | ||
+ | |||
+ | First note that curious $D012 code. It surely is meant to compare with | ||
+ | line $011E, not line $1E. Line $011E doesn' | ||
+ | though, so it actually waits until line $1E, well past the $1280 interrupt. | ||
+ | So the first order of business is to BIT that BCC out of existence. | ||
+ | Alas, this still does not fix up the flickering. | ||
+ | is to BIT ($2C) out the JSR tune call, to see if the problem really is | ||
+ | cycles. | ||
+ | at least mostly stable, screen. | ||
+ | but keeping the music, gives an unstable screen; the tune simply has to | ||
+ | go somewhere else. | ||
+ | Finding extra cycles takes a bit of work; for now, it is enough to | ||
+ | $2C-BIT out the JSR TUNE and focus on the other problems. | ||
+ | |||
+ | Tech-tech: | ||
+ | |||
+ | Let's have a look at the tech-tech routine: | ||
+ | |||
+ | $1280 LDA #$96 | ||
+ | STA $DD00 | ||
+ | LDY #$FF | ||
+ | BIT $EA | ||
+ | NOP | ||
+ | NOP | ||
+ | LDX #$09 | ||
+ | DEX | ||
+ | BNE *-3 | ||
+ | $1290 LDX #$5F | ||
+ | $1292 LDA $1000,X | ||
+ | STA $D018 | ||
+ | DEC $D016 ;Open border | ||
+ | STA $D021 | ||
+ | INC $D016 | ||
+ | LDA $1060,X | ||
+ | $12A4 STA $D017 | ||
+ | STY $D017 | ||
+ | LDA $1100,X | ||
+ | STA $D011 | ||
+ | DEX | ||
+ | $12B1 LDA $1000,X | ||
+ | STA $D018 | ||
+ | DEC $D016 ;Open border | ||
+ | STA $D021 | ||
+ | INC $D016 | ||
+ | LDA $1060,X | ||
+ | STA $D017 | ||
+ | STY $D017 | ||
+ | $12C9 BIT $EA | ||
+ | NOP | ||
+ | DEX | ||
+ | BPL $1292 | ||
+ | |||
+ | The routine has three parts: the initial delay at $1280, and a two-part | ||
+ | loop, the first part at $1292 and the second at $12B1. | ||
+ | between the two is the LDA $1100,X STA $D011 at $12AA; without two | ||
+ | parts to the loop, the branch would take too many cycles. | ||
+ | need fixing, since each part uses at least one raster. | ||
+ | Opening the side borders requires exact timing, yet the above | ||
+ | routine is entered through $0314, which will always have some cycle | ||
+ | variance. | ||
+ | himself wrote a nice article in C=Hacking #3 on using sprites to | ||
+ | synchronize the raster. | ||
+ | a specific instruction; | ||
+ | and you know exactly where you are on the raster line. | ||
+ | First to fix is the initial line delay. | ||
+ | NTSC cycles per raster line, it seems reasonable to first try adding | ||
+ | +2 cycles to the delay: | ||
+ | |||
+ | 1287 BIT $EA to AND ($00),Y ;+2 cycles | ||
+ | | ||
+ | | ||
+ | |||
+ | Really, CMP ($EA),Y is in general a better choice since it doesn' | ||
+ | affect .A, but it doesn' | ||
+ | besides. | ||
+ | soon enough (but it turns out they are enough). | ||
+ | Next up is the borders. | ||
+ | to add +2 cycles per raster line -- +2 cycles to both loop parts -- | ||
+ | it's not obvious where the raster sync is taking place, and +2 cycles | ||
+ | might cause the sync to fail. In fact, what's needed is just +1 cycle | ||
+ | per part: | ||
+ | |||
+ | 12A4 STA $D017 to STA $CF18,Y ;+1 cycle | ||
+ | |||
+ | 12C9 BIT $EA to INC $00EA ;+1 cycle | ||
+ | NOP | ||
+ | |||
+ | The $12A4 fix uses the fact that .Y=$FF. | ||
+ | the trick at $12C9 and be safer, but we used INC in our reckless youth. | ||
+ | And suddenly -- poof! The borders open up and the screen stops | ||
+ | flickering. | ||
+ | loop is split in two because of an STA $D011 instruction. | ||
+ | pushes badlines off, so that the timing stays precise; since badlines | ||
+ | never occur, the graphics data is never fetched. | ||
+ | rasterbars that VIC starts fetching graphics data; without the $D011 | ||
+ | push, this data will appear on the first visible rasterline (hence the | ||
+ | earlier flicker with too many cycles). | ||
+ | cycle timing) push takes place, the picture can get the jitters. | ||
+ | now we know why the demo behaved as it did, earlier. | ||
+ | |||
+ | Sprite scroll: | ||
+ | |||
+ | And yet... the screen still flickers, when sprites bounce down | ||
+ | too low. Clearly the sprites are eating into the cycles needed by the | ||
+ | lower border routines. | ||
+ | change the y-coordinates of all the sprites. | ||
+ | all the coordinate tables. | ||
+ | out which code stores the sprite coordinates, | ||
+ | amount from each of the y-coordinates. | ||
+ | to be located at $1590, and the simple insertion of code | ||
+ | |||
+ | 15AF STA $D001,Y to CLC | ||
+ | | ||
+ | STA $D001,Y | ||
+ | |||
+ | fixes things up just dandy. | ||
+ | |||
+ | Still too many cycles: | ||
+ | |||
+ | We still have the problem of what to do about the tune. Since | ||
+ | there aren't enough cycles in the $1240 interrupt, they need to be | ||
+ | found somewhere else, and the only somewhere else is the $1280 interrupt, | ||
+ | during the logo display. | ||
+ | lines are needed, and how many lines are free. This is easy enough, | ||
+ | by simply moving the JSR TUNE to the end of the $1280 routine, | ||
+ | sandwiched between an INC $D020 and a DEC $D020. | ||
+ | then indicate the end of the $1280 interrupt as well as the size | ||
+ | of the tune. | ||
+ | The good news is that $1280 has a fair amount of extra | ||
+ | cycles available. | ||
+ | and needs even more cycles. | ||
+ | lines. | ||
+ | cycles can be gained to make it all work. Note that if $1280 only | ||
+ | had a few raster lines to spare, this task would be much more difficult | ||
+ | (if not impossible). | ||
+ | |||
+ | Towards the end of the $1280 routine, there are a series of | ||
+ | subroutine calls. | ||
+ | one which copies values from a table at $1900 to a table at $1000, and | ||
+ | one which ORAs a value into the $1000 table. | ||
+ | and then ORAing, why not just combine the two loops? | ||
+ | |||
+ | 1E50 LDY $1FC6 | ||
+ | 1E53 LDX #$5C | ||
+ | 1E55 LDA $1900,Y | ||
+ | 1E58 STA $1001,Y | ||
+ | 1E5B DEY | ||
+ | 1E5C DEX | ||
+ | 1E5D BPL $1E55 LDY $1FC6 | ||
+ | 1E5F NOP | ||
+ | 1E60 LDA $1FC1 | ||
+ | 1E63 STA $1E70 | ||
+ | 1E66 LDA $1FC2 | ||
+ | 1E69 STA $1E6F | ||
+ | 1E6C LDX #$5F LDX #$5C | ||
+ | 1E6E LDA $32B6 | ||
+ | 1E71 ORA $1000, | ||
+ | 1E74 STA $1000, | ||
+ | 1E77 INC $1E6F | ||
+ | 1E7A BNE $1E8B | ||
+ | 1E7C INC $1E70 | ||
+ | 1E7F LDA $1E70 | ||
+ | 1E82 CMP #$34 | ||
+ | 1E84 BNE $1E8B | ||
+ | 1E86 LDA #$30 | ||
+ | 1E88 STA $1E70 | ||
+ | 1E8B DEX DEY | ||
+ | 1E8C BPL $1E6E DEX | ||
+ | 1E8E RTS BPL $1E6E | ||
+ | 1E8F BRK RTS | ||
+ | |||
+ | On the right are the patches we added, along with replacing the JSR $1E50 | ||
+ | with JSR $1E5D. | ||
+ | into $1000, it simply ORAs to $1900 and stores it in $1000 ($1001 actually, | ||
+ | since that's where the first loop stored stuff). | ||
+ | have noticed that the patch affects $1001-$105D, | ||
+ | affected $1000-$105F; | ||
+ | it does, what's your point? :). The ancient hacker technique applies | ||
+ | here: try it, and if it works, don't touch it and don't ask questions! | ||
+ | Better than a huge rewrite of self-modifying code. | ||
+ | |||
+ | Wrapping up: | ||
+ | |||
+ | With the above fix in place, and the tune moved to the $1280 | ||
+ | interrupt, the demo finally seems to work great. | ||
+ | to save it and crunch. | ||
+ | is easy enough, by looking at the disassembler and the initialization | ||
+ | code (which unpacks the code further). | ||
+ | lockup. | ||
+ | interrupts are not occuring. | ||
+ | is set, but $D011 is never set -- presumably the high bit is set, so the | ||
+ | interrupt occurs on a nonexistant raster line. Adding a simple | ||
+ | |||
+ | LDA #$1B | ||
+ | STA $D011 | ||
+ | |||
+ | to the initialization routine at $1443 fixes that up just fine, and | ||
+ | the program decrunches successfully. | ||
+ | one to go. | ||
+ | |||
+ | Slow Ideas, page 2 | ||
+ | ------------------ | ||
+ | |||
+ | Reconaissance: | ||
+ | |||
+ | Page 2 has essentially four visible parts: a tech-tech at the top of | ||
+ | the screen, followed by a swinging FALSTAFF sprite on top of rasters and | ||
+ | open borders, followed by the ubiquitous Pu-239 pic/logo, followed finally | ||
+ | by a sprite scroll on top of rasters with open side and bottom borders. | ||
+ | All this is done with two interrupts, one ($11FE) occuring at | ||
+ | raster line $31, the other ($1350) occuring at line $D1. The $31 raster | ||
+ | performs the tech-tech, the FALSTAFF rasters, and also performs some | ||
+ | calculations for various sprite scroll effects. | ||
+ | lower rasters and scrolling sprites, and also plays the music, scrolls the | ||
+ | sprites, and does the calculations for the FALSTAFF sprite. | ||
+ | When run, the screen is quite a jumble -- too many rasters -- and | ||
+ | needless to say, the screen effects need retiming. | ||
+ | better at 50Hz, so it needs to be retimed as well. | ||
+ | |||
+ | Top to bottom: | ||
+ | |||
+ | Before fixing the timing problems, the extra cycles need to be | ||
+ | addressed. | ||
+ | there are a series of subroutine calls at $13B3: | ||
+ | |||
+ | $13B3 JSR $211C ;Tune | ||
+ | JSR $0F80 ; | ||
+ | JSR $1300 ; | ||
+ | ; in $31 interrupt) | ||
+ | JSR $1D00 ; | ||
+ | |||
+ | Deducing the function of each routine is easy enough -- just BIT it out | ||
+ | and see what happens. | ||
+ | enough cycles to make the screen stabilize. | ||
+ | more work than fixing up timing -- especially three subroutines worth! -- | ||
+ | so it is enough to $2C-BIT out the three subroutine calls for now and fix | ||
+ | up the timing first. | ||
+ | |||
+ | Tech-tech: | ||
+ | |||
+ | At the top of the screen is a normal tech-tech, controlled by | ||
+ | the routine at $11FE. | ||
+ | |||
+ | $11FE Set up $D020/ | ||
+ | LDX #$08 | ||
+ | $1214 LDA $xxxx,Y | ||
+ | STA $D018 | ||
+ | LDA $xxxx,Y | ||
+ | STA $D016 | ||
+ | INY | ||
+ | DEX | ||
+ | NOP NOP NOP | ||
+ | Change VIC bank ($DD00) | ||
+ | CPX #$00 | ||
+ | BEQ $123F | ||
+ | NOP | ||
+ | NOP | ||
+ | NOP | ||
+ | ... | ||
+ | CPY #$2F | ||
+ | BCC $1214 | ||
+ | JMP $125E | ||
+ | |||
+ | The thing to recognize here is that there are two loops -- on every eighth | ||
+ | line the BEQ $123F branch is taken. | ||
+ | first loop takes 63 cycles, and the BEQ branch adds an extra 20 cycles: | ||
+ | obviously, these are simply timed for the normal and badlines. | ||
+ | that is needed here is to add 2 cycles to each loop: I changed the | ||
+ | CPX #$00 above to INX DEX for the first loop, and changed a NOP NOP | ||
+ | to a CMP ($EA,X) in the $123F branch. | ||
+ | the next step is the initial timing at $11FE. | ||
+ | of code two bytes can be freed up, giving 2-6 cycles to fiddle with: | ||
+ | |||
+ | $11FE LDY #$0B LDX #$0B | ||
+ | LDA #$01 LDA #$01 | ||
+ | STA $D019 STA $D019 | ||
+ | LDX #$02 LDY #$02 | ||
+ | $1207 DEX DEY | ||
+ | BNE $1207 BNE $1207 | ||
+ | STY $D020 STX $D020 | ||
+ | STY $D021 STX $D021 | ||
+ | LDY #$00 NOP NOP | ||
+ | LDX #$08 LDX #$08 | ||
+ | $1214 tech-tech loop | ||
+ | |||
+ | As you can see, by using .Y in the delay loop instead of .X, the LDY #$00 | ||
+ | instruction becomes redundant. | ||
+ | Finally, there is the matter of the last raster line. When the JMP $125E | ||
+ | is taken, there is some delay before changing the border and background | ||
+ | registers, to get a nice solid line. It turns out that four extra cycles | ||
+ | are needed here, and fortunately there just happens to be several padding | ||
+ | bytes before $125E. | ||
+ | extra NOPs are easily inserted. | ||
+ | |||
+ | Falstaff rasters: | ||
+ | |||
+ | Immediately after the tech-tech are the rasters and open borders | ||
+ | behind the FALSTAFF sprite. | ||
+ | fixed a sideborder routine in the first page, and this one is similar: | ||
+ | |||
+ | $0EA0 LDX # | ||
+ | DEX | ||
+ | BNE *-3 | ||
+ | LDX #$15 | ||
+ | CMP #$EA | ||
+ | $0EA9 BIT $EA ; | ||
+ | LDA $xxxx,X | ||
+ | DEC $D016 | ||
+ | STA $D021 | ||
+ | INC $D016 | ||
+ | STA $D020 | ||
+ | LDA $xxxx,X | ||
+ | STA $D011 | ||
+ | NOP | ||
+ | NOP | ||
+ | NOP | ||
+ | DEX | ||
+ | BNE $0EA9 | ||
+ | LDX # | ||
+ | delay loop | ||
+ | |||
+ | As before, +1 cycle needs to be added to the border loop, which is | ||
+ | easy enough to do by changing the $0EA9 instruction. | ||
+ | also needs a little change, to make the last line nice and solid | ||
+ | (there are no sprites active on the last lines). | ||
+ | initial delay needs to be changed, to line up the routine correctly. | ||
+ | I changed a CMP #$EA at $1277, just before the JSR $0EA0 call, to | ||
+ | NOP NOP. As usual in this type of thing, it was easy to simply | ||
+ | experiment with the initial timing until the borders opened up. | ||
+ | |||
+ | Bottom rasters and sprite scroll: | ||
+ | |||
+ | Finally, the bottom rasters need fixing up. Once again, the | ||
+ | loop looks familiar: | ||
+ | |||
+ | $135D LDY #$FF | ||
+ | LDX #$55 | ||
+ | CMP #$EA | ||
+ | NOP | ||
+ | NOP | ||
+ | NOP | ||
+ | $1366 LDA $7F02,X | ||
+ | STA $CF18, | ||
+ | STY $D017 | ||
+ | LDA $1504,X | ||
+ | STA $D011 | ||
+ | LDA $1C32,X | ||
+ | DEC $D016 ; | ||
+ | STA $D021 | ||
+ | INC $D016 | ||
+ | DEX | ||
+ | BNE $1366 | ||
+ | |||
+ | And once again, we need to add a cycle. | ||
+ | that the STA $D011 can be changed to a $CF12, | ||
+ | to EA EA adds the two cycles needed in the init to get everything aligned, | ||
+ | and poof -- instant open borders (as long as all the sprites are being | ||
+ | displayed correctly). | ||
+ | When the scroll is activated, something else becomes apparent: | ||
+ | as the sprites scroll into the left border, they " | ||
+ | several extra pixels, i.e. the screen coordinates change by too much. | ||
+ | To fix the sprite popping, all that was needed was to add 8 to $0F00 | ||
+ | (the sprite coordinate). | ||
+ | |||
+ | Too many cycles: | ||
+ | |||
+ | Alas, the moment we've been dreading has arrived. | ||
+ | world do we find enough cycles for three subroutines? | ||
+ | to get rid of some effects -- for example, the sprite scroller has | ||
+ | many time-consuming effects that can occur; by getting rid of those, | ||
+ | the tune can be placed in the $31 raster. | ||
+ | doesn' | ||
+ | sure would be nice to get all the effects going; but where to get the | ||
+ | cycles? | ||
+ | A little thinking suggests something, though -- the PAL | ||
+ | routine can't have THAT many extra cycles, simply because it doesn' | ||
+ | get cycles until the raster routine is finished, at the bottom of | ||
+ | the screen... oh duh. A glance at the raster routine shows that | ||
+ | it covers $55 rasters. | ||
+ | which means that the rasterbars extend all the way down to line | ||
+ | $0127 = 294 or so, which is over 30 rasters past the last NTSC line. | ||
+ | So suddenly there' | ||
+ | |||
+ | (Orig PAL) (Fixes) | ||
+ | |||
+ | $135D LDY #$FF | ||
+ | LDX #$55 LDX #$35 ;Only $35 raster lines | ||
+ | CMP #$EA NOP NOP ;+2 cycles | ||
+ | NOP | ||
+ | NOP | ||
+ | NOP | ||
+ | $1366 LDA $7F02,X LDA $7F22, | ||
+ | STA $CF18,Y | ||
+ | STY $D017 | ||
+ | LDA $1504,X LDA $1524, | ||
+ | STA $D011 STA $CF12,X ;+1 cycle to open borders | ||
+ | LDA $1C32,X LDA $1C52, | ||
+ | DEC $D016 ; | ||
+ | STA $D021 | ||
+ | INC $D016 | ||
+ | DEX | ||
+ | BNE $1366 | ||
+ | |||
+ | Note that the table offsets need to be increased; without that, the sprites | ||
+ | start stretching all over the place and everything goes weird. | ||
+ | raster lines instead of $55, the rasters go all the way down to the last | ||
+ | NTSC lines, and free up just enough cycles for two of the $13B3 subroutines | ||
+ | to be re-enabled: | ||
+ | |||
+ | $13B3 JSR $211C ;Tune | ||
+ | JSR $0F80 ; | ||
+ | BIT $1300 ; | ||
+ | JSR $1D00 ; | ||
+ | |||
+ | That pesky $1300 subroutine just pushes it over the edge, though: | ||
+ | |||
+ | $1300 LDX #$2F | ||
+ | LDA #$00 | ||
+ | STA $7F01,X | ||
+ | STA $7F2D,X | ||
+ | DEX | ||
+ | BNE *-9 | ||
+ | |||
+ | It doesn' | ||
+ | than $35 rasters. | ||
+ | obscures parts of the scroll. | ||
+ | If you think about it for a moment, there are some free cycles | ||
+ | still hanging around elsewhere in the code: in the tech-tech routine! | ||
+ | There are lots of NOPs inside of the tech-tech loops; enough to piggy-back | ||
+ | the $1300 routine into it without much effort: | ||
+ | |||
+ | $11FE LDX #$0B LDX #$0B | ||
+ | LDA #$01 LSR $D019 ; | ||
+ | STA $D019 LDY #$02 | ||
+ | LDY #$02 DEY | ||
+ | $1207 DEY BNE $1205 | ||
+ | BNE $1207 STA $D020 | ||
+ | STX $D020 STX $D021 | ||
+ | STX $D021 LDX #$08 | ||
+ | $1210 NOP NOP | ||
+ | NOP NOP | ||
+ | $1212 LDX #$08 LDA $1887,Y | ||
+ | LDA $1887,Y STA $D018 | ||
+ | STA $D018 LDA $1B87,Y | ||
+ | LDA $1B87,Y STA $D016 | ||
+ | STA $D016 INY | ||
+ | INY ROL ; | ||
+ | DEX ROL | ||
+ | NOP ROL | ||
+ | NOP AND #$03 | ||
+ | NOP STA $DD00 | ||
+ | LSR LDA #$00 | ||
+ | LSR STA $7F01,Y | ||
+ | LSR STA $7F2A,Y | ||
+ | LSR DEX | ||
+ | LSR BEQ $123F ; | ||
+ | LSR | ||
+ | STA $DD00 | ||
+ | INX | ||
+ | DEX | ||
+ | BEQ $123F | ||
+ | |||
+ | Using LSR $D019 instead of LDA #$01 STA $D019 saves 2-bytes (6502 | ||
+ | instructions like LSR use a " | ||
+ | to the register while LSR is working, which is why LSR $D019 clears $D019!) | ||
+ | while still using 6 cycles. | ||
+ | ROL code saves another byte. Moving the DEX down saves two more bytes, | ||
+ | and getting rid of the NOPs saves three more, for a total of 8 bytes -- | ||
+ | just enough for the LDA STA STA which replaces the JSR $1300 code. | ||
+ | The new code uses two fewer cycles, however -- but by rearranging the | ||
+ | initialization code we can just branch to a NOP at $1211 above. | ||
+ | Every eighth raster the branch is made to $123F. | ||
+ | performs another INY, which has the effect of skipping every eighth entry | ||
+ | in the $7Fxx tables. | ||
+ | |||
+ | $123F BIT $EA BIT $EA | ||
+ | CMP ($EA,X) STA $7F02,Y | ||
+ | NOP STA $7F2B,Y | ||
+ | NOP | ||
+ | ... | ||
+ | |||
+ | The two STA xxxx,Y each take 5 cycles, so 10 cycles exactly replaces | ||
+ | the 10 cycles used by CMP ($EA,X) NOP NOP. But since the STAs take two more | ||
+ | bytes, the rest of the routine needs to be moved forwards; luckily, there | ||
+ | are empty padding bytes immediately following the routine, so it moves | ||
+ | forwards without a hitch... | ||
+ | Well, maybe with one hitch. | ||
+ | self-modifying code all over the place. | ||
+ | immediately follow the routine, and so need to be adjusted to the | ||
+ | new locations. | ||
+ | Finally, for some strange reason, the above fixes also affect | ||
+ | the FALSTAFF rasters. | ||
+ | delay fixes that up, though. | ||
+ | |||
+ | So much for extra cycles! | ||
+ | |||
+ | The Tune: | ||
+ | |||
+ | Finally, the tune really does sound better at 50Hz, so a 5/6 | ||
+ | delay needs to be added. | ||
+ | ideal: | ||
+ | |||
+ | $09E0 DEC $09FF | ||
+ | BEQ *+3 | ||
+ | JMP $211C ;Play tune | ||
+ | LDA #$05 ;Skip tune every 6th frame | ||
+ | STA $09FF | ||
+ | RTS | ||
+ | |||
+ | As the comments say, it simply skips every sixth frame of the tune, to | ||
+ | give it an effective play rate of 50Hz. It works! | ||
+ | |||
+ | Glitches: | ||
+ | |||
+ | Unfortunately, | ||
+ | bottom rasters have some flicker in the upper right (and occasionally upper- | ||
+ | left) corners. | ||
+ | rewriting the raster loop. | ||
+ | |||
+ | Much worse, though, is that the screen will glitch badly when | ||
+ | certain sprite scroll effects are activated. | ||
+ | quite a lot of rasters, since even with the tune disabled glitches are | ||
+ | still evident. | ||
+ | but that doesn' | ||
+ | that's no good. A third is to do a big code rewrite -- bleah. | ||
+ | easiest is to just call it a 95% fix and say "good enough" | ||
+ | |||
+ | And there it is -- Our First Fix. Hope u lik3 1t, d00dZ! | ||
+ | time, we'll have a look at some of the routines which weren' | ||
+ | time, such as fastload and FLI routines. | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | begin 644 slowideas-ntsc | ||
+ | M`0@+" | ||
+ | MO9I=G8CXBM# | ||
+ | M(" | ||
+ | M@)`+H@$@)@*%+2`1`J@@$0*JO3(" | ||
+ | M(!$" | ||
+ | MA0$L, | ||
+ | MZ.`(T/ | ||
+ | M8" | ||
+ | MA\`/ | ||
+ | MX)43(8# | ||
+ | MXT*" | ||
+ | MY5\)H!@*P7, | ||
+ | MX> | ||
+ | M> | ||
+ | MIS7(F\W.`9D(J_T2-/ | ||
+ | MRLWD3=5DW)N2R^$%7GYK, | ||
+ | M9# | ||
+ | M.83I2^2U, | ||
+ | MZ!Q\^> | ||
+ | MP`%> | ||
+ | MLQV9]-RU95O(<# | ||
+ | MH' | ||
+ | MIAW<'< | ||
+ | M0!^:? | ||
+ | M]> | ||
+ | MWDKO6ZCB.6V$MC=A^? | ||
+ | M[BZAI`UI=: | ||
+ | MEN' | ||
+ | M" | ||
+ | MXPP:' | ||
+ | M3.L; | ||
+ | M/ | ||
+ | M7U!M-? | ||
+ | M: | ||
+ | M" | ||
+ | M5& | ||
+ | M$_P4& | ||
+ | MPH" | ||
+ | M1# | ||
+ | MGV`QHSH+%$/ | ||
+ | M< | ||
+ | M6$P-"&< | ||
+ | M`\S): | ||
+ | M? | ||
+ | M<</< | ||
+ | M^@' | ||
+ | M<<? | ||
+ | MQC4> | ||
+ | MT[[)T_V; | ||
+ | MS]J&: | ||
+ | M`# | ||
+ | M-R<?# | ||
+ | MPY8< | ||
+ | MU]_O\`24? | ||
+ | M52" | ||
+ | M6YB.@8& | ||
+ | M3_O$: | ||
+ | MP, | ||
+ | MZBIJJNKK*VNK^WDL; | ||
+ | MJFHIZ: | ||
+ | M' | ||
+ | MS_4N? | ||
+ | M_8F)CY_LG/ | ||
+ | MCAQT4$].34Q+2DE(1T9%1$)!0# | ||
+ | M(!\>' | ||
+ | M\#> | ||
+ | MHJ, | ||
+ | MX!?; | ||
+ | M\_@L, | ||
+ | M0-S^!@[G]S]`PX# | ||
+ | M7MZZ_7(%?# | ||
+ | MWT[WXN_> | ||
+ | M>>? | ||
+ | M@0' | ||
+ | MSM$Q.4%)`5%986D!< | ||
+ | MC\@0$%" | ||
+ | M%U> | ||
+ | M]? | ||
+ | M: | ||
+ | M[!PL/ | ||
+ | M/ | ||
+ | M`/ | ||
+ | M4!0" | ||
+ | MZ==(DGO$&< | ||
+ | M9)O97@& | ||
+ | M4H@`8? | ||
+ | M`G[`D_8(GZ6\!-PGZA$_0`GZ`D_0)`Z"? | ||
+ | M\: | ||
+ | MYBJ`' | ||
+ | M$^/ | ||
+ | M_HL__P/ | ||
+ | M1< | ||
+ | M1L(_< | ||
+ | MS^# | ||
+ | MIIYK# | ||
+ | M3I5T(B``' | ||
+ | M\/ | ||
+ | MKQ0%(U4!Z%G^CU)+KRMD_/ | ||
+ | MCT7D: | ||
+ | MBT^T47O@`'' | ||
+ | MW# | ||
+ | MOG/ | ||
+ | MP# | ||
+ | M" | ||
+ | MF_+# | ||
+ | MN)`# | ||
+ | M_^,/ | ||
+ | M`' | ||
+ | M%C& | ||
+ | M!P`P`IT, | ||
+ | M!^%' | ||
+ | M)*__`^? | ||
+ | M7(A4R$)< | ||
+ | M`6A4N-FUI^WZ/ | ||
+ | M? | ||
+ | MC_& | ||
+ | M?/ | ||
+ | MK3!5RI]Q.G]XA`? | ||
+ | M_)4`/ | ||
+ | M%8GHY3Q' | ||
+ | M" | ||
+ | M_A_RM: | ||
+ | M%AK(T%64C$`U&" | ||
+ | M, | ||
+ | M? | ||
+ | MELG" | ||
+ | M# | ||
+ | M5JWP`RC9`4": | ||
+ | M': | ||
+ | MRD`: | ||
+ | M)'" | ||
+ | M719`F\' | ||
+ | M# | ||
+ | M!`# | ||
+ | MN5W/& | ||
+ | M" | ||
+ | M.Y[^[]H=WT]^3KZY_6_XV/& | ||
+ | MTN9OC([GM_[Y`> | ||
+ | M>, | ||
+ | M)# | ||
+ | M"/ | ||
+ | M/ | ||
+ | MB`!# | ||
+ | M; | ||
+ | M)BZ" | ||
+ | M' | ||
+ | M? | ||
+ | M\S>;> | ||
+ | M/ | ||
+ | M+UZ3; | ||
+ | M8.3N!78RLB`QG, | ||
+ | M" | ||
+ | M8@!2B; | ||
+ | M: | ||
+ | MF' | ||
+ | M-X: | ||
+ | M-`D"/ | ||
+ | MH%&' | ||
+ | MC(V+C$R9^G)P< | ||
+ | MS9.V8`64KH[K79#, | ||
+ | MGRC@`N$]ZS>/>" | ||
+ | M, | ||
+ | M689/ | ||
+ | MFFH# | ||
+ | MN+)[7; | ||
+ | M[M!8]& | ||
+ | M=T^B8J& | ||
+ | M-Z& | ||
+ | M7): | ||
+ | M%9`5CT, | ||
+ | M; | ||
+ | MCJS>< | ||
+ | MLEW$< | ||
+ | M; | ||
+ | MIY/ | ||
+ | MWQQ^@.`& | ||
+ | M_@> | ||
+ | M93N' | ||
+ | MJ& | ||
+ | M!8YEH49R2RYH# | ||
+ | M& | ||
+ | MX; | ||
+ | M"' | ||
+ | MV, | ||
+ | M2/ | ||
+ | M< | ||
+ | M6? | ||
+ | M^4P' | ||
+ | M7;& | ||
+ | M-H$> | ||
+ | MZJ]/ | ||
+ | M>< | ||
+ | M4: | ||
+ | M65+: | ||
+ | M/ | ||
+ | MT0)Q`C02`G_HYDF.(07' | ||
+ | MG+DB%@" | ||
+ | ML)/ | ||
+ | M*RELO!+O01R7%XRFOMJVC@]F!9%(?& | ||
+ | MPHT_9A%5K\LOXT^O\YFJ8!B< | ||
+ | M7^ZSZL`F=9][S.UQRNK9Q=' | ||
+ | MD[64SC> | ||
+ | M$]G> | ||
+ | M]<, | ||
+ | MC@VSK-]S# | ||
+ | M(> | ||
+ | M2\W2# | ||
+ | MH"# | ||
+ | MFEX< | ||
+ | MCE_: | ||
+ | MXF7V$QM`(J? | ||
+ | MZ_R? | ||
+ | M2%S; | ||
+ | M" | ||
+ | M8[Y+# | ||
+ | MYZW# | ||
+ | MU.' | ||
+ | MU(W2E=N5W*7=K< | ||
+ | MV985][TZ7O[> | ||
+ | MNWR^]@MQOM\/" | ||
+ | M? | ||
+ | MG>" | ||
+ | M0>' | ||
+ | M? | ||
+ | M0^5> | ||
+ | MU_`3/ | ||
+ | MST^6+!.AD[& | ||
+ | MBB& | ||
+ | MBN3K< | ||
+ | M4HEQ$31]J);# | ||
+ | MK($> | ||
+ | MJ\" | ||
+ | MIA99G0# | ||
+ | M" | ||
+ | M4Z-I' | ||
+ | M(ZE$<: | ||
+ | M;& | ||
+ | MHI; | ||
+ | M2/ | ||
+ | MGIN\(CT" | ||
+ | M4\1O)(2": | ||
+ | M$H_1U1^C2C]%E' | ||
+ | M>/ | ||
+ | M6' | ||
+ | M`ND" | ||
+ | M, | ||
+ | MB(HY? | ||
+ | M4=4# | ||
+ | M-9WN, | ||
+ | MK</& | ||
+ | MP< | ||
+ | M5@H=, | ||
+ | MA(>' | ||
+ | MFN+7Y7/ | ||
+ | M8M9)S5# | ||
+ | MPQ!=J_I)& | ||
+ | M_78.QZ" | ||
+ | MU$O+4N)Z&< | ||
+ | M\!RM# | ||
+ | MZT`+F^VF" | ||
+ | M)# | ||
+ | M\*QQPY/ | ||
+ | M7ZE<< | ||
+ | M/ | ||
+ | M!PAJ06-SV8" | ||
+ | M$%-WV" | ||
+ | MN<& | ||
+ | MJ\OL#" | ||
+ | M$3PNQ1!%UAD: | ||
+ | M55=96UU? | ||
+ | M,;' | ||
+ | MT? | ||
+ | MU> | ||
+ | MKN`ZNNB[> | ||
+ | M*7H_8? | ||
+ | M`UO]: | ||
+ | M+K;' | ||
+ | M_5Z: | ||
+ | MEK4> | ||
+ | M; | ||
+ | MHA2I0!D!? | ||
+ | MFC< | ||
+ | M-33[1" | ||
+ | M/ | ||
+ | M=*4KI9ND" | ||
+ | MG_T`JI@(7_J18_, | ||
+ | ML& | ||
+ | M84TG# | ||
+ | M' | ||
+ | M(QHJ!U/ | ||
+ | M%& | ||
+ | M4@, | ||
+ | MT.AMC`!6!.A&< | ||
+ | M!AEKIW" | ||
+ | M.40!%)# | ||
+ | MQAD& | ||
+ | M*0U4Q< | ||
+ | M=#; | ||
+ | MF-8\NIF9-K;: | ||
+ | M8JR9T`T& | ||
+ | MKZIP`YTG8S/ | ||
+ | M)J# | ||
+ | M/ | ||
+ | MZ3Z(J3=@\..D!:" | ||
+ | MXUN: | ||
+ | MGRJ? | ||
+ | M" | ||
+ | MM< | ||
+ | M1@' | ||
+ | MZ$_N1\[^ZG; | ||
+ | MROM[5^# | ||
+ | MYB0^0..@`\IW/"; | ||
+ | MD-" | ||
+ | M@',? | ||
+ | M28)\TUI_-? | ||
+ | MR6PP*8T0AH_86G+& | ||
+ | MQ,? | ||
+ | ME-=C^: | ||
+ | M0Z9[IH.F$X6SA6-4C4XU]L8S@J13IQ).7)' | ||
+ | M\: | ||
+ | M=RY> | ||
+ | MOWE)D`.MY-[30!'"& | ||
+ | MK!J47^< | ||
+ | M' | ||
+ | M\M0P52`& | ||
+ | M-" | ||
+ | M.3DU, | ||
+ | M> | ||
+ | M" | ||
+ | MC$M*RPN, | ||
+ | M6EE: | ||
+ | MEJ: | ||
+ | M69G)Z> | ||
+ | M5? | ||
+ | M*& | ||
+ | M'; | ||
+ | M0QIX4Y_#' | ||
+ | MD`D5, | ||
+ | M, | ||
+ | M0!I=J%EU%GQ# | ||
+ | MSV(I/ | ||
+ | M.Z5R!X.X*J+G9)9$50-< | ||
+ | MND%*G5P9.!H!3' | ||
+ | MS/ | ||
+ | M9@-, | ||
+ | MTR]N2: | ||
+ | MQ@90< | ||
+ | M6? | ||
+ | M5/; | ||
+ | M*=.$; | ||
+ | M4' | ||
+ | M^" | ||
+ | M@H`G36BB2`/ | ||
+ | M^-%NH@0UH`: | ||
+ | M2; | ||
+ | M6I\#" | ||
+ | MN)YNX/ | ||
+ | M$P< | ||
+ | MD^7S7R3J/ | ||
+ | M.S_GRS/ | ||
+ | M!V? | ||
+ | M)ALF," | ||
+ | M? | ||
+ | MF.!_`T$!M\MWS7`Q"# | ||
+ | MBGK@Q\> | ||
+ | MGF5O51R%SE\F# | ||
+ | M^08$!/ | ||
+ | M5RK%$$%QZ[J]_> | ||
+ | MHHBZ/ | ||
+ | M? | ||
+ | MAAF# | ||
+ | MX< | ||
+ | ME^J\I0T=EI(J3X(=[1Z@7B@G56@/ | ||
+ | M5I_5TX<< | ||
+ | M[3PX+3PM.C# | ||
+ | M\J, | ||
+ | M(`> | ||
+ | M\; | ||
+ | MPMX6LA> | ||
+ | M;,> | ||
+ | M.+, | ||
+ | M@< | ||
+ | M1.)DD!JI`7GY_0;< | ||
+ | MPAMRZ$R7+X9`? | ||
+ | M: | ||
+ | M!P\? | ||
+ | M0& | ||
+ | M_Y\`/ | ||
+ | MRIQ^`R49Z\!X/: | ||
+ | M6ZW# | ||
+ | M!TJ5E2=(`4\@& | ||
+ | MN`+XL/ | ||
+ | MY=# | ||
+ | M^BUAL2I+!`5+; | ||
+ | M]+T1`# | ||
+ | M4_4: | ||
+ | M3; | ||
+ | M@&' | ||
+ | M!> | ||
+ | MLU^? | ||
+ | M`4_C(`TWCRU7NHW*N(? | ||
+ | MO2@# | ||
+ | M]4NU=[R=;?: | ||
+ | ME=9H=+OU^G7A6$C6? | ||
+ | M.OV3I./ | ||
+ | MR5? | ||
+ | MYD.1)S)U)U)TAJO)4@7>< | ||
+ | MSE? | ||
+ | M: | ||
+ | M@? | ||
+ | MI8.W8" | ||
+ | MN"# | ||
+ | MUN6RQH(IUO.@`< | ||
+ | M[F< | ||
+ | MCW(/ | ||
+ | M7ZK`4LN, | ||
+ | MWY^GE# | ||
+ | MYGRR-_P/ | ||
+ | MF8!Q54I, | ||
+ | M_KR!@SO5*@(-R5CSG@^!S`_X`WZOE#? | ||
+ | M!YRP2X'#; | ||
+ | MA0%!: | ||
+ | M/ | ||
+ | M" | ||
+ | MMPX,' | ||
+ | M6K)R< | ||
+ | MJ\" | ||
+ | MG" | ||
+ | MZTI, | ||
+ | MP]C-%A@" | ||
+ | M-07%: | ||
+ | M?? | ||
+ | MP0ZF, | ||
+ | MDMHC1JDB1$B3B+AJM4, | ||
+ | M&>' | ||
+ | MG# | ||
+ | MS$"? | ||
+ | M8> | ||
+ | ML4M; | ||
+ | M& | ||
+ | M" | ||
+ | M4T# | ||
+ | M, | ||
+ | MF8QVHZ`X" | ||
+ | M`+09!_Q.C[" | ||
+ | M/ | ||
+ | M]$0'; | ||
+ | M3]^Q< | ||
+ | MP:, | ||
+ | M$/ | ||
+ | MP5(0M-`2PZ, | ||
+ | M*`DN87B? | ||
+ | MH+0=DXQ-N+UH*8GF, | ||
+ | M88OP@/ | ||
+ | M`E/ | ||
+ | M20& | ||
+ | M\29+<" | ||
+ | M\N^PU=@NX2K8F)+A; | ||
+ | M< | ||
+ | M5[%6XDPRJ`8PSLF< | ||
+ | M: | ||
+ | M%X9PX// | ||
+ | MD#" | ||
+ | M`E-[]" | ||
+ | MXBT-7CD`3$8I`(!`F4[" | ||
+ | M(`)7N16V$[< | ||
+ | M/ | ||
+ | MGZ4.>& | ||
+ | MPK`8DT\N+7H> | ||
+ | MP0LO" | ||
+ | MSF' | ||
+ | MK+7@C5L' | ||
+ | M" | ||
+ | M-*; | ||
+ | M(, | ||
+ | MOR)A@", | ||
+ | M: | ||
+ | MXY)V17`1)Q9D2< | ||
+ | MFU*A;"::: | ||
+ | M1\829`3\241299%BS9S2TU@A; | ||
+ | M/, | ||
+ | MER66\4QK;; | ||
+ | M%MK61*35; | ||
+ | M.< | ||
+ | MXM0; | ||
+ | MUZ$PR]1.)A# | ||
+ | M9I`D-9R+F; | ||
+ | MB8", | ||
+ | MKJ\K": | ||
+ | MW)[=<# | ||
+ | MIP$; | ||
+ | M/ | ||
+ | ? | ||
+ | ` | ||
+ | end | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | The Herd Mentality as gleaned from the musings of Bil Herd | ||
+ | ------------------ | ||
+ | |||
+ | This is sort of a story that was really a bunch of posts I did on compuserve | ||
+ | back in 1993. The language is coarse, the English malformed and if you' | ||
+ | a certain ex-Commodore employee you might be offended by the story. | ||
+ | that case, the story is probably about someone else and not you; my | ||
+ | fallback position would be that the whole story is a fabrication and meant | ||
+ | for entertainment, | ||
+ | stories written by burnt-out wackos is bad for your cholesterol. | ||
+ | turns out, this is about a sixth of the sillyness that went on, this is | ||
+ | more about some of the events that centered on the 8563 chip which I was | ||
+ | being asked about at the time. Oh yeah, the content is also dated with | ||
+ | a slight slant towards the melodramatic. | ||
+ | |||
+ | 12-Jan-93 | ||
+ | |||
+ | Coming soon to a terminal near you... the gruesome story of the chip that | ||
+ | almost ruined CES (and the C128 along with it) | ||
+ | |||
+ | EXPERIENCE the shame and horror of being a Chip Designer at Commodore during | ||
+ | the Witch hunts! | ||
+ | |||
+ | SEE the expressions on Managers faces when they realize that their Bonuses are | ||
+ | at stake! | ||
+ | |||
+ | HEAR the woeful lamenting of the programmers as they are beaten for no | ||
+ | apparent reason! | ||
+ | |||
+ | SHARE the experience of being a Hardware Engineer... stalking the halls in | ||
+ | search of programmers to beat (for no apparent reason). | ||
+ | |||
+ | LEARN how to say "THIS CHIP ONE SICK PUP" in Japanese. | ||
+ | |||
+ | Find out just how badly busted up the 80 column chip was and how many DIRTY | ||
+ | fixes were needed to make that all crucial show in Vegas on January 6. | ||
+ | (Christmas, what Christmas). . | ||
+ | |||
+ | <Said rather coyly in an attempt to elicit any positive responses> | ||
+ | course no one is interested..... :) Bil | ||
+ | |||
+ | |||
+ | 14-Jan-93 | ||
+ | |||
+ | This is the first of many parts as this thing went round and round during our | ||
+ | mad dash to make the CES show. I don't even remember what year it was. The | ||
+ | 8563 was a holdover from the Z8000 based C900 (the " | ||
+ | it). The people who worked on it were called the " | ||
+ | hung out was called the " | ||
+ | |||
+ | The most interesting thing that came out of that group besides a disk | ||
+ | controller that prompted you for what sector and cylinder you'd like to write | ||
+ | to on every access, was one day they stole the furniture out of the lobby and | ||
+ | made their own lounge disguising it as a VAX repair depot. We were so | ||
+ | amused by this that we stopped teasing them for a week. (But I become | ||
+ | distracted....) | ||
+ | |||
+ | Now the very very very early concept of the C128 was based on the D128, a | ||
+ | 6509 based creature (boo... hiss). | ||
+ | VIC chip onto the otherwise monochrome (6845 based) in an effort to add some | ||
+ | color to an otherwise drab machine. No one dreamed that C64 compatibility was | ||
+ | possible so no one thought along those lines. I was just coming off of | ||
+ | finishing the PLUS 4 (before they added that AWFUL built in software to it) | ||
+ | and even though I had done exactly what I was told to do I was not happy with | ||
+ | the end result and had decided to make the next machine compatible with | ||
+ | _something_ instead of yet another incompatible CBM machine. (I won't go into | ||
+ | the "yes Virginia there is Compatibility" | ||
+ | lawyers many years later still chuckling, suffice it to say I made some | ||
+ | fairly brash statements regarding my opinion of product strategy) | ||
+ | Consequently, | ||
+ | took over the C128 project. | ||
+ | |||
+ | I looked at the existing schematics once and then started with a new design | ||
+ | based on C64ness. The manager of the chip group approached me and said they | ||
+ | had a color version of the 6845 if I was interested in using it it would | ||
+ | definitely be done in time having been worked on already for a year and a | ||
+ | half...... And so the story begins..... (to be continued) | ||
+ | |||
+ | |||
+ | 16-Jan-93 | ||
+ | |||
+ | Looking back I realize that the source of a lot of the problems with the 8563 | ||
+ | is that it wasn't designed FOR the C128 and that the IC designers did not | ||
+ | take part in the application of their chip the way the other designers did. | ||
+ | The VIC and MMU designers took an active interest in how their chip was used | ||
+ | and how the system design worked in relation to their chip. I overlooked | ||
+ | ramifications of how the 8563 was spec' | ||
+ | later. For example, it was explained to me how there was this block transfer | ||
+ | feature for transferring characters for things like scrolling. | ||
+ | need that. Later it would turn out when this feature finally did work | ||
+ | correctly that it only was good for 256 characters at a time. 256 characters | ||
+ | at a time. 256 characters at a time?? I never stopped to think to ask if the | ||
+ | feature was semi-useless because it could only block move 3 and 1/3 lines at | ||
+ | a time. Did I mention the block move was only good for 256 characters? | ||
+ | a bug in this feature would almost prove a show stopper with a serious | ||
+ | problem showing up in Vegas the night of setup before the CES show. But I get | ||
+ | ahead of myself. It was also my understanding that this part had the same | ||
+ | operating parameters as the 6845, a VERY common graphics adapter. | ||
+ | scrutinizing the chip for timing differences the way I normally did any new | ||
+ | chip was another mistake I made. The major timings indicated what speed | ||
+ | class it was in and I didn't check them all. I blame myself as this really is | ||
+ | the type of mistake an amateur makes. | ||
+ | :) | ||
+ | |||
+ | |||
+ | 16-Jan-93 | ||
+ | |||
+ | It turns out that a major change had been made to the way the Read/Write line | ||
+ | was handled. When I asked about this, VERY late in the design cycle, like in | ||
+ | Production when this problem turned up, I was told " | ||
+ | designed to work in the Z8000 machine." | ||
+ | look on my face! Even though the Z8000 machine was long dead and we had been | ||
+ | TRYING for 6 months to use this damn thing in the C128 I'm being told NOW | ||
+ | that you didn't design it to work the way we've been using it for 6 months? | ||
+ | Shoulda asked.... it was my fault, shoulda asked "is this meant to work" | ||
+ | :/ | ||
+ | |||
+ | |||
+ | Don't get me wrong, the designer was VERY bright, he held patents for some of | ||
+ | the " | ||
+ | conjunction with other chips and that's where some of the problems lay. Our | ||
+ | story opens as Rev 0 of the chip.... (what' | ||
+ | Our story opens as Rev 1 of the chip makes its debut and ......(pardon me a | ||
+ | moment.....) Our story opens as Rev 2 of the chip makes it debut..... <to be | ||
+ | continued> | ||
+ | |||
+ | |||
+ | 19-Jan-93 | ||
+ | |||
+ | Forgive the sporadic nature of these additions. | ||
+ | yeah.... It was sometime in September when we got 8563 Silicon (or so memory | ||
+ | serves) good enough to stick in a system. I can't remember what all was wrong | ||
+ | with the Chip but one concern we had was it occasionally (no spell checker | ||
+ | tonight, bear with me) blew up.... big time.... turn over die and then smell | ||
+ | bad..... But then all of the C128 prototypes did that on a semi regular basis | ||
+ | as there wasn't really any custom silicon yet, just big circuit boards | ||
+ | plugged in where custom chips would later go... but you can't wait for a | ||
+ | system to be completed before starting software development. | ||
+ | any of the Animals really gave it a thought until when the next rev of the | ||
+ | chip came out and now with less other problems the blowing up ' | ||
+ | pronounced. | ||
+ | to go check on the programmer' | ||
+ | spray coming out of their office.... later it turned out they usually weren' | ||
+ | spraying the boards just using their " | ||
+ | I had to do was touch the board in a mystical way and then back out slowly | ||
+ | sometimes accompanied by ritual like chanting and humming. | ||
+ | as the " | ||
+ | it turned out I had stolen the power supply myself without telling them.... | ||
+ | If anybody else got caught " | ||
+ | locker and then the box kicked out from under them leaving them stuck until | ||
+ | they could peel themselves down, but that's another story.) | ||
+ | problem still existed on Rev 4 (I think it was) we got concerned. | ||
+ | this time that the single most scariest statement came out of the IC Design | ||
+ | section in charge of the ' | ||
+ | some chance statistically that any read or write cycle will fail due to | ||
+ | (synchronicity)" | ||
+ | |||
+ | |||
+ | 19-Jan-93 | ||
+ | |||
+ | Synchronicity problems occur when two devices run off of two separate clocks, | ||
+ | the VIC chip hence the rest of the system, runs off of a 14.318Mhz crystal | ||
+ | and the 8563 runs off of a 16Mhz Oscillator. Now picture walking towards a | ||
+ | revolving door with your arms full of packages and not looking up before | ||
+ | launching yourself into the doorway. You may get through unscathed if your | ||
+ | timing was accidentally just right, or you may fumble through losing some | ||
+ | packages (synonymous to losing Data) in the process or if things REALLY foul | ||
+ | up some of the packages may make it through and you're left stranded on the | ||
+ | other side of the door (synonymous to a completely blown write cycle). What I | ||
+ | didn't realize that he meant was that since there' | ||
+ | cycle to slip through, he didn't take even the most rudimentary protection | ||
+ | against bad synchronizing. It's MY FAULT I didn't ask, "what do you mean | ||
+ | fully by that statement" | ||
+ | protection. As it turns out the 8563 instead of failing every 3 years or so | ||
+ | (VERY livable by Commodore standards) it failed about 3 times a second. | ||
+ | other words if you tried to load the font all in one shot it would blow up | ||
+ | every time! The IC designers refused to believe this up until mid December | ||
+ | (CES in 2-3 weeks!) because "their unit in the lab didn't do it." | ||
+ | said "show us" and they led the whole rabble (pitch forks, torches, ugly | ||
+ | scene) down to the lab. It turns out they weren' | ||
+ | REV of the chip, (TWO revs old), they were testing it from Basic because it | ||
+ | "blew up" every time they ran it at system speeds (No %^$#%$# sherlock. | ||
+ | That's what we're trying to tell you) and even then it screwed up once and | ||
+ | the designer reached for the reset switch saying that something does | ||
+ | occasionally go wrong. | ||
+ | tuned by Programmer Abusing I was able to snatch his arm in mid-air before he | ||
+ | got to the reset switch, with blatant evidence there on the test screen. | ||
+ | |||
+ | |||
+ | 19-Jan-93 | ||
+ | |||
+ | One of the rabble was their boss and (I have been speaking about two | ||
+ | designers interchangeably, | ||
+ | Finally came down "FIX IT" | ||
+ | we packed for the show, and there were 4 or 5 other major problems (I'll say | ||
+ | more later) with the chip and NO time to do another pass. It was obvious that | ||
+ | if we were going to make CES something had to give. As Josey Wales said, | ||
+ | "Thats when ya gotta get Mean.... I mean downright plumb crazy Loco Mean". | ||
+ | And we knew we had to. The programmer thrashing' | ||
+ | |||
+ | |||
+ | 22-Jan-93 | ||
+ | |||
+ | Memory flash, I just remembered when we found out there was no interrupt | ||
+ | facility built in to the 8563. I remember how patient the designer was when | ||
+ | he sat me down to explain to me that you don't need an interrupt from the | ||
+ | 8563 indicating that an operation is complete because you can check the | ||
+ | status ANY TIME merely by stopping what you're doing (over and over) and | ||
+ | looking at the appropriate register, (even if this means banking in I/O) or | ||
+ | better yet sit in a loop watching the register that indicates when | ||
+ | an operation is done (what else could be going on in the system besides | ||
+ | talking to the 8563 ???) Our running gag became not needing a ringer on the | ||
+ | phone because you can pick it up ANY TIME and check to see if someone' | ||
+ | it, or better yet, sit at your desk all day picking the phone up. Even in | ||
+ | the hottest discussions someone would suddenly stop, excuse himself, and pick | ||
+ | up the nearest phone just to see if there was someone on it. This utterly | ||
+ | failed to get the point across but provided hours of amusement. | ||
+ | at the local bar wondered what fixation the guys from Commodore had with the | ||
+ | pay phone. | ||
+ | |||
+ | Any ways.... To back up to the other problems that plagued the 8563. Going | ||
+ | into December a couple of things happened. | ||
+ | support a " | ||
+ | power consumption and speed the chip up. Well, something was not quite right | ||
+ | somewhere in the design because the chip got worse. | ||
+ | happened was that both designers took vacation. | ||
+ | point of view here 8-9 years in the future, but right then we couldn' | ||
+ | understand what these people were doing working on a critical project. | ||
+ | |||
+ | |||
+ | 22-Jan-93 | ||
+ | |||
+ | Or maybe I was just getting to used to eating Thanksgiving Dinner out of | ||
+ | aluminum foil off of a Lab Bench. | ||
+ | someone' | ||
+ | from work. Anyways, the chips could no longer display a solid screen. | ||
+ | first couple of characters on each line were either missing or tearing, until | ||
+ | the thing heated up, then they were just missing. | ||
+ | that even worked this good fell to where they only got 3 or 4 working chips | ||
+ | the last run. A run is a Half-Lot at MOS and costs between $40,000 and | ||
+ | $120,000 to run. Pretty expensive couple of chips. | ||
+ | |||
+ | The other problem takes a second to explain, but first a story..... | ||
+ | when TED (the Plus four) had been mutilated decimated and defecated upon, | ||
+ | management decided to kick the body one last time. "TED shall Talk" came the | ||
+ | decree and the best minds in the industry were sought... We actually did have | ||
+ | two of the most noted consumer speech people at the time, the guys who | ||
+ | designed the "TI Speak an Spell" worked out of the Commodore Dallas office. | ||
+ | They did a custom chip to interface a speech chip set to the processor. | ||
+ | Operating open loop, in other words without feedback from any of the system | ||
+ | design people (US) they defined the command registers. | ||
+ | that you wrote to request a transfer. | ||
+ | wrote the same value a second time. We referred to this as the "do it, do it | ||
+ | now" register or the "come on pretty please" | ||
+ | #$%&@ Texans" | ||
+ | ' | ||
+ | behind. This ended up having the effect of characters scrolling upwards | ||
+ | randomly. | ||
+ | |||
+ | |||
+ | 22-Jan-93 | ||
+ | |||
+ | So to recap, going into December we had a chip with .001% yield, the left | ||
+ | columns didn't work, anytime there was one pixel by itself you couldn' | ||
+ | it, the semi useless block transfer didn't work right, the power supply had | ||
+ | to be adjusted for each chip, and it blew up before you loaded all of the | ||
+ | fonts unless you took 10 seconds to load the fonts in which case it blew up | ||
+ | only sometimes. Finger pointing was in High swing, (the systems guys should | ||
+ | have said they wanted WORKING silicon) with one department pitted against the | ||
+ | other, which was sad because the other hardworking chip designers had | ||
+ | performed small miracles in getting their stuff done on time. Managers | ||
+ | started getting that look rabbits get in the headlights of onrushing Mack | ||
+ | trucks, | ||
+ | commonly seen doing both. Our favorite behavior was where they hid in their | ||
+ | offices. | ||
+ | one of the key middle management positions. Programmer beatings had hit a new | ||
+ | high only to fall off to almost nothing overnight as even this no longer | ||
+ | quelled the growing tension. A sprinkler head busted and rained all over | ||
+ | computer equipment stored in the hallway. Engineering gathered as a whole and | ||
+ | watched on as a $100,000 worth of equipment became waterlogged, | ||
+ | expressions much like the bystanders at a grisly accident who can't tear | ||
+ | their attention away from the ensuing carnage. | ||
+ | didn't seriously occur to me that we wouldn' | ||
+ | had, I might have succumbed to the temptation to go hide in my office | ||
+ | (checking the telephone). | ||
+ | think what if. Next time (hopefully) I'll try and bring all the problems and | ||
+ | answers together and explain why I stopped to tell that rather out of place | ||
+ | TED story. | ||
+ | |||
+ | |||
+ | 30-Jan-93 | ||
+ | |||
+ | No single custom chip was working completely as we went into December with | ||
+ | the possible exception of the 8510 CPU. The MMU had a problem where data was | ||
+ | " | ||
+ | due to a mixup in the different revision of " | ||
+ | chips. This chip essentially had one of the older layers magically appear | ||
+ | bring old problems with it. Unfortunately, | ||
+ | fix newer problems so we didn't have a way to combine existing layers to fix | ||
+ | ALL problems. Dave D'Orio (start telling ya some of the names of a few of the | ||
+ | unsung types here) did a GREAT job of bringing most of the IC design efforts | ||
+ | together. | ||
+ | work, when he suddenly figured out what the problem was. He had looked at | ||
+ | the bad MMU chip under a microscope that day. Later that night, under the | ||
+ | influence of a few Michelobs, his brain " | ||
+ | taken earlier and he realized that an earlier layer had gotten into the | ||
+ | design. | ||
+ | |||
+ | |||
+ | 30-Jan-93 | ||
+ | |||
+ | This would not be the first time a problem would be addressed at this | ||
+ | particular bar. (The Courtyard.... If you ever saw the David Letterman where | ||
+ | the guy stops the fan with his tongue, he was a bartender there). The PLA had | ||
+ | a problem where my group had made a typo in specifying the hundred some terms | ||
+ | that comprised the different operating parameters. Well the designer in | ||
+ | charge of the PLA took this rev as an opportunity to sneak a change into the | ||
+ | chip without really going public with the fact he was making a change. | ||
+ | the change went through it caused one of the layers to shift towards one side | ||
+ | and effectively shorted the input pins together. Ya should' | ||
+ | where the designer' | ||
+ | screwed up because his engineer DIDN't make any changes (that would' | ||
+ | like admitting that something had been " | ||
+ | the designer' | ||
+ | telling his boss that he had made a change. Talk about giving someone enough | ||
+ | rope to hang themselves, we just kept paying it out yard by yard. | ||
+ | |||
+ | |||
+ | 30-Jan-93 | ||
+ | |||
+ | Anyways back to the 8563. The first problem was relatively easy to fix, | ||
+ | providing you didn't give a hang about your own self respect. | ||
+ | designer mentioned that the block copy seemed to work better when you wrote | ||
+ | the same command twice in a row. I made him explain this to me in public, | ||
+ | mostly due to the mean streak I was starting to develop when it came to this | ||
+ | particular subject. | ||
+ | register and then wrote to it again. | ||
+ | " | ||
+ | understanding (Intel, at last his eyes unfurled), "kinda like a 'come on | ||
+ | pretty please register' | ||
+ | of" he replied doubt creeping in to his voice, "you wouldn' | ||
+ | would you", I asked my face the definition of sincerity, (said in the voice | ||
+ | of the wanna-be HBO director on the HBO made for TV commercial) "why yes.... | ||
+ | yes I am" he replied. Mind you a crowd had formed by this time, that poor guy | ||
+ | never understood what was so funny about being from Texas or what a 'Damm | ||
+ | Texan' register was. | ||
+ | |||
+ | |||
+ | 30-Jan-93 | ||
+ | |||
+ | This ' | ||
+ | the guy (Von Ertwine) who was developing CP/M at home (consultant). Von had | ||
+ | wisely chosen not to try to follow all of the current Revs of the 8563, | ||
+ | instead he latched onto a somewhat working Rev4 and kept if for software | ||
+ | development. | ||
+ | properly, was taking the little metal cup that came with his hot air popcorn | ||
+ | popper (it was a buttercup to be exact) and would put an Ice cube in it and | ||
+ | set it on the 8563. He got about 1/2 hour of operation per cube. On our side | ||
+ | there was talk of rigging cans of cold spray with foot switches for the CES | ||
+ | show, " | ||
+ | no-one told Von.... but don't worry, he would find out the day before CES | ||
+ | during setup in ' | ||
+ | |||
+ | |||
+ | 23-Oct-93 | ||
+ | Sb: C128, The Final Chapter | ||
+ | |||
+ | Thought I'd finish what I'd started back in January of this year. I had been | ||
+ | talkin 'bout how busted up the 8563, now we get to the part about how it got | ||
+ | fixed... well fixed good enough... well patched good enough to give every | ||
+ | possible attempt at the appearance of maybe passably working... | ||
+ | |||
+ | One of the things that got worse instead of better was something called the | ||
+ | back bias generator. | ||
+ | to unmitigated gall... no one ever said it was unmitigated gall and I am not | ||
+ | saying that here and now) of slipping in a major change like that right | ||
+ | before a CES show, it became obvious that it needed fixed. | ||
+ | generator connects to the substrate of the chip and if you've ever seen the | ||
+ | ceramic versions of the 40 and 48 pin chips you would notice that the pin 1 | ||
+ | indicator notch is gold colored. | ||
+ | substrate. | ||
+ | indicator notch but I had little to lose. At this point all I did have to | ||
+ | lose was a HUGE jar of bad 8563' | ||
+ | " | ||
+ | jar and win a prize." | ||
+ | real number you were wrong.) I soldered a wire between this tab and the | ||
+ | closet ground pin. The left column reappeared though still a little broken | ||
+ | up! The " | ||
+ | really proclaiming it's desire to be known as the shortened version of | ||
+ | Edward. | ||
+ | and adjusted the power supply to 5.3 volts. | ||
+ | letting Tim-the-Tool-Man-Taylor soup up your blender with a chainsaw motor | ||
+ | but it worked. | ||
+ | the part to days instead of weeks as was the normal Commodore Quality | ||
+ | Standard. | ||
+ | (said with the kind of sardonic cynical smile that makes parole officers | ||
+ | really hate their jobs) | ||
+ | |||
+ | Remember the synchronicity problem? | ||
+ | built a tower for the VIC chip that had something called a Phase Lock Loop on | ||
+ | it which basically acted as a frequency doubler. | ||
+ | Clock (I think it was 8.18 Mhz.... been too long and too many other dot clock | ||
+ | frequencies since then) and doubled it. We then ran a wire over to the 8563 | ||
+ | and used this new frequency in place of its own 16 Mhz clock. | ||
+ | equivalent to putting a revolving door at the other end of the room from the | ||
+ | first door and synchronizing them so that they turn at the same rate. Now if | ||
+ | you get through the first door and walk at the right speed every time towards | ||
+ | the second door you will probably get through. | ||
+ | to a True Miracle and was accompanied by the sound of Hell Freezing over, the | ||
+ | Rabbit getting the Trix, and several instances of Cats and Dogs sleeping | ||
+ | together. This was the first time that making CES became a near possibility. | ||
+ | We laughed, we cried, we got drunk. | ||
+ | 3" X 3" PCB was produced in 12 hours (a new record) and cost us about $1000 | ||
+ | each. | ||
+ | |||
+ | A new problem cropped up with sparkle in multi-colored character mode when | ||
+ | used for one of the C64 game modes. | ||
+ | crises , I try a few things including adjusting the power supply to 4.75 | ||
+ | volts. | ||
+ | display was tearing again. Machines are marked as to whether they can do 40 | ||
+ | column mode, 80 column mode or both. We averaged 1-3 of these crises a day | ||
+ | the last two weeks before CES. Several of us suffered withdrawal symptoms if | ||
+ | the pressure laxed for even a few minutes. | ||
+ | accidentally started locking the door to one of the development labs during | ||
+ | this time. A hole accidentally appeared in the wall allowing you to reach | ||
+ | through and unlock it. They continued to lock it anyways even though the | ||
+ | gaping hole stood silent witness to the ineffectiveness of trying to lock us | ||
+ | out of our own lab during a critical design phase. | ||
+ | singleness of purpose and considered changing professions. | ||
+ | |||
+ | We finished getting ready for CES about 2:00 in the morning of the day we | ||
+ | were to leave at 6:00. On the way to catch the couple of hours sleep I hear | ||
+ | the Live version of Solsbury Hill by Peter Gabriel, the theme song of the | ||
+ | C128 Animals and take this as a good omen. Several hapless Programmers are | ||
+ | spared the ritual sacrifice this night... little do they know they owe their | ||
+ | lives to some unknown disc jockey. | ||
+ | |||
+ | Advertisements in the Las Vegas airport and again on a billboard enroute from | ||
+ | the airport inform us that the C128 has craftily been designed to be | ||
+ | expandable to 512K. Now it had been designed to be expandable originally and | ||
+ | had been respecified by management so as to not be expandable in case next | ||
+ | year's computer needed the expendability as the " | ||
+ | Commodore computer. That's like not putting brakes on this years model of car | ||
+ | so that next year you can tote the New model as reducing those annoying | ||
+ | head-on crashes. | ||
+ | |||
+ | Upon arriving at the hotel we find that out hotel reservations have been | ||
+ | canceled by someone who fits the description of an Atari employee. | ||
+ | things occur in rapid succession. | ||
+ | credit card and briskly escort her to the desk were I rented a room for all | ||
+ | available days, second, a phone call is placed to another nearby hotel | ||
+ | canceling the room reservations for Jack Trameil and company, third, several | ||
+ | of those C64's with built in monitors (C64DX' | ||
+ | brought out and left laying around the hotel shift supervisors path | ||
+ | accompanied by statements such as "My my, who left this nifty computer laying | ||
+ | here... I'd bet they wouldn' | ||
+ | |||
+ | The next day we meet up with the guy who developed CPM (Von) for the C128. | ||
+ | As I mentioned earlier, someone forgot to tell him about the silly little | ||
+ | ramifications of an 8563 bug. His 'puter didn't do it as he had stopped | ||
+ | upgrading 8563s on his development machine somewhere around Rev 4 and the | ||
+ | problem appeared somewhere around Rev 6. As Von didn't carry all the | ||
+ | machinery to do a CPM rebuild to fix the bug in software, it looked like CPM | ||
+ | might not be showable. | ||
+ | based on showing CPM. In TRUE Animal fashion Von sat down with a disk editor | ||
+ | and found every occurrence of bad writes to the 8563 and hand patched them. | ||
+ | Bear in mind that CPM is stored with the bytes backwards in sectors that are | ||
+ | stored themselves in reverse order. Also bear in mind that he could neither | ||
+ | increase or decrease the number of instructions, | ||
+ | for different ones. Did I mention hand calculating the new checksums for the | ||
+ | sectors? | ||
+ | |||
+ | Everything else went pretty smooth, every supply was adjusted at the last | ||
+ | moment for best performance for that particular demo. One application has | ||
+ | reverse green (black on green) and the 330 ohm pullups won't allow the | ||
+ | monitor to turn off fast enough for the black characters. | ||
+ | alternate pullup packs made up back in West Chester and put them in to | ||
+ | service. | ||
+ | carried by people coming to Vegas. Another crisis, no problem, this was | ||
+ | getting too easy. If a machine started to sparkle during the demo, I would | ||
+ | pull out my ever present tweak tool and give a little demonstration as to the | ||
+ | adjustability of the New Commodore power supplies. | ||
+ | Commodore supplies that worked, much less had a voltage adjustment and an | ||
+ | externally accessible fuse. I explained (and meant it) that real bad power | ||
+ | supplies with inaccessible fuses were a thing of Commodore' | ||
+ | the New design philosophy meant increased quality and common sense. | ||
+ | |||
+ | I'm told they removed the fuse access from production units the month after I | ||
+ | left Commodore. | ||
+ | |||
+ | The names of the people who worked on the PCB layout can be found on | ||
+ | the bottom of the PCB. | ||
+ | |||
+ | "RIP: HERD, FISH, RUBINO" | ||
+ | |||
+ | The syntax refers to an inside joke where we supposedly gave our lives in an | ||
+ | effort to get the FCC production board done in time, after being informed | ||
+ | just the week before by a middle manager that all the work on the C128 must | ||
+ | stop as this project has gone on far too long. After the head of Engineering | ||
+ | got back from his business trip and inquired as to why the C128 had been put | ||
+ | on hold, the middle manger nimbly spoke expounding the virtues of getting | ||
+ | right on the job immediately and someone else, _his_ boss perhaps, had made | ||
+ | such an ill suited decision. | ||
+ | area for the next several days. I slept there on an airmatress or was | ||
+ | otherwise available 24 hours a day to answer any layout questions. | ||
+ | computer room was so cold that the Egg Mcmuffins we bought the first day were | ||
+ | still good 3 days later. | ||
+ | |||
+ | About the Z80: | ||
+ | |||
+ | What court ordered Commodore to install the Z80? | ||
+ | |||
+ | It wasn't mandated by court order, it was mandated by a 23 year old | ||
+ | engineer that realized that marketing had gone and said that we were 100% | ||
+ | compatible. | ||
+ | what C64 compatibility meant. | ||
+ | C64 used glitches to clock their circuitry not realizing that the glitches | ||
+ | were not to be depended on, etc. | ||
+ | |||
+ | The Z80/CPM cartridge didn't work on all C64's, and no-one had really taken | ||
+ | the time to figure out why. Someone noticed that a certain brand of the | ||
+ | address buffer used in the CPM cart worked better than others so someone | ||
+ | concluded that it must be the timing parameters that made a difference. | ||
+ | This wasn't true, it was a very subtle problem that dealt with the way the | ||
+ | 6502, the Z80 and the DRAM had been interlaced together. | ||
+ | a CPM cart that didn't work with all C64's and it worked even | ||
+ | less reliably with the C128 even though the timing parameters in the C128 | ||
+ | were far better. | ||
+ | with the CPM cart as it only ran 20% of the time when tested overnight. | ||
+ | |||
+ | ALSO, I worked hard to make sure the C128 had a reliable power supply. | ||
+ | was told "no fuse' | ||
+ | was easily accessible... darn it anyway. | ||
+ | variations in minimum and maximum power supply requirements we couldn' | ||
+ | handle the CPM cart, it needed an additional .5 amp because of some | ||
+ | wasteful power techniques that were used in it. I couldn' | ||
+ | bill for an additional .5 amp that might only occasionally be used. | ||
+ | |||
+ | SO, with that said, I accidentally designed the Z80 into the next rev of | ||
+ | the board. | ||
+ | INCLUDING custom silicon, these were records back then, the Z80 was added | ||
+ | around the second month. | ||
+ | |||
+ | The Z80 normally calls for a DRAM cycle whenever it needs one... it might | ||
+ | go 3 clocks and then 4 and then 6 and then 5 and then 7 between dram | ||
+ | accesses. Since the processor shares the bus with the VIC chip there are | ||
+ | only certain time when the bus is available for a DRAM cycles. | ||
+ | shortest cycle for a DRAM access for the Z80 is 3 clock cycles, you are | ||
+ | sure to catch a DRAM access if you do 2 cycles (wait for vic) then 2 | ||
+ | cycles (wait for vic). Whether you catch the Z80 between clock 1 and 2 or | ||
+ | between 2 and 3 doesn' | ||
+ | Otherwise if you just let the Z80 rip it crashes when it tries to grab | ||
+ | DRAM while there is a video(vic) cycle going on. And that's why it runs | ||
+ | at a clock-stretched 2MHz. The REAL bitch was the Ready circuitry when | ||
+ | flipping between DMA/ | ||
+ | |||
+ | The C128 design team: SYS32800, | ||
+ | |||
+ | Bil Herd | ||
+ | Dave Haynie | ||
+ | jobs involving computer analysis which was something | ||
+ | totally new for CBM. | ||
+ | Frank Palaia | ||
+ | how to make a Z80 and a 6502 live peacefully with | ||
+ | each other in a synchronous, | ||
+ | time sliced, DRAM based system. | ||
+ | Fred Bowen | ||
+ | cornered. | ||
+ | when trapped. | ||
+ | Terry Ryan | ||
+ | it. Threatened with the loss of his job if he ever | ||
+ | did anything that made as much sense again. | ||
+ | been know to use cynicism in ways that violate most | ||
+ | Nuclear Ban Treaties. | ||
+ | Von Ertwine | ||
+ | search of a better machine. | ||
+ | Dave DiOrio | ||
+ | that most chip designers were from Pluto. | ||
+ | Victor | ||
+ | of the nicest guys you'd ever meet. | ||
+ | Greg Berlin | ||
+ | I think of Greg every night. | ||
+ | shoulder in a friendly brawl in a bar parking lot | ||
+ | and I still cant sleep on that side. | ||
+ | Dave Siracusa | ||
+ | |||
+ | Not to mention the 8563 designers who made this story possible. | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H #17 | ||
+ | |||
+ | -fin- | ||
+ | </ |
magazines/chacking17.txt · Last modified: 2015-04-17 04:34 by 127.0.0.1