+<code>
+                   ########
+             ##################
+         ######            ######
+      #####
+    #####  ####  ####      ##      #####   ####  ####  ####  ####  ####   #####
+  #####    ##    ##      ####    ##   ##   ##  ###     ##    ####  ##   ##   ##
+ #####    ########     ##  ##   ##        #####       ##    ## ## ##   ##
+#####    ##    ##    ########  ##   ##   ##  ###     ##    ##  ####   ##   ##
+#####  ####  ####  ####  ####  #####   ####  ####  ####  ####  ####   ######
+#####                                                                    ##
+ ######            ######           Issue #17
+   ##################            November 15, 1998
+       ########
+...............................................................................
+	"The words of the wise are as goads, and as nails fastened by
+	 the masters of assemblies."  Ecclesiastes 12
+	"Before criticizing a man I try to walk a mile in his shoes.  That
+	 way, if he gets mad he's a mile away and barefoot."  John Ianetta
+...............................................................................
+BSOUT
+	For me, fall is a time for reflection.  The trees descend into their
+golden time and then seem to die.  And yet, under the surface they are quite
+alive, and teeming with activity at a smaller, less-visible scale, waiting to
+burst forwards again in full bloom.  I think there's a great metaphor for C=
+in this.  But I have no idea what it is.
+	In fact things are totally hectic around here, and I haven't given
+more than a few moments' thought towards the 64, so this will be a mighty
+short editorial.  Between a PhD thesis and begging for jobs there hasn't been
+much 64-time, but with a little here and a little there this issue is finally
+squeaking out.  Everybody worked hard over the summer, and my goal was
+to get it out in September.  Well, you know, these days, if something in
+the 64 world is only two months late it's doing pretty good, so no big whoop.
+C=Hacking ought to appear regularly after December, though.
+	On the down side, some things, such as the challenge problem from
+last time, will have to wait until the next issue.  I also stayed pretty
+low-key while putting this issue together, but future issues will be more
+public in soliciting articles (e.g. on comp.sys.cbm).
+	In other news, a C64 C-compiler has finally appeared!  This outstanding
+effort is the work of Ullrich von Bassewitz, uz@musoftware.de, the force
+behind Elite128 among other things.  The cc65 webpage is at
+        http://www.von-bassewitz.de/uz/cc65/
+so have a look, and tell Ullrich what a stellar guy he is :).  Also, as
+most of you know, the Chicago Expo took place on October 24, and was a real
+hoot!  Check out
+	http://driven.c64.org/
+for some nice pictures from the Expo, taken by Mark Seelye.
+	Meanwhile, sit back, relax, and enjoy these latest musings from
+these, our masters of assembly.
+.......
+....
+..
+.                                    C=H #17
+::::::::::::::::::::::::::::::::::: Contents ::::::::::::::::::::::::::::::::::
+BSOUT
+	o Voluminous ruminations from your unfettered editor.
+Jiffies
+	o Is it a bug or a feature?
+The C=Hallenge
+	o To be continued...
+Side Hacking
+	o "SuperCPU Software Repair", by S. Judd <sjudd@nwu.edu>.
+	   An amateur's excursion into correcting errant wares.
+Main Articles
+	o "An Optimizing Hybrid LZ77 RLE Data Compression Program, aka
+	  Improving Compression Ratio for Low-Resource Decompression",
+	  by Pasi 'Albert' Ojala <albert@cs.tut.fi>
+	  Part two of a two-part article on data compression, giving a
+	  detailed description of the compression algorithms used in
+	  pucrunch, not to mention the decompression code.
+	o "VIC-20 Kernel ROM Disassembly Project", by Richard Cini
+	  <rcini@email.msn.com>
+	  This is the first in a series of articles which aims to
+	  present a complete, commented disassembly of the VIC-20
+	  ROMs.
+	o Masters Class: "NTSC/PAL fixing, part I", by Russell Reed
+	  <rreed@egypt.org>, Robin Harbron <macbeth@tbaytel.net>, and S. Judd.
+	  Sit up straight and pay attention.  In the Masters Class, a
+	  Commodore luminary attempts to instruct a couple of ignorant
+	  plebeians in his art.  In this case, Robin and I set out to
+	  learn NTSC/PAL fixing from one of the greats, Decomp/Style.
+	  Our first fix, a demo from the obscure Finnish group Pu-239,
+	  is included, along with detailed descriptions of our experiences.
+	o "The Herd Mentality", by Bil Herd <bherd@zeus.jersey.net>
+	  This is a collection of entertaining musings on Commodore and the
+	  development of the C128, as provided by Bil Herd (and that's no
+	  bull).  If you don't know who Bil Herd _is_, why not type
+	  SYS 32800,123,45,6 on a 128 sometime...
+.................................. Credits ...................................
+Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd
+C=Hacking logo by.......................... Mark Lawrence
+Special thanks to Marko Makela, Olaf Seibert, and the rest of the cbm-hackers
+for their many otherwise unacknowledged contributions.
+Legal disclaimer:
+1) If you screw it up it's your own damn fault!
+2) If you use someone's stuff without permission you're a dork!
+For information on the mailing list, ftp and web sites, send some email
+to chacking-info@jbrain.com.
+About the authors:
+Pasi 'Albert' Ojala is a 29 year old software engineer, currently
+working at a VLSI design company on a RISC DSP core C compiler.
+Around 1984 a friend introduced him to the VIC-20, and a couple
+of years later he bought a 64+1541 to replace a broken Spectrum48K.
+He began writing his own BBS, using ML routines for speed, and
+later wrote a series of demos under the Pu-239 label.  In addition
+to pucrunch and his many C=Hacking articles, Pasi has written an Amiga
+filesystem, a graphics conversion package, a C64 burstloader, and a
+number of demos.  He currently uses his 64 for hobbyist pursuits, and
+is contemplating multipart demos for the 64 and the VIC-20, in addition
+to future C=Hacking articles.  Pasi is also a huge Babylon-5 fan, and
+has a B5 quote page at http://www.cs.tut.fi/~albert/Quotes/B5-quotes.html
+Richard Cini is a 31 year old senior loan underwriter who first became
+involved with Commodore 8-bits in 1981, when his parents bought him a
+VIC-20 as a birthday present.  Mostly he used it for general BASIC
+programming, with some ML later on, for projects such as controlling
+the lawn sprinkler system, and for a text-to-speech synthesizer.  All
+his CBM stuff is packed up right now, along with his other "classic"
+computers, including a PDP11/34 and a KIM-1.  In addition to collecting
+old computers Richard enjoys gardening, golf, and recently has gotten
+interested in robotics.  As to the C= community, he feels that it
+is unique in being fiercely loyal without being evangelical, unlike
+some other communities, while being extremely creative in making the
+best use out of the 64.
+Robin Harbron is a 26 year old internet tech support technician at a local
+independent phone company.  He first got involved with C= 8-bits
+in 1980, playing with school PETs, and in 1983 his parents convinced
+him to spend the extra money on a C64 instead of getting a VIC-20.
+Like most of us he played a lot of games, typed in games out of
+magazines, and tried to write his own games.  Now he writes demos,
+dabbles with Internet stuff, writes C= magazine articles, and, yes,
+plays games.  He is currently working on a few demos and a few games,
+as well as the "in-progress-but-sometimes-stalled-for-a-real-long-time-
+until-inspiration-hits-again Internet stuff".  He is also working on
+raising a family, and enjoys music (particularly playing bass and guitar),
+church, ice hockey and cricket, and classic video games.
+................................... Jiffies ..................................
+REM SHIFT-L
+by the cbm.hackers
+	Everybody knows the old REM shift-L trick in BASIC 2.0, which
+generates a syntax error upon listing.  But why does it work?  The
+answer turns out to be quite interesting.
+	Normally, when the BASIC interpreter tokenizes a line it strips
+out characters with the high bit set.  One exception is characters within
+quotes; the other is characters after REM.  In those cases, characters
+are embedded literally into the program line.
+	Now, BASIC tokens all have the high bit set.  When LIST
+encounters a character with the high bit set, it checks whether
+it is in quote mode.  If it is, the character is output as normal.
+If not, the character is treated as a token, which is expanded by
+QPLOP (located at $A717).  The part of QPLOP which prints keywords looks
+like this:
+:LOOP1   DEX              ;Traverse the keyword table
+         BEQ :PLOOP
+:LOOP2   INY              ;read a keyword
+         LDA RESLST,Y
+         BPL :LOOP2
+         BMI :LOOP1
+:PLOOP   INY              ;Print out the keyword
+         LDA RESLST,Y
+         BMI LISTENT1     ;Exit if on last char
+         JSR $AB47        ;Print char, AND #$FF
+         BNE :PLOOP
+The keyword strings in RESLST are all stored dextral character inverted
+(the last char has the high bit set), and the above code just moves
+forward through the table until it has counted out .X keywords.  At
+that point, :PLOOP prints out the next keyword to the screen, and
+hops back into LIST.
+	Shift-L is character code 204, or $CC.  When LIST sees this
+inside of a REM, it sees the high bit set and de-tokenizes it.
+As it turns out, though, the last token is $CB, which is keyword GO
+(so that GO TO works).  It also turns out that RESLST, the list of
+BASIC keywords, uses 255 characters.  The 256th character is zero
+(value zero, not character zero).
+	So, the above code goes through the list, and then prints
+out token $CC, the first character of which is a null.  This zero
+is sent to $AB47.  $AB47 sends it to JSR $FFD2 (which does nothing
+with the character), performs an AND #$FF, and exits.  But that makes
+the BNE :PLOOP branch _not_ get taken, and the code erroneously moves
+forwards into...  the code which executes the FOR statement!
+	And the first thing FOR does is look for a valid variable.
+When you LIST a program, the character immediately following the
+LIST is a statement terminator (a colon : or else an end of line null).
+LET (which is called by FOR) reads this character, decides it's an
+invalid variable name, and generates a ?SYNTAX  ERROR.
+	Because QPLOP uses absolute addressing (LDA RESLST,Y), .Y
+can wrap around through 255 to traverse the list again.  This is
+why shift-M shift-N etc. all generate valid keywords.  Only shift-L
+will force an error, and it is all due to the zero in the keyword
+table.
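
The table walk described above can be modelled off the machine.  What
follows is only a sketch in Python, not the ROM code: the three-keyword
table is a toy stand-in for RESLST, and the starting values (.X = token
minus $7F, .Y = $FF) and the high-bit stripping of the final character are
assumptions about the code surrounding the snippet.

```python
def make_table(keywords):
    """Pack keywords as the article describes RESLST: the last character
    of each keyword has its high bit set, and a zero byte ends the table."""
    table = bytearray()
    for word in keywords:
        data = word.encode("ascii")
        table += data[:-1]
        table.append(data[-1] | 0x80)
    table.append(0)
    return bytes(table)


def detokenize(table, token):
    """Return (printed_text, fell_through) for one token."""
    x = token - 0x7F              # assumption: counter set up before :LOOP1
    y = 0xFF                      # assumption: the first INY wraps .Y to 0
    while True:                   # :LOOP1
        x -= 1                    # DEX
        if x == 0:                # BEQ :PLOOP
            break
        while True:               # :LOOP2, skip one keyword
            y = (y + 1) & 0xFF
            if table[y] & 0x80:   # BMI :LOOP1 on the marked last char
                break
    text = ""
    while True:                   # :PLOOP
        y = (y + 1) & 0xFF
        a = table[y]
        if a & 0x80:              # BMI LISTENT1: last char of the keyword
            return text + chr(a & 0x7F), False
        if a == 0:                # null "printed", Z set, BNE not taken
            return text, True
        text += chr(a)            # JSR $AB47, then BNE :PLOOP


# Toy table only: here END, FOR and GO are tokens $80, $81 and $82.
TOY = make_table(["END", "FOR", "GO"])
```

With this toy table, detokenize(TOY, 0x82) gives GO, while the first unused
token, 0x83, lands on the zero byte and reports a fall-through, which
mirrors what $CC does with the real table.  The wrap-around that makes
shift-M and friends print valid keywords is not modelled here.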
+	Similar things can happen in other BASICs.  In BASIC 4.0,
+token $DB does the trick.  In BASIC 1.0, $CB ought to do it.
+The problem seems to have been fixed in BASIC 7.0; at least the
+trick doesn't seem to work on a 128.
+	Finally, like most things on the 64, embedding tokens into
+REM statements can naturally be put to some creative use.  For example,
+RUN once ran a contest where readers submitted stories and riddles
+and such, which were read by LISTing the program.  They were very clever
+and entertaining, and I close this summary with the one I've remembered
+since high school:
+REM WHAT'S AN APPLECOSTA?
+REM {C=-V}T A {INST CTRL-0}E
+............................... The C=Hallenge ...............................
+Wait until next time!
+................................ Side Hacking ................................
+		SuperCPU software repair
+		---------------------------> by S. L. Judd
+	One of the feature articles in this issue deals with NTSC/PAL
+fixing.  But have you ever thought about SCPU fixing?  You know how
+it goes: you have that program that could really benefit from
+the speed boost, but doesn't work, and usually because of some silly
+little thing.
+	Well, it really bugs me to have programs not like my nice
+hardware for dumb reasons, so I decided I would try my hand at fixing
+up some programs.  The one that really did it for me was the game
+"Stunt Car Racer" -- I had never played it before, but after getting
+ahold of it, it was clear that here was a game that would be just great
+with a SuperCPU.  I had never done something like this before, but it
+seemed a doable problem and so I jumped in head first, and this article
+sums up my inexpert experience to date.
+	By the way, stuntcar-scpu is totally cool :).
+	To date I have fixed up just three games: Stunt Car Racer, Rescue
+on Fractalus, and Stellar 7.  My goal was really to "CMD-fix" these programs,
+to make them run off of my FD-2000 as well as my SCPU.  Although these are
+all games, the techniques should apply equally well to application programs
+with a bad attitude.  Before discussing the fixes, it is probably worthwhile
+to discuss a few generalities.
+	I also note that programmers who don't have a SuperCPU might find
+some of this information helpful in designing their programs to work with
+SCPUs.
+	Finally, my fixes are available in the Fridge.
+Tools and Process
+-----------------
+The tools I used were:
+  o Action Replay
+  o Wits
+  o Paper for taking notes (backs of receipts/envelopes work)
+I think this is all that is necessary, although a good sector editor
+can come in handy for certain things.
+After trying a number of different approaches to the problem, the process
+I've settled on goes roughly like the following:
+	- Have an idea of what will need fixing
+	- Familiarize yourself with the program
+	- Track down the things that need fixing
+	- Figure out free areas of memory
+	- Apply patches, and test
+	Most programs work in more or less the same way: there are
+some initialization routines, there's a main loop, and there's an
+interrupt routine or series of routines.  The interrupts are easy to
+find, via the vectors at either $FFFA or at $0314 and friends.  The
+initialization routine can be tougher, but can be deduced from
+the loader or decompressor; also, some programs point the NMI vector to
+the reset code, so that RESTORE restarts the program.  Finding the
+things that need fixing usually involves freezing the program at the
+appropriate time, and doing a little disassembly.  Sometimes a hunt for
+things like LDA $DC01 is helpful, too.  Figuring out free areas of
+memory is easy, by either getting a good feel for the program, or
+filling some target memory with a fill byte and then checking it
+later, to see if it was overwritten.  Once the patch works on the 64,
+all that remains is to test it on the SCPU, and it's all done!
+Diagnosis
+---------
+	It seems to me that, at the fundamental level, the SCPU is different
+from a stock machine in three basic ways: it is a 65816, it runs at 20MHz,
+and it has hardware registers/different configurations.  There are also
+some strange and mysterious problems that can arise.
+	All possible opcodes are defined on the 65816, which means that
+"illegal" or quasi-opcodes will not work correctly.  On the 65xx chips, the
+quasi-opcodes aren't real opcodes -- they are like little holes in the cpu,
+and things going through those holes fall through different parts of the
+normal opcode circuitry.  Although used by very few programs, a number of
+copy protection schemes make use of them, so sometimes the program works fine
+with a SCPU but the copy protection makes it choke -- how very annoying
+(example: EA's Lords of Conquest).  Naturally, disk-based protection methods
+mean it won't work on an FD-2000, either.
+	Running at 20MHz causes all sorts of problems.  Any kind of software
+loop will run too fast -- delay loops, countdown loops, input busy-loops,
+etc.  So will main program loops, so that the game runs unplayably fast
+(most 3D games never had to worry about being too fast).  It can also
+lead to flickering screens, as we shall see later, and the "play" of some
+games is designed with 1MHz in mind -- velocities, accelerations, etc.
+What looks smooth at the low frame rate might look poor at the high, as we
+shall also see later.  Finally, fastloaders invariably fail at 20MHz,
+like any other code using software-based timing.
+	The SuperCPU also has a series of configuration registers located
+at $D07x and $D0Bx, which determine things like software speed and VIC
+optimization modes (which areas of memory are mirrored/copied to the C64's
+RAM).  Note also that enabling hardware registers rearranges $E000 ROM
+routines.  Although it is possible for programs to accidentally reconfigure
+the SCPU, it is awfully unlikely, since the enable register, which switches
+the hardware registers in, is sandwiched between disable registers:
+	$D07D	Hardware register disable
+	$D07E	Enable hardware registers
+	$D07F	Hardware register disable
+Strangely enough, though, different hardware configurations can sometimes
+cause problems.  For example, newer (v2) SCPUs allow selective mirroring of
+the stack and zero page, and by default have that mirroring turned OFF.
+For some totally unknown reason, this caused major problems with an early
+attempt of mine to fix Stunt Car Racer -- I am told that the old version
+would slow down to just double-speed, flicker terribly, and more.  Turning
+mirroring back on apparently fixes up the problem.  (I have an older SCPU,
+and hence did not have this problem).  So before going after a big fix, it
+is worthwhile to invest a few minutes in trying different configurations.
+	Finally, there are other strange problems that can arise.  For
+example, I have two 128s: one is a flat 128, one a 128D.  With my 128D,
+if $D030 is set then the SCPU sometimes -- but not always -- freaks out
+and locks up.  The flat 128 does not have this problem.  One reason this
+is important is that many decompressors INC $D030 to enable 2MHz mode.
+A simple BIT ($2C) fixes this problem up, but the point is that the SCPU has
+to interact with the computer, so perhaps that interaction can lead to
+problems in obscure cases.
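
For anyone working on a saved memory image rather than in a freezer
monitor, the hunt-and-patch step can be sketched in Python.  This is an
illustration under assumptions, not a tool from the article: the function
names are made up, and only the opcode values ($EE for INC absolute, $2C
for BIT absolute) are taken as given.

```python
INC_ABS = 0xEE   # INC absolute
BIT_ABS = 0x2C   # BIT absolute


def hunt(memory, pattern):
    """Return every offset at which the byte pattern occurs."""
    pattern = bytes(pattern)
    hits = []
    start = memory.find(pattern)
    while start != -1:
        hits.append(start)
        start = memory.find(pattern, start + 1)
    return hits


def defuse_inc_d030(memory):
    """Turn each INC $D030 (EE 30 D0) into BIT $D030 (2C 30 D0)."""
    patched = bytearray(memory)
    hits = hunt(patched, [INC_ABS, 0x30, 0xD0])   # address is little-endian
    for offset in hits:
        patched[offset] = BIT_ABS                 # same length, no write
    return bytes(patched), hits
```

The same hunt() finds LDA $DC01 when given the pattern AD 01 DC.  A byte
match is not proof of code, since the same three bytes can turn up in
data, so each hit is worth a look in a disassembly before patching.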
+	Now, if the goal is to CMD-fix the program, there may be a few
+disk-related things that may need fixing.  In addition to stripping out
+(or possibly fixing up) any fastloaders, most programs annoyingly assume
+drive #8 is the only drive in town.  Also, if the program uses a track-based
+loader (instead of a file-based loader), then that will need to be fixed up
+as well, and any disk-based copy protection will have to be removed.
+	There's one other thing to consider, before you fix: is the
+program really busted?  For example, if you've tried a chess program
+with the SCPU, chances are that you saw no speed improvement.  Why
+not?  It turns out that most chess programs use a timer-based search
+algorithm -- changing the playing strength changes the amount of
+time the program spends searching, and not the depth of the search.
+(The reason is to make the gameplay flow a little better -- otherwise
+you have very slow play at the beginning, when there are many more
+moves to consider).  So although it might look like it isn't working
+right with the SCPU, it is actually working quite well.
+	And that pretty much covers the basic ideas.  The first program
+I fixed up was Stunt Car Racer.
+Stunt Car Racer
+---------------
+	Stunt Car Racer, in case you don't know, is a 3D driving game,
+and quite fun.  It is also too fast, unplayably fast, at 20MHz.  Therefore,
+it needs to be slowed down!
+	My first plan... well, suffice to say that most of my original
+plans were doomed to failure, either from being a bad idea, or from
+poor implementation.  It is clear enough that some sort of delay is
+needed, though, in the main loop, or perhaps by intercepting the joystick
+reading routine.
+	The program has a main loop and an interrupt loop as well.
+The interrupt handles the display and other things, and all of the
+game calculations are done in the main loop, which flows like
+	Do some calculations
+	Draw buffer 1
+	Swap buffers
+	Do some calculations
+	Draw buffer 2
+	Swap buffers
+	JMP loop
+One of my first thoughts was to intercept the joystick I/O, which is
+easy to find by hunting for LDA $DC01 (or DC00, whichever joystick
+is used).  The patch failed, possibly because I didn't check that
+the memory was safe, or possibly because it was in the interrupt routine
+(I simply don't remember).
+	Before patching, it is very important to make sure that the
+patch will survive, and not interfere with the program, so it is
+very important to find an area of memory that is not used by the
+program.  It took me a little while to figure this out!  Finding
+unused memory was pretty easy -- I just filled the suspect areas with
+a fill byte, ran the program, and checked that memory.  Mapping out the
+memory areas also aids in saving the file, as un-needed areas don't
+need to be saved, or can be cleared out to aid compression.
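
The fill-and-check idea is simple enough to sketch in Python against a
memory dump.  The fill value $AA, the function names, and the 16-byte
minimum run length are all assumptions chosen for illustration.

```python
FILL = 0xAA   # hypothetical fill byte; pick one the program rarely writes


def fill(memory, start, end, value=FILL):
    """Fill memory[start:end] with the fill byte before running the program."""
    for addr in range(start, end):
        memory[addr] = value


def untouched_runs(memory, start, end, value=FILL, min_len=16):
    """List (first, last) address pairs that still hold the fill byte."""
    runs = []
    run_start = None
    for addr in range(start, end + 1):
        # addr == end acts as a sentinel that closes a run at the boundary
        intact = addr < end and memory[addr] == value
        if intact and run_start is None:
            run_start = addr
        elif not intact and run_start is not None:
            if addr - run_start >= min_len:
                runs.append((run_start, addr - 1))
            run_start = None
    return runs
```

A surviving run only shows that nothing was written during that particular
session; memory that stays clean in one game mode may still be used in
another, so the check is worth repeating for each mode.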
+	The first free area of memory I found was at $C000.  It turns
+out that this is a sprite, though, and so put some garbage on the
+screen.  The second I tried was $8000, which worked great in practice
+mode but got overwritten in competition mode -- always test your
+patches thoroughly!  (I had only tested in practice mode).  Finally,
+I found a few little spots in low memory that survived, and placed the
+patch there.  The program does a whole lot of memory moving, and uses
+nearly all memory.  I also left some initialization code at $8000, since
+it only needed to be run once, at the beginning (to turn on mirroring
+in v2 SCPUs).
+	Recall that the main loop has two parts -- one for buffer 1, and
+one for buffer 2.  The trick is to find some code that is common to both
+sections, like a subroutine call:
+	JSR SomeRoutine
+	Draw buffer 1
+	JSR SomeRoutine
+	Draw buffer 2
+The patch routine I used was a simple delay loop, redirected from those
+two JSRs:
+	LDX #$FF
+	CPX $D012
+	BNE *-5
+	DEX
+	CPX #$FC
+	BNE *-10
+	JMP SomeRoutine
+Of course, this will also slow the program down at 1MHz; later on I became
+smarter about my patches, but this one works pretty well.
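
How long does that loop wait?  It waits for $D012 to read $FF, then $FE,
then $FD, and each of those values comes around once per frame.  A rough
Python model follows; it assumes $D012 holds only the low 8 bits of the
raster counter, that polling is much faster than one raster line, and a
PAL frame of 312 lines (pass 263 for a typical NTSC machine).

```python
def delay_in_raster_lines(start_line, lines_per_frame=312):
    """Count raster lines that pass while the patch loop runs.

    Models LDX #$FF / CPX $D012 / BNE / DEX / CPX #$FC / BNE.
    """
    line = start_line
    elapsed = 0
    x = 0xFF
    while x != 0xFC:                    # CPX #$FC / BNE
        while (line & 0xFF) != x:       # CPX $D012 / BNE
            line = (line + 1) % lines_per_frame
            elapsed += 1
        x -= 1                          # DEX
    return elapsed
```

Under those assumptions the delay is between two and three frames per call
(622 to 933 lines on PAL) whatever the CPU speed, which is consistent with
the patch slowing down a stock machine as well.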
+	To save the game and patches, I simply froze it from AR.  Just
+saving from the monitor generally failed; the initialization routine
+doesn't initialize all I/O settings.  Part of the freezing process
+involves RLE compression, so if you freeze it is a good idea to
+fill all unused portions of memory -- temporary areas, bitmaps, etc.
+Another thing to do is to set a freeze point at the init routine,
+and then JMP there from the monitor.  By clearing the screen, you
+won't have to look at all the usual freezer ugliness, and at this
+point freezing isn't any different than saving from the ML monitor
+and RLE-packing the file.  Once saved, I tested a few times from the
+side, to make sure things worked right.
+	Whether freezing or saving from the monitor, if the file size
+is larger than 202 blocks or so, it can't be loaded on the SCPU without
+a special loader -- unless you compress it first.  I naturally recommend
+using pucrunch for that purpose, but if you want to do it on the 64
+then I recommend using ABCrunch, which works well with the SCPU and
+gives about as good compression as you can get without an REU.
+	The result was stuntcar-scpu, which is *awfully* fun when fixed.
+Rescue on Fractalus
+-------------------
+	Next on my list was Rescue on Fractalus, an older (and quite cool)
+Lucasfilm game that just didn't cut it in the 64 conversion, for a number
+of reasons (that perhaps could have been avoided).  There are at least two
+versions of the game, one of which doesn't even work on a 128 (good ol' $D030),
+but I have the older version, which does work.
+	With a SuperCPU, though, there are a number of problems.  The display
+flickers terribly.  The gameplay is smooth and not at all too fast -- in fact,
+it is too slow.  Specifically, the velocities and turning rates and such do
+not give a convincing illusion of speed or excitement.  The game is copy-
+protected and uses a track-based fastloader, loaded from disk via B-E, which
+also saves the high scores to disk.  Clearly, this one is a bigger job: the
+display is too fast, the game constants need adjusting, and the highscore code
+needs to be replaced by some kernal calls.
+	The structure of this code is a little different.  The main loop
+handles the (double-buffered) display -- it does all the calculations and
+draws to the two buffers.  The multi-part interrupt loop does the rest --
+it swaps buffers, changes the display in different parts of the screen,
+reads the joystick, and performs the game updates which change your
+position and direction.  It also handles enemies such as saucers, but
+doesn't handle the bunkers which fire at you from the mountains (the main
+loop takes care of those).
+	What does all this mean?  First, that the game can be a good ten
+steps ahead of the screen, which makes things like targeting very
+difficult.  Second, that the bunkers almost never fire at you at 1MHz
+(they go crazy at 20).  Third, that things like velocity and turning
+rate are rather low, because advancing or turning too quickly would
+get the game way out of sync (unfortunately, they are still too fast
+for 1MHz, making targeting difficult and movement clunky).  On the
+other hand, having the movement in the interrupt is the reason that
+the game does not become unplayably fast at 20MHz, and means that
+something besides a delay loop is needed.
+	The interrupt swaps buffers, but the main loop draws them,
+and because it draws so quickly it can start clearing and drawing to
+the visible buffer.  To make sure this was what I was seeing, I reversed
+the buffer swap code in the interrupt, so that the drawing buffer was
+always on-screen.  Sure enough, that's what the 20MHz version looked
+like.
+	It turned out to be pretty easy to force the main loop to wait
+on the interrupt.  Although I messed around (unsuccessfully) with
+intercepting the interrupt loop, the buffer swap code actually
+modifies a zero-page variable upon swapping.  So all the main loop
+has to do is wait on that variable before charging ahead.  I may have
+made it wait for two frames, because it made the game play a little
+better.
+	Now, how to find the velocity and turn code?  Well it takes
+a keypress to change the velocity, so by hunting for LDA $DC01, and
+tracing back, the routine can be found; at the very least the
+affected variables may be found, and hunted for.  For example, if
+the result is stored in $D0, then you can search for LDA $D0.  The
+point is to locate the keypress processing code.  From there, a little
+trial and error (setting freeze points and pressing the velocity key)
+locates the piece of code which deals with changing the velocity, and
+in particular which variable corresponds to velocity.  Finally, from
+there it just takes another hunt for LDA velocity, ADC velocity, etc.
+to figure out where the code for updating position and direction is.
+	In this case, I was pretty sure I had found it, as it went
+something like
+	LDA velocity
+	LSR
+	LSR
+	ADC #$20
+and this was added to the position.  To check that this was the code,
+I just changed the ADC, or removed an LSR, to see that the speed changed.
+The code for turning left and right and moving up and down was similar,
+and again after a little trial and error it was clear what code did
+what.  Again, it wasn't necessary to totally understand how these
+routines worked exactly -- just the general idea of them, in this case
+to see that a multiple of the velocity was used to change the position
+and orientation of the player.
+	So, to fix it up, I just changed that multiple -- probably I
+NOPed out an LSR above, to basically double the speed, and changed the
+turning rates similarly.  This took a little experimentation, as it
+not only needed to be playably fast, but also couldn't overflow at
+high speeds, etc.
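
The effect of dropping one LSR can be worked through numerically.  The
Python sketch below takes the snippet literally, which is an assumption
since the article only says the code went "something like" this: no CLC
before the ADC, so the bit shifted out by the last LSR is added as carry.

```python
def step_original(velocity):
    """LDA velocity / LSR / LSR / ADC #$20, on an 8-bit accumulator."""
    carry = (velocity >> 1) & 1          # bit shifted out by the second LSR
    return ((velocity >> 2) + 0x20 + carry) & 0xFF


def step_patched(velocity):
    """The same code with one LSR replaced by NOP."""
    carry = velocity & 1                 # bit shifted out by the remaining LSR
    return ((velocity >> 1) + 0x20 + carry) & 0xFF
```

For velocity $40 the step goes from $30 to $40: the velocity-dependent part
doubles while the constant $20 stays put.  In this model the step never
exceeds 160, so it cannot wrap by itself; any overflow would have to be
checked where the step is accumulated into position or direction.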
+	But once that was working, all that remained was the highscore
+table.  Finding the table location was pretty easy -- I just got a high
+score, and while entering my name froze the program, and figured out
+what got stored where.  From there it was pretty easy to figure out
+what was saved to disk.  From the original loader, I also knew where
+the highscores needed to be loaded to initially (the highscore table
+gets copied around a lot -- it doesn't just stay at that one location).
+Figuring out the exact number of bytes to save took a little bit of
+effort (saving either too many or too few bytes screws it up), but
+from there it was clear what memory needed to be saved.
+	So all that remained was to add the usual SETLFS etc. kernal
+calls, right?  Wrong.  The program uses all the usual kernal variables
+(from $90-$C0) for its own purposes.  Also recall that I wanted the
+program to work with device 9, etc.  To get around this, I did two
+things.  First, when the program first starts, I save some of the
+relevant variables to an unused part of memory -- in particular, I
+save the current drive number.  Second, before saving the highscore
+file, I actually copy all zero page variables from $90-$C2 or so
+to a temporary location, and then copy them back after saving.
+That way there are no worries about altering important locations.
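
The save-and-restore wrapper can be sketched as follows.  This is a Python
model of the idea rather than the patch itself: the scratch address $CF00
and the function name are hypothetical, and the callable stands in for
whatever Kernal sequence does the actual save.

```python
ZP_START, ZP_END = 0x90, 0xC2     # range named in the text ("$90-$C2 or so")
SCRATCH = 0xCF00                  # hypothetical free area for the backup


def with_kernal_zero_page(memory, kernal_call):
    """Back up $90-$C2, run the call, then restore the game's own values."""
    size = ZP_END - ZP_START + 1
    memory[SCRATCH:SCRATCH + size] = memory[ZP_START:ZP_END + 1]
    try:
        return kernal_call(memory)
    finally:
        memory[ZP_START:ZP_END + 1] = memory[SCRATCH:SCRATCH + size]
```

Whatever the call scribbles into that range, the game sees its own values
again afterwards.  The drive number saved at startup would be handed to
the call separately, since the game has long since overwritten it.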
+	Finding memory for the load/save patch was easy -- I just used
+the area which was previously used for the fastload load/save code.
+There was enough for the necessary code as well as temporary space
+for saving the zero page variables.
+	Finally, I changed some text from Rescue on Fractalus to
+Behind Jaggi Lines, to distinguish it from the original, and that
+was that.  Works great!  And is now more playable and challenging;
+in short, more the game it always should have been.
+Stellar 7
+---------
+	Finally, I tried my hand at Stellar 7.  Stellar 7 had several
+problems.  At the main screen, a delay loop tosses you to the mission
+screen after a while, if no keys are pressed.  This is a software loop,
+and so passes very quickly.  The game itself is too fast, so some sort
+of delay is needed.  The mission display is also too fast, and has
+software delay loops, so that needs fixing.  Finally, the game uses
+kernal calls for loading and saving, but is attached to drive #8;
+also, my version was split into a bunch of files, and I wanted to
+cut the number of files down.
+	Well, by this time it was all pretty straightforward.  From
+the loader, it was easy to figure out which files went where.  The
+mission and main displays were loaded in when needed, and swapped
+into unused parts of memory when not, so I loaded them in and
+adjusted the swap variable accordingly -- this left just the highscore
+and seven level files.
+	Finding the delay loops was easy -- I just went to the relevant
+sections of code, froze, and took a look at the loops.  There were your
+basic
+:LOOP
+	LDA $D4		;Check for keypress
+	BMI :key
+	DEX
+	BNE :LOOP
+	DEY
+	BNE :LOOP
+	DEC counter
+	BNE :LOOP
+:key 	LDX #$00
+	...
+Luckily, all routines were pretty much the same as the above.  The
+interrupt routine is in the $0314 vector, and the same routine is
+used during gameplay.
+	So the patch is very easy at this point.  First, change the
+IRQ code which does a JMP $EA7B to JMP $CE00
+. CE00  $EE INC $CFFF
+. CE03  $4C JMP $EA7B
+To fix up the keypress routines, the idea is to change the LDA $D4
+into a JSR patch.  How to substitute 3 bytes for 2 bytes?  The
+trick is to place the LDX #$00 into the patch routine:
+. CE06  $20 JSR $CE15		;Wait for $CFFF
+. CE09  $A5 LDA $D4
+. CE0B  $10 BPL $CE11
+. CE0D  $A2 LDX #$00		;If key pressed, then LDX #$00
+. CE0F  $29 AND #$FF
+. CE11  $60 RTS
+The actual delay is accomplished by waiting on $CFFF:
+. CE15  $AD LDA $CFFF
+. CE18  $C9 CMP #$04
+. CE1A  $90 BCC $CE15
+. CE1C  $A9 LDA #$00
+. CE1E  $8D STA $CFFF
+. CE21  $60 RTS
+As you can see, I waited a default of 4 frames.  The patch in the
+game/mission rendering routine works similarly -- I just patched
+the rendering code to basically JSR $CE15.  I also decided to
+try something new: let the user be able to change that CMP #$04
+to make things faster or slower, to suit their tastes.  The keyscan
+values were pretty easy to figure out, so this just required a little
+patch to check for the "+" and "-" keys, and change location $CE19
+accordingly.
+	Well, that about sums it up.  Perhaps if you do some fixing,
+you might send me a little email describing your own experiences?
+.......
+....
+..
+.                                    C=H #17
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+An Optimizing Hybrid LZ77 RLE Data Compression Program, aka
+Improving Compression Ratio for Low-Resource Decompression
+===========================================================
+by Pasi Ojala <albert@cs.tut.fi>
+Short:
+Pucrunch is a Hybrid LZ77 and RLE compressor, uses an Elias Gamma Code for
+lengths, mixture of Gamma Code and linear for LZ77 offset, and ranked RLE
+bytes indexed by the same Gamma Code.  Uses no extra memory in decompression.
+--------------------------------------------------------------------------
+Introduction
+------------
+Since I started writing demos for the C64 in 1989 I have always wanted to
+program a compression program.  I had a lot of ideas but never had the
+time, urge or knowledge to create one.  In retrospect, most of the ideas I
+had then were simply bogus ("magic function theory" as Mark Nelson nicely
+puts it).  But years passed, I gathered more knowledge and at last got an
+irresistible urge to realize my dream.
+The nice thing about the delay is that I don't need to write the actual
+compression program to run on a C64 anymore.  I can write it in portable
+ANSI-C code and just program it to create files that would uncompress
+themselves when run on a C64.  Running the compression program outside of
+the target system provides at least the following advantages.
+  * I can use portable ANSI-C code.  The compression program can be
+    compiled to run on a Unix box, Amiga, PC etc.  And I have all the
+    tools to debug the program and gather profiling information to see
+    why it is so slow :-)
+  * The program runs much faster than on C64.  If it is still slow, there
+    is always multitasking to allow me to do something else while I'm
+    compressing a file.
+  * There is 'enough' memory available.  I can use all the memory I
+    possibly need and use every trick possible to increase the compression
+    ratio as long as the decompression remains possible on a C64.
+  * Large files can be compressed as easily as shorter files.  Most C64
+    compressors can't handle files larger than around 52-54 kilobytes
+    (210-220 disk blocks).
+  * Cross-development is easier because you don't have to transfer a
+    file into C64 just to compress it.
+Memory Refresh and Terms for Compression
+----------------------------------------
+Statistical compression
+       Uses the uneven probability distribution of the source symbols
+       to shorten the average code length.  Huffman code and arithmetic
+       code belong to this group.  By giving a short code to symbols
+       occurring most often, the number of bits needed to represent
+       the symbols decreases.  Think of the Morse code for example:  the
+       characters you need more often have shorter codes and it takes
+       less time to send the message.
+Dictionary compression
+       Replaces repeating strings in the source with shorter
+       representations.  These may be indices to an actual dictionary
+       (Lempel-Ziv 78) or pointers to previous occurrences (Lempel-Ziv
+       77).  As long as it takes fewer bits to represent the reference
+       than the string itself, we get compression.  LZ78 is a lot like
+       the way BASIC substitutes tokens for keywords:  one-byte tokens
+       expand to whole words like PRINT#.  LZ77 replaces repeated
+       strings with (length,offset) pairs, thus the string VICIICI can
+       be encoded as VICI(3,3) -- the repeated occurrence of the
+       string ICI is replaced by a reference.
+Run-length encoding
+       Replaces repeating symbols with a single occurrence of the
+       symbol and a repeat count.  For example, assemblers have
+       a .zero keyword or equivalent to fill a number of bytes with
+       zero without needing to list them all in the source code.
+Variable-length code
+       Any code where the length of the code is not explicitly known
+       but changes depending on the bit values.  Some kind of end
+       marker or length count must be provided to make a code a prefix
+       code (uniquely decodable).  Compare with ASCII (or Latin-1) text,
+       where you know you get the next letter by reading a full byte from
+       the input.  A variable-length code requires you to read part of the
+       data to know how many bits to read next.
+Universal codes
+       Universal codes are used to encode integer numbers without the
+       need to know the maximum value.  Smaller integer values usually
+       get shorter codes.  Different universal codes are optimal for
+       different distributions of the values.  Universal codes include
+       Elias Gamma and Delta codes, Fibonacci code, and Golomb and
+       Rice codes.
+Lossless compression
+       Lossless compression algorithms are able to exactly reproduce
+       the original contents, unlike lossy compression, which omits
+       details that are not important or perceivable by the human sensory
+       system.  This article only talks about lossless compression.
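+As a small illustration of the LZ77 (length,offset) idea from the list
+above, here is a minimal C sketch that expands the VICI(3,3) example.
+The function names are made up for this illustration and are not taken
+from pucrunch itself.

```c
#include <stddef.h>
#include <string.h>

/* Append `length` bytes to out[], each copied from `offset` bytes back.
 * out_len is the number of bytes decoded so far; the new length is
 * returned.  The copy runs byte by byte, so a reference that overlaps
 * the bytes being written (offset < length) simply repeats recent data. */
size_t lz77_copy(unsigned char *out, size_t out_len,
                 size_t length, size_t offset)
{
    size_t i;

    for (i = 0; i < length; i++) {
        out[out_len] = out[out_len - offset];
        out_len++;
    }
    return out_len;
}

/* Expand the example: four literal bytes "VICI", then the pair (3,3). */
size_t decode_viciici(unsigned char *out)
{
    size_t n = 4;

    memcpy(out, "VICI", 4);
    n = lz77_copy(out, n, 3, 3);   /* copies "ICI" from 3 bytes back */
    out[n] = '\0';
    return n;
}
```

+Calling decode_viciici() on a buffer of at least 8 bytes leaves the
+7-byte string VICIICI in it.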
+My goal in the pucrunch project was to create a compression system in which
+the decompressor would use minimal resources (both memory and processing
+power) and still have the best possible compression ratio.  A nice bonus
+would be if it outperformed every other compression program available.
+These understandably opposite requirements (minimal resources and good
+compression ratio) rule out most of the state-of-the-art compression
+algorithms and in effect only leave RLE and LZ77 to be considered.  Another
+goal was to learn something about data compression and that goal at least
+has been satisfied.
+I started by developing a byte-aligned LZ77+RLE compressor/decompressor and
+then added a Huffman backend to it.  The Huffman tree took 384 bytes and
+the code that decoded the tree into an internal representation took 100
+bytes.  I found out that while the Huffman code gained about 8% in my
+-kilobyte test files, the gain was reduced to only about 3% after
+accounting for the extra code and the Huffman tree.
+Then I started a more detailed analysis of the LZ77 offset and length
+values and the RLE values and concluded that I would get better compression
+by using a variable-length code.  I used a simple variable-length code and
+scratched the Huffman backend code, as it didn't increase the compression
+ratio anymore.  This version became pucrunch.
+Pucrunch does not use byte-aligned data, and is a bit slower than the
+byte-aligned version because of this, but is much faster than the original
+version with the Huffman backend attached.  And pucrunch still does very
+well compression-wise.  In fact, it does very well indeed, beating even
+LhA, Zip, and GZip in some cases.  But let's not get too much ahead of
+ourselves.
+To get an improvement to the compression ratio for LZ77, we have only a few
+options left.  We can improve on the encoding of literal bytes (bytes that
+are not compressed), we can reduce the number of literal bytes we need to
+encode, and we can shorten the encoding of RLE and LZ77.  In the algorithm
+presented here all these improvement areas are addressed both collectively
+(one change affects more than one area) and one at a time.
+ 1. By using a variable-length code we can gain compression for even
+    2-byte LZ77 matches, which in turn reduces the number of literal
+    bytes we need to encode.  Most LZ77-variants require 3-byte matches
+    to get any compression because they use so many bits to identify
+    the length and offset values, thus making the code longer than the
+    original bytes would've taken.
+ 2. By using a new literal byte tagging system which distinguishes
+    uncompressed and compressed data efficiently we can reduce the
+    number of extra bits needed to make this distinction (the encoding
+    overhead for literal bytes).  This is especially important for
+    files that do not compress well.
+ 3. By using RLE in addition to LZ77 we can shorten the encoding for
+    long byte run sequences and at the same time set a convenient
+    upper limit to LZ77 match length.  The upper limit performs two
+    functions:
+       + we only need to encode integers in a specific range
+       + we only need to search strings shorter than this limit (if we
+         find a string long enough, we can stop there)
+    Short byte runs are compressed either using RLE or LZ77, whichever
+    gets the best results.
+ 4. By doing statistical compression (more frequent symbols get
+    shorter representations) on the RLE bytes (in this case symbol
+    ranking) we can gain compression for even 2-byte run lengths,
+    which in turn reduces the number of literal bytes we need to
+    encode.
+ 5. By carefully selecting which string matches and/or run lengths to
+    use we can take advantage of the variable-length code.  It may be
+    advantageous to compress a string as two shorter matches instead
+    of one long match and a bunch of literal bytes, and it can be
+    better to compress a string as a literal byte and a long match
+    instead of two shorter matches.
+This document consists of several parts, which are:
+  * C64 Considerations - Some words about the target system
+  * Escape codes - A new tagging system for literal bytes
+  * File format - What are the primaries that are output
+  * Graph search - How to squeeze every byte out of this method
+  * String match - An evolution of how to speed up the LZ77 search
+  * Some results on the target system files
+  * Results on the Calgary Corpus Test Suite
+  * The Decompression routine - 6510 code with commentary
+--------------------------------------------------------------------------
+Commodore 64 Considerations
+---------------------------
+Our target environment (Commodore 64) imposes some restrictions which we
+have to take into consideration when designing the ideal compression
+system.  A system with a 1-MHz 3-register 8-bit processor and 64 kilobytes
+of memory certainly imposes a great challenge, and thus also a great sense
+of achievement for good results.
+First, we would like it to be able to decompress as big a program as
+possible.  This in turn requires that the decompression code is located in
+low memory (most programs that we want to compress start at address 2049)
+and is as short as possible.  Also, the decompression code must not use any
+extra memory or only very small amounts of it.  Extra care must be taken to
+make certain that the compressed data is not overwritten during the
+decompression before it has been read.
+Secondly, my number one personal requirement is that the BASIC end address
+must be correctly set by the decompressor so that the program can be
+optionally saved in uncompressed form after decompression (although the
+current decompression code requires that you say "clr" before saving).
+This also requires that the decompression code is system-friendly, i.e.
+does not change KERNAL or BASIC variables or other structures.  Also, the
+decompressor shouldn't rely on file size or load end address pointers,
+because these may be corrupted by e.g.  X-modem file transfer protocol
+(padding bytes may be added).
+When these requirements are combined, there is not much selection in where
+in the memory we can put the decompression code.  There are some locations
+among the first 256 addresses (zeropage) that can be used, the (currently)
+unused part of the processor stack (0x100..0x1ff), the system input buffer
+(0x200..0x258) and the tape I/O buffer plus some unused bytes (0x334..0x3ff).
+The screen memory (0x400..0x7ff) can also be used if necessary.  If we can
+do without the screen memory and the tape buffer, we can potentially
+decompress files that are located from 0x258 to 0xffff.
+The third major requirement is that the decompression should be relatively
+fast.  After 10 seconds the user begins to wonder if the program has crashed
+or if it is doing anything, even if there is some feedback like border color
+flashing.  This means that the arithmetic used should be mostly 8- or 9-bit
+(instead of full 16 bits) and there should be very little of it for each
+decompressed byte.  Processor- and memory-intensive algorithms like
+arithmetic coding and prediction by partial matching (PPM) are pretty much
+out of the question, and that is putting it mildly.  LZ77 seems the only
+practical alternative.  Still, run-length encoding handles long byte runs
+better than LZ77 and can have a bigger length limit.  If we can easily
+incorporate RLE and LZ77 into the same algorithm, we should get the best
+features from both.
+A part of the decompressor efficiency depends on the format of the
+compressed data.  Byte-aligned codes, where everything is aligned into byte
+boundaries, can be accessed very quickly; non-byte-aligned variable length
+codes are much slower to handle, but provide better compression.  Note that
+byte-aligned codes can still have other data sizes than 8.  For example you
+can use 4 bits for LZ77 length and 12 bits for LZ77 offset, which preserves
+the byte alignment.
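+To make the byte-aligned example concrete, here is a minimal C sketch that
+packs a 4-bit length and a 12-bit offset into two bytes.  The bit layout
+(length in the high nibble of the first byte) is an assumption chosen for
+the illustration; the text above does not prescribe one.

```c
#include <stdint.h>

/* Pack a 4-bit length and a 12-bit offset into exactly two bytes.
 * Assumed layout: byte 0 = length (high nibble) + offset bits 11..8,
 *                 byte 1 = offset bits 7..0. */
void pack_match(uint8_t out[2], unsigned length, unsigned offset)
{
    out[0] = (uint8_t)(((length & 0x0fu) << 4) | ((offset >> 8) & 0x0fu));
    out[1] = (uint8_t)(offset & 0xffu);
}

/* Reverse of pack_match(): everything sits on byte boundaries, so the
 * decoder needs only shifts and masks on whole bytes. */
void unpack_match(const uint8_t in[2], unsigned *length, unsigned *offset)
{
    *length = (unsigned)(in[0] >> 4);
    *offset = ((unsigned)(in[0] & 0x0fu) << 8) | in[1];
}
```

+The pair always occupies 16 bits, which is why such codes are quick to
+decode but cannot get shorter for small, frequent values.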
+--------------------------------------------------------------------------
+The New Tagging System
+----------------------
+I call the different types of information my compression algorithm outputs
+primaries.  The primaries in this compression algorithm are:
+  * literal (uncompressed) bytes and escape sequences,
+  * LZ77 (length,offset)-pairs,
+  * RLE (length,byte)-pairs, and
+  * EOF (end of file marker).
+Literal bytes are those bytes that cannot be represented by shorter codes:
+they are neither part of previously seen data (LZ77) nor part of a longer
+sequence of the same byte (RLE).
+Most compression programs handle the selection between compressed data and
+literal bytes in a straightforward way by using a prefix bit.  If the bit
+is 0, the following data is a literal byte (uncompressed).  If the bit is
+1, the following data is compressed.  However, this presents the problem
+that non-compressible data will be expanded from the original 8 bits to 9
+bits per byte, i.e. by 12.5 %.  If the data isn't very compressible, this
+overhead consumes all the little savings you may have had using LZ77 or
+RLE.
+Some other data compression algorithms use a value (using variable-length
+code) that indicates the number of literal bytes that follow, but this is
+really analogous to a prefix bit, because 1-byte uncompressed data is very
+common for modestly compressible files.  So, using a prefix bit may seem
+like a good idea, but we may be able to do even better.  Let's see what we
+can come up with.  My idea was to somehow use the data itself to mark
+compressed and uncompressed data and thus not need any prefix bits.
+Let's assume that 75% of the symbols generated are literal bytes.  In this
+case it seems viable to allocate shorter codes for literal bytes, because
+they are more common than compressed data.  This distribution (75% are
+literal bytes) suggests that we should use 2 bits to determine whether the
+data is compressed or a literal byte.  One of the combinations indicates
+compressed data, and three of the combinations indicate a literal byte.  At
+the same time those three values divide the literal bytes into three
+distinct groups.  But how do we make the connection between which of the
+three bit patterns we have and what the literal byte values are?
+The simplest way is to use a direct mapping.  We use two bits (let them be
+the two most-significant bits) _from the literal bytes themselves_ to
+indicate compressed data.  This way no actual prefix bits are needed.  We
+maintain an escape code (which doesn't need to be static), which is
+compared to the bits, and if they match, compressed data follows.  If the
+bits do not match, the rest of the literal byte follows.  In this way the
+literal bytes do not expand at all if their most significant bits do not
+match the escape code, and fewer bits are needed to represent the literal
+bytes.
+Whenever those bits in a literal byte would match the escape code, an
+escape sequence is generated.  Otherwise we could not represent those
+literal bytes which actually start like the escape code (the top bits
+match).  This escape sequence contains the offending data and a new escape
+code.  This escape sequence looks like
+     # of escape bits    (escape code)
+     3                   (escape mode select)
+     # of escape bits    (new escape bits)
+     8-# of escape bits  (rest of the byte)
+     = 8 + 3 + # of escape bits
+     = 13 for 2-bit escapes, i.e. expands the literal byte by 5 bits.
+Read further to see how we can take advantage of the changing escape code.
+You may also remember that in the run-length encoding presented in the
+previous article two successive equal bytes are used to indicate compressed
+data (escape condition) and all other bytes are literal bytes.  A similar
+technique is used in some C64 packers (RLE) and crunchers (LZ77), the only
+difference is that the escape condition is indicated by a fixed byte value.
+My tag system is in fact an extension to this.  Instead of a full byte, I
+use only a few bits.
+We assumed an even distribution of the values and two escape bits, so 1/4
+of the values have the same two most significant bits as the escape code.
+I call this probability that a literal byte has to be escaped the "hit rate".
+Thus, literal bytes expand on average 25% of the time by 5 bits, making the
+average length 25% * 13 + 75% * 8 = 9.25.  Not surprisingly, this is longer
+than using one bit to tag the literal bytes.  However, there is one thing
+we haven't considered yet.  The escape sequence has the possibility to
+change the escape code.  Using this feature to its optimum (escape
+optimization), the average 25% hit rate becomes the -maximum- hit rate.
+Also, because the distribution of the literal byte values is seldom flat
+(some values are more common than others) and there is locality (different
+parts of the file only contain some of the possible values), from which we
+can also benefit, the actual hit rate is always much smaller than that.
+Empirical studies on some test files show that for 2-bit escape codes the
+actual realized hit rate is only 1.8-6.4%, while the theoretical maximum
+is the already mentioned 25%.
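+The arithmetic above is easy to replay.  The following C sketch computes
+the expected size of a literal byte from the number of escape bits and the
+hit rate; the function names are invented for this illustration.

```c
/* Bits in one escape sequence: escape code + 3-bit mode select +
 * new escape code + rest of the byte = 8 + 3 + escape_bits. */
int escape_sequence_bits(int escape_bits)
{
    return 8 + 3 + escape_bits;
}

/* Expected bits per literal byte when the fraction hit_rate of the
 * literals must be escaped and all others go out as plain 8 bits. */
double average_literal_bits(int escape_bits, double hit_rate)
{
    return hit_rate * escape_sequence_bits(escape_bits)
         + (1.0 - hit_rate) * 8.0;
}
```

+With 2-bit escapes, the worst-case 25% hit rate gives 9.25 bits, while the
+realized hit rates of 1.8-6.4% give roughly 8.09 to 8.32 bits per literal,
+compared with a constant 9 bits for a plain prefix bit.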
+Previously we assumed the distribution of 75% of literal bytes and 25% of
+compressed data (other primaries).  This prompted us to select 2 escape
+bits.  For other distributions (differently compressible files, not
+necessarily better or worse) some other number of escape bits may be more
+suitable.  The compressor tries different numbers of escape bits and selects
+the value which gives the best overall results.  The following table
+summarizes the hit rates on the test files for different number of escape
+bits.
+1-bit   2-bit   3-bit   4-bit   File
+50.0%   25.0%   12.5%   6.250%  Maximum
+  .3%    2.5%    0.3%   0.090%  ivanova.bin
+  .5%    2.4%    0.8%   0.063%  sheridan.bin
+20.7%    1.8%    0.2%   0.041%  delenn.bin
+26.5%    6.4%    2.5%   0.712%  bs.bin
+9.06    8.32    8.15   8.050   bits/Byte for bs.bin
+As can be seen from the table, the realized hit rates are dramatically
+smaller than the theoretical maximum values.  A thought might occur that we
+should always select 4-bit (or longer) escapes, because it reduces the hit
+rate and presents the minimum overhead for literal bytes.  Unfortunately
+increasing the number of escape bits also increases the code length of the
+compressed data.  So, it is a matter of finding the optimum setting.
+If there are very few literal bytes compared to other primaries, 1-bit
+escape or no escape at all gives very short codes to compressed data, but
+causes more literal bytes to be escaped, which means 4 bits extra for each
+escaped byte (with 1-bit escapes).  If the majority of primaries are
+literal bytes, for example a 6-bit escape code causes most of the literal
+bytes to be output as 8-bit codes (no expansion), but makes the other
+primaries 6 bits longer.  Currently the compressor automatically selects
+the best number of escape bits, but this can be overridden by the user with
+the -e option.
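+The trade-off described above can be sketched with a simple cost model.
+This is only an illustration under stated assumptions: the real compressor
+compares actual output sizes, whereas this sketch just adds up the tag
+overhead from a literal count, a count of other primaries and a measured
+hit rate per escape width.  All names and parameters are hypothetical.

```c
/* Tag overhead in bits for n escape bits (simplified model):
 *  - every escaped literal grows by 3 + n bits,
 *  - every other primary starts with the n-bit escape code. */
double tag_overhead(int n, double literals, double hit_rate, double others)
{
    return literals * hit_rate * (3 + n) + others * n;
}

/* Return the escape width in 1..max_bits with the smallest overhead.
 * hit_rates[n - 1] holds the measured hit rate for n escape bits. */
int best_escape_bits(const double *hit_rates, int max_bits,
                     double literals, double others)
{
    int n, best = 1;

    for (n = 2; n <= max_bits; n++) {
        if (tag_overhead(n, literals, hit_rates[n - 1], others) <
            tag_overhead(best, literals, hit_rates[best - 1], others))
            best = n;
    }
    return best;
}
```

+In this model a file dominated by literal bytes drifts towards wider
+escape codes, and a file dominated by LZ77/RLE primaries towards narrower
+ones, which is the behaviour the paragraph above describes.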
+The cases in the example with 1-bit escape code validate the original
+suggestion:  use a prefix bit.  A simple prefix bit would produce better
+results on three of the previous test files (although only slightly).  For
+delenn.bin (1 vs 0.828) the escape system works better.  On the other hand,
+1-bit escape code is not selected for any of the files, because 2-bit
+escape gives better overall results.
+-Note:- for 7-bit ASCII text files, where the top bit is always 0 (like
+most of the Calgary Corpus files), the hit rate is 0% for even 1-bit
+escapes.  Thus, literal bytes do not expand at all.  This is equivalent to
+using a prefix bit and 7-bit literals, but does not need a separate algorithm
+to detect and handle 7-bit literals.
+For Calgary Corpus files the number of tag bits per primary (counting the
+escape sequences and other overhead) ranges from as low as 0.46 (book1) to
+1.07 (geo) and 1.09 (pic).  These two files (geo and pic) are the only ones
+in the suite where a simple prefix bit would be better than the escape
+system.  The average is 0.74 tag bits per primary.
+In Canterbury Corpus the tag bits per primary ranges from 0.44
+(plrabn12.txt) to 1.09 (ptt5), which is the only one above 0.85 (sum).
+The average is 0.61 tag bits per primary.
+--------------------------------------------------------------------------
+Primaries Used for Compression
+------------------------------
+The compressor uses the previously described escape-bit system while
+generating its output.  I call the different groups of bits that are
+generated primaries, whether it is the correct term or not.  You are
+welcome to suggest a better term for them.  The primaries in this
+compression algorithm are:  literal byte (and escape sequence), LZ77
+(length,offset)-pair, RLE (length, byte)-pair, and EOF (end of file
+marker).
+If the top bits of a literal byte do not match the escape code, the byte is
+output as-is.  If the bits match, an escape sequence is generated, with the
+new escape code.  Other primaries start with the escape code.
+The Elias Gamma Code is used extensively.  This code consists of two parts:
+a unary code (a one-bit preceded by zero-bits) and a binary code part.  The
+first part tells the decoder how many bits are used for the binary code
+part.  Being a universal code, it produces shorter codes for small integers
+and longer codes for larger integers.  Because we expect we need to encode
+a lot of small integers (there are more short string matches and shorter
+equal byte runs than long ones), this reduces the total number of bits
+needed.  See the previous article for a more in-depth delve into
+statistical compression and universal codes.  To understand this article,
+you only need to keep in mind that small integer value equals short code.
+The following discusses the encoding of the primaries.
+The most frequent compressed data is LZ77.  The length of the match is
+output in Elias Gamma code, with "0" meaning the length of 2, "100" length
+of 3, "101" length of 4 and so on.  If the length is not 2, an LZ77 offset
+value follows.  This offset takes 9 to 22 bits.  If the length is 2, the
+next bit defines whether this is LZ77 or RLE/Escape.  If the bit is 0, an
+8-bit LZ77 offset value follows.  (Note that this restricts the offset for
+2-byte matches to 1..256.) If the bit is 1, the next bit decides between
+escape (0) and RLE (1).
+The code for an escape sequence is thus e..e010n..ne....e, where e is the
+byte, and n is the new escape code.  Example:
+  * We are using 2-bit escapes
+  * The current escape code is "11"
+  * We need to encode a byte 0xca == 0b11001010
+  * The escape code and the byte high bits match (both are "11")
+  * We output the current escape code "11"
+  * We output the escaped identification "010"
+  * We output the new escape bits, for example "10" (depends on escape
+    optimization)
+  * We output the rest of the escaped byte "001010"
+  * So, we have output the string "1101010001010"
+When the decompressor receives this string of bits, it finds that the first
+two bits match with the escape code, it finds the escape identification
+("010") and then gets the new escape, the rest of the original byte and
+combines it with the old escape code to get a whole byte.
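+The worked example can be replayed with a short C sketch.  It writes bits
+as a string of '0' and '1' characters purely for readability, and the
+function names are invented here; the new escape code is passed in by the
+caller because choosing it is the job of the escape optimization.

```c
#include <string.h>

/* Append the low `count` bits of `value` to the '0'/'1' string `out`,
 * most significant bit first. */
void put_bits(char *out, unsigned value, int count)
{
    size_t n = strlen(out);

    while (count-- > 0)
        out[n++] = (char)('0' + ((value >> count) & 1u));
    out[n] = '\0';
}

/* Emit one literal byte under an escape code that is esc_bits wide.
 * Returns the escape code that is in force afterwards. */
unsigned emit_literal(char *out, unsigned byte, unsigned escape,
                      unsigned new_escape, int esc_bits)
{
    if ((byte >> (8 - esc_bits)) != escape) {
        put_bits(out, byte, 8);             /* plain literal, 8 bits */
        return escape;
    }
    put_bits(out, escape, esc_bits);        /* current escape code */
    put_bits(out, 2u, 3);                   /* "010" = escape sequence */
    put_bits(out, new_escape, esc_bits);    /* new escape code */
    put_bits(out, byte, 8 - esc_bits);      /* rest of the byte */
    return new_escape;
}

enum primary { LZ77_LONG, LZ77_2BYTE, ESCAPE, RLE };

/* Classify the bits that follow a matched escape code. */
enum primary classify(const char *bits)
{
    if (bits[0] == '1')
        return LZ77_LONG;                   /* gamma length above 2 */
    if (bits[1] == '0')
        return LZ77_2BYTE;                  /* "00": 8-bit offset follows */
    return bits[2] == '0' ? ESCAPE : RLE;   /* "010" or "011" */
}
```

+Starting from an empty string, emit_literal(s, 0xca, 3, 2, 2) produces
+"1101010001010", and classify() applied to the bits after the first two
+reports an escape sequence.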
+The end of file condition is encoded to the LZ77 offset and the RLE is
+subdivided into long and short versions.  Read further, and you get a
+better idea about why this kind of encoding is selected.
+When I studied the distribution of the length values (LZ77 and short RLE
+lengths), I noticed that the smaller the value, the more occurrences.
+The following table shows an example of length value distribution.
+                LZLEN   S-RLE
+ 2              1975     477
+ 3-4            1480     330
+ 5-8             492     166
+ 9-16            125      57
+ 17-32            31      33
+ 33-64             8      15
+The first column gives a range of values.  The first entry has a single
+value (2), the second two values (3 and 4), and so on.  The second column
+shows how many times the different LZ77 match lengths are used, the last
+column shows how many times short RLE lengths are used.  The distribution
+of the values gives a hint of how to most efficiently encode the values.
+We can see from the table for example that values 2-4 are used 3455 times,
+while values 5-64 are used only 656 times.  The more common values need to
+get shorter codes, while the less-used ones can be longer.
+Because in each "magnitude" there are approximately half as many values
+as in the preceding one, it almost immediately occurred to me that the
+optimal way to encode the length values (decremented by one) is:
+  Value      Encoding       Range     Gained
+  0000000    not possible
+  0000001    0              1         -6 bits
+  000001x    10x            2-3       -4 bits
+  00001xx    110xx          4-7       -2 bits
+  0001xxx    1110xxx        8-15      +0 bits
+  001xxxx    11110xxxx      16-31     +2 bits
+  01xxxxx    111110xxxxx    32-63     +4 bits
+  1xxxxxx    111111xxxxxx   64-127    +5 bits
+The first column gives the binary code of the original value (with x
+denoting 0 or 1, xx 0..3, xxx 0..7 and so on), the second column gives the
+encoding of the value(s).  The third column lists the original value range
+in decimal notation.
+The last column summarizes the difference between this code and a 7-bit
+binary code.  Using the previous encoding for the length distribution
+presented reduces the number of bits used compared to a direct binary
+representation considerably.  Later I found out that this encoding in fact
+is Elias Gamma Code, only the assignment of 0- and 1-bits in the prefix is
+reversed, and in this version the length is limited.  Currently the maximum
+value is selectable between 64 and 256.
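+To put a number on "considerably", the following C sketch totals the bits
+needed for the LZLEN column of the length distribution shown earlier, once
+with this code and once with a fixed 7-bit code.  The helper names are
+made up for the illustration.

```c
/* Bits used by the length code for a (decremented) value in 1..127. */
int gamma_bits(unsigned value)
{
    int k = 0;                      /* k = floor(log2(value)) */

    while ((value >> (k + 1)) != 0)
        k++;
    return k < 6 ? 2 * k + 1 : 12;  /* last group has no terminating 0 */
}

/* LZLEN counts for the length ranges 2, 3-4, 5-8, 9-16, 17-32, 33-64. */
static const unsigned lzlen_count[6] = { 1975, 1480, 492, 125, 31, 8 };

/* Total bits when every length is sent with the variable-length code. */
unsigned long total_gamma_bits(void)
{
    unsigned long total = 0;
    unsigned top = 2;               /* largest length in the current range */
    int r;

    for (r = 0; r < 6; r++) {
        /* all values in one range share a code length; lengths are
         * decremented by one before encoding */
        total += (unsigned long)lzlen_count[r] * (unsigned long)gamma_bits(top - 1);
        top *= 2;
    }
    return total;
}

/* Total bits when every length is sent as a plain 7-bit number. */
unsigned long total_fixed_bits(void)
{
    unsigned long total = 0;
    int r;

    for (r = 0; r < 6; r++)
        total += 7UL * lzlen_count[r];
    return total;
}
```

+For this distribution the variable-length code needs 10117 bits where the
+fixed 7-bit code needs 28777.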
+So, to recap, this version of the Gamma code can encode numbers from 1 up to
+a fixed maximum (1 to 127 in the example).  LZ77 and RLE lengths that are used start
+from 2, because that is the shortest length that gives us any compression.
+These length values are first decremented by one, thus length 2 becomes
+"0", and for example length 64 becomes "11111011111".
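+A minimal C sketch of this encoder, assuming the 1..127 range of the
+example and writing the code as a string of '0' and '1' characters.  The
+function names are invented for the illustration.

```c
#include <string.h>

/* Encode value (1..127) with the length-limited gamma variant:
 * k one-bits, a zero-bit (omitted in the last group), then the k low
 * bits of the value.  Writes a '0'/'1' string and returns its length. */
int gamma_encode(unsigned value, char *out)
{
    int k = 0, n = 0, i;

    while ((value >> (k + 1)) != 0)     /* k = floor(log2(value)) */
        k++;
    for (i = 0; i < k; i++)
        out[n++] = '1';
    if (k < 6)
        out[n++] = '0';
    for (i = k - 1; i >= 0; i--)
        out[n++] = (char)('0' + ((value >> i) & 1u));
    out[n] = '\0';
    return n;
}

/* LZ77 and RLE lengths start from 2 and are decremented by one first. */
int encode_length(unsigned length, char *out)
{
    return gamma_encode(length - 1, out);
}
```

+encode_length(2, buf) yields "0", encode_length(3, buf) yields "100" and
+encode_length(64, buf) yields "11111011111"; buf needs room for 13
+characters.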
+The distribution of the LZ77 offset values (pointer to a previous
+occurrence of a string) is not at all similar to the length distribution.
+Admittedly, the distribution isn't exactly flat, but it also isn't as
+radical as the length value distribution either.  I decided to encode the
+lower 8 bits (automatically selected or user-selectable between 8 and 12
+bits in the current version) of the offset as-is (i.e.  binary code) and
+the upper part with my version of the Elias Gamma Code.  However, 2-byte
+matches always have an 8-bit offset value.  The reason for this is
+discussed shortly.
+Because the upper part can contain the value 0 (so that we can represent
+offsets from 0 to 255 with an 8-bit lower part), and the code can't directly
+represent zero, the upper part of the LZ77 offset is incremented by one
+before encoding (unlike the length values which are decremented by one).
+Also, one code is reserved for an end of file (EOF) symbol.  This restricts
+the offset value somewhat, but the loss in compression is negligible.
+With the previous encoding 2-byte LZ77 matches would only gain 4 bits (with
+2-bit escapes) for each offset from 1 to 256, and 2 bits for each offset
+from 257 to 768.  In the first case 9 bits would be used to represent the
+offset (one bit for gamma code representing the high part 0, and 8 bits for
+the low part of the offset), in the latter case 11 bits are used, because
+each "magnitude" of values in the Gamma code consumes two more bits than
+the previous one.
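+The bit counts in this paragraph can be checked with a small C sketch of
+that earlier offset encoding.  It assumes an 8-bit low part and that the
+offset is stored as offset-1, so that offsets 1..256 have a high part of
+0; the function names are hypothetical.

```c
/* Bits used by the gamma variant for a value in 1..127. */
int gamma_bits(unsigned value)
{
    int k = 0;

    while ((value >> (k + 1)) != 0)
        k++;
    return k < 6 ? 2 * k + 1 : 12;
}

/* Bits for an LZ77 offset: the high part is incremented by one and
 * gamma coded, the low 8 bits are sent as-is.
 * Assumption: the stored value is offset - 1. */
int offset_bits(unsigned offset)
{
    unsigned upper = (offset - 1) >> 8;

    return gamma_bits(upper + 1) + 8;
}

/* Bits saved by a 2-byte match compared with two plain 8-bit literals.
 * The match costs: escape code + 1-bit length code "0" + offset. */
int two_byte_gain(unsigned offset, int escape_bits)
{
    return 16 - (escape_bits + 1 + offset_bits(offset));
}
```

+With 2-bit escapes this gives a gain of 4 bits for offsets 1..256 and
+2 bits for offsets 257..768, and nothing at all beyond that.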
+The first case (offset 1..256) is much more frequent than the second case,
+because it saves more bits, and also because the symbol source statistics
+(whatever they are) guarantee 2-byte matches in recent history (much better
+chance than for 3-byte matches, for example).  If we restrict the offset
+for a 2-byte LZ77 match to 8 bits (1..256), we don't lose much
+compression at all, but instead we could shorten the code by one bit.  This
+one bit comes from the fact that before we had to use one bit to make the
+selection "8-bit or longer".  Because we only have "8-bit" now, we don't
+need that select bit anymore.
+Or, we can use that select bit for a new purpose: to select whether this code
+really is LZ77 or something else.  Compared to the older encoding (which
+I'm not detailing here, for clarity's sake.  This is already much too
+complicated to follow, and only slightly easier to describe) the codes for
+escape sequence, RLE and End of File are still the same length, but the
+code for LZ77 has been shortened by one bit.  Because LZ77 is the most
+frequently used primary, this presents a saving that more than compensates
+for the loss of 2-byte LZ77 matches with offsets 257..768 (which we can no
+longer represent, because we fixed the offset for 2-byte matches to use
+exactly 8 bits).
+Run length encoding is also a bit revised.  I found out that a lot of bits
+can be gained by using the same length encoding for RLE as for LZ77.  On
+the other hand, we still should be able to represent long repeat counts as
+that's where RLE is most efficient.  I decided to split RLE into two modes:
+  * short RLE for short (e.g. 2..128) equal byte strings
+  * long RLE for long equal byte strings
+The Long RLE selection is encoded into the Short RLE code.  Short RLE only
+uses half of its coding space, i.e.  if the maximum value for the gamma
+code is 127, short RLE uses only values 1..63.  Larger values switch the
+decoder into Long RLE mode and more bits are read to complete the run
+length value.
+For further compression in RLE we rank all the used RLE bytes (the values
+that are repeated in RLE) in the decreasing probability order.  The values
+are put into a table, and only the table indices are output.  The indices
+are also encoded using a variable length code (the same gamma code,
+surprise..), which uses fewer bits for smaller integer values.  As there are
+more RLE's with smaller indices, the average code length decreases.  In
+decompression we simply get the gamma code value and then use the value as
+an index into the table to get the value to repeat.
+Instead of reserving full 256 bytes for the table we only put the top 31
+RLE bytes into the table.  Normally this is enough.  If there happens to be
+a byte run with a value not in the table we use a similar technique as for
+the short/long RLE selection.  If the table index is larger than 31, it
+means we don't have the value in the table.  We use the values 32..63 to
+select the 'escaped' mode and simultaneously send the 5 most significant
+bits of the value (there are 32 distinct values in the range 32..63).  The
+remaining 3 bits of the byte are sent separately.
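+As a rough sketch of the index handling just described (this is not the
+actual pucrunch code; the function name rle_byte and the table layout with
+entry 0 unused are assumptions made for illustration):

```c
/* Map a decoded RLE table index to the byte value to repeat.
   index : gamma code value, 1..63
   low3  : the 3 separately sent low bits (only used when index >= 32)
   table : ranked RLE bytes in entries 1..31 (entry 0 unused in this sketch) */
unsigned char rle_byte(unsigned index, unsigned low3,
                       const unsigned char table[32])
{
    if (index <= 31)
        return table[index];          /* one of the top-ranking bytes */
    /* 32..63: escaped mode, index-32 carries the 5 most significant bits */
    return (unsigned char)(((index - 32) << 3) | (low3 & 7));
}
```

+For example, index 32+$14 together with the low bits 5 gives the byte $a5.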
+If you are more than confused, forget everything I said in this chapter and
+look at the decompression pseudo-code later in this article.
+--------------------------------------------------------------------------
+Graph Search - Selecting Primaries
+----------------------------------
+In free-parse methods there are several ways to divide the file into parts,
+each of which is equally valid but not necessarily equally efficient in terms
+of compression ratio.
+     "i just saw justin adjusting his sting"
+     "i just saw ", (-9,4), "in ad", (-9,6), "g his", (-25,2), (-10,4)
+     "i just saw ", (-9,4), "in ad", (-9,6), "g his ", (-10,5)
+The latter two lines show how the sentence could be encoded using literal
+bytes and (offset, length) pairs.  As you can see, we have two different
+encodings for a single string and they are both valid, i.e.  they will
+produce the same string after decompression.  This is what free-parse is:
+there are several possible ways to divide the input into parts.  If we are
+clever, we will of course select the encoding that produces the shortest
+compressed version.  But how do we find this shortest version?  How does
+the data compressor decide which primary to generate in each step?
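+To check that both parses really decode to the same sentence, here is a
+small sketch of a decoder for such (offset, length) primaries.  The struct
+layout and the names primary, decode, parse_a and parse_b are invented for
+this example, and the first literal is taken as "i just saw " including
+the trailing space, so that offset 9 lands on "just".

```c
#include <stddef.h>
#include <string.h>

/* One primary: a literal run (lit != NULL) or an LZ77 copy. */
struct primary {
    const char *lit;   /* literal bytes, or NULL for a copy */
    int offset;        /* distance back from the output position */
    int length;        /* number of bytes to copy */
};

/* Decode n primaries into out (capacity cap, including the terminator).
   Returns the number of bytes written, or -1 on a bad primary. */
int decode(const struct primary *p, int n, char *out, int cap)
{
    int pos = 0;
    for (int i = 0; i < n; i++) {
        if (p[i].lit) {
            int len = (int)strlen(p[i].lit);
            if (pos + len >= cap) return -1;
            memcpy(out + pos, p[i].lit, (size_t)len);
            pos += len;
        } else {
            if (p[i].offset < 1 || p[i].offset > pos ||
                pos + p[i].length >= cap)
                return -1;
            /* Byte by byte, so overlapping copies behave like LZ77. */
            for (int k = 0; k < p[i].length; k++, pos++)
                out[pos] = out[pos - p[i].offset];
        }
    }
    out[pos] = '\0';
    return pos;
}

static const struct primary parse_a[] = {
    {"i just saw ", 0, 0}, {NULL, 9, 4}, {"in ad", 0, 0}, {NULL, 9, 6},
    {"g his", 0, 0}, {NULL, 25, 2}, {NULL, 10, 4}
};
static const struct primary parse_b[] = {
    {"i just saw ", 0, 0}, {NULL, 9, 4}, {"in ad", 0, 0}, {NULL, 9, 6},
    {"g his ", 0, 0}, {NULL, 10, 5}
};
```

+Decoding parse_a (7 primaries) and parse_b (6 primaries) both yield the
+original 37-character sentence.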
+The most efficient way the file can be divided is determined by a sort of a
+graph-search algorithm, which finds the shortest possible route from the
+start of the file to the end of the file.  Well, actually the algorithm
+proceeds from the end of the file to the beginning for efficiency reasons,
+but the result is the same anyway:  the path that minimizes the bits
+emitted is determined and remembered.  If the parameters (number of escape
+bits or the variable length codes or their parameters) are changed, the
+graph search must be re-executed.
+     "i just saw justin adjusting his sting"
+                 \___/    \_____/    \_|___/
+                  13        15        11 13
+                                      \____/
+                                        15
+Think of the string as separate characters.  You can jump to the next
+character by paying 8 bits to do so (not shown in the figure), unless the
+top bits of the character match with the escape code (in which case you
+need more bits to send the character "escaped").  If the history buffer
+contains a string that matches a string starting at the current character
+you can jump over the string by paying as many bits as representing the
+LZ77 (offset,length)-pair takes (including escape bits), in this example
+from 11 to 15 bits.  And the same applies if you have RLE starting at the
+character.  Then you just find the least-expensive way to get from the
+start of the file to the end and you have found the optimal encoding.  In
+this case the last characters " sting" would be optimally encoded with
+(literal " ") + 15("sting") = 23 instead of 11(" s") + 13("ting") = 24
+bits.
+The algorithm can be written either cleverly or not-so.  We can take a real
+short-cut compared to a full-blown graph search because we can/need to only
+go forwards in the file:  we can simply start from the end!  Our accounting
+information which is updated when we pass each location in the data
+consists of three values:
+. the minimum bits from this location to the end of file.
+. the mode (literal, LZ77 or RLE) to use to get that minimum
+. the "jump" length for LZ77 and RLE
+For each location we try to jump forward (to a location we already
+processed) one location, LZ77 match length locations (if a match exists),
+or RLE length locations (if equal bytes follow) and select the shortest
+route, updating the tables accordingly.  In addition, if we have a LZ77 or
+RLE length of for example 18, we also check jumps 17, 16, 15, ...  This
+gives a little extra compression.  Because we are doing the "tree traverse"
+starting from the "leaves", we only need to visit/process each location
+once.  Nothing located after the current location can change, so there is
+never any need to update a location.
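+The backward pass can be sketched roughly as follows.  This is not the
+pucrunch implementation: the cost function lz_cost is a made-up model that
+only mimics the 11..15 bit figures of the example, RLE and escaped
+literals are left out, and the match search is plain brute force.  The
+names graph_search, max_match, bits and jump are assumptions too.

```c
/* Hypothetical cost model: bits needed for an LZ77 match of this length.
   The real cost also depends on the offset and the code parameters. */
static int lz_cost(int len)
{
    if (len == 2) return 11;
    if (len <= 4) return 13;
    if (len <= 8) return 15;
    return 17;
}

/* Longest earlier match for the bytes at p (brute force, overlap allowed). */
static int max_match(const unsigned char *in, int n, int p)
{
    int best = 0;
    for (int q = 0; q < p; q++) {
        int len = 0;
        while (p + len < n && in[q + len] == in[p + len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Backward pass over the input.
   bits[p] = fewest bits needed from p to the end (bits has n+1 entries),
   jump[p] = 1 for a literal byte, otherwise the LZ77 length to use. */
void graph_search(const unsigned char *in, int n, int *bits, int *jump)
{
    bits[n] = 0;
    for (int p = n - 1; p >= 0; p--) {
        bits[p] = 8 + bits[p + 1];            /* literal byte */
        jump[p] = 1;
        /* Try the longest match and every shorter jump down to 2. */
        for (int len = max_match(in, n, p); len >= 2; len--) {
            int c = lz_cost(len) + bits[p + len];
            if (c < bits[p]) {
                bits[p] = c;
                jump[p] = len;
            }
        }
    }
}
```

+With this cost model the tail " sting" of the example sentence comes out
+as a literal space followed by a 5-byte match, 23 bits in total.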
+To be able to find the minimal path, the algorithm needs the length of the
+RLE (the number of the identical bytes following) and the maximum LZ77
+length/offset (an identical string appearing earlier in the file) for each
+byte/location in the file.  This is the most time-consuming -and-
+memory-consuming part of the compression.  I have used several methods to
+make the search faster.  See String Match Speedup later in this
+article.  Fortunately these searches can be done first, and the actual
+optimization can use the cached values.
+Then what is the rationale behind this optimization?  It works because you
+are not forced to take every compression opportunity, but select the best
+ones.  The compression community calls this "lazy coding" or "non-greedy"
+selection.  You may want to emit a literal byte even if there is a 2-byte
+LZ77 match, because in the next position in the file you may have a longer
+match.  This is actually more complicated than that, but just take my word
+for it that there is a difference.  Not a very big difference, and only
+significant for variable-length code, but it is there and I was after every
+last bit of compression, remember.
+Note that the decision-making between primaries is quite simple if a
+fixed-length code is used.  A one-step lookahead is enough to guarantee
+optimal parsing.  If there is a more advantageous match in the next
+location, we output a literal byte and that longer match instead of the
+shorter match.  I don't have time or space here to go very deeply into that,
+but the main reason is that in fixed-length code it doesn't matter whether
+you represent a part of data as two matches of lengths 2 and 8 or as
+matches of lengths 3 and 7 or as any other possible combination (if matches
+of those lengths exist).  This is not true for a variable-length code
+and/or a statistical compression backend.  Different match lengths and
+offsets no longer generate equal-length codes.
+Note also that most LZ77 compression algorithms need at least 3-byte match
+to break even, i.e.  not expanding the data.  This is not surprising when
+you stop to think about it.  To gain something from 2-byte matches you need
+to encode the LZ77 match into 15 bits.  This is very little.  A generic
+LZ77 compressor would use one bit to select between a literal and LZ77, 12
+bits for moderate offset, and you have 2 bits left for match length.  I
+imagine the rationale to exclude 2-byte matches also includes "the potential
+savings percentage for 2-byte matches is insignificant".  Pucrunch gets
+around this by using the tag system and Elias Gamma Code, and does indeed
+gain bits from even 2-byte matches.
+After we have decided on what primaries to output, we still have to make
+sure we get the best results from the literal tag system.  Escape
+optimization handles this.  In this stage we know which parts of the data
+are emitted as literal bytes and we can select the minimal path from the
+first literal byte to the last in the same way we optimized the primaries.
+Literal bytes that match the escape code generate an escape sequence, thus
+using more bits than unescaped literal bytes, and we need to minimize these
+occurrences.
+For each literal byte there is a corresponding new escape code which
+minimizes the path to the end of the file.  If the literal byte's high bits
+match the current escape code, this new escape code is used next.  The
+escape optimization routine proceeds from the end of the file to the
+beginning like the graph search, but it proceeds linearly and is thus much
+faster.
+I already noted that the new literal byte tagging system exploits the
+locality in the literal byte values.  If there is no correlation between
+the bytes, the tagging system does not do well at all.  Most of the time,
+however, the system works very well, performing 50% better than the
+prefix-bit approach.
+The escape optimization routine is currently very fast.  A little
+algorithmic magic removed a lot of code from the original version.  A fast
+escape optimization routine is quite advantageous, because the number of
+escape bits can now vary from 0 (uncompressed bytes always escaped) to 8,
+and whenever that number changes we need to run the routine again to
+select the optimal escape codes.
+Because escaped literal bytes actually expand the data, we need a safety
+area, or otherwise the compressed data may get overwritten by the
+decompressed data before we have used it.  Some extra bytes need to be
+reserved for the end of file marker.  The compression routine finds out how
+many bytes we need for safety buffer by keeping track of the difference
+between input and output sizes while creating the compressed file.
+            $1000 .. $2000
+            |OOOOOOOO|           O=original file
+        $801 ..
+        |D|CCCCC|                C=compressed data (D=decompressor)
+ $f7..      $1000     $2010
+ |D|            |CCCCC|          Before decompression starts
+            ^    ^
+            W    R               W=write pointer, R=read pointer
+If the original file is located at $1000-$1fff, and the calculated safety
+area is 16 bytes, the compressed version will be copied by the
+decompression routine higher in memory so that the last byte is at $200f.
+In this way, the minimum amount of other memory is overwritten by the
+decompression.  If the safety area would exceed the top of memory, we need a
+wrap buffer.  This is handled automatically by the compressor.  The read
+pointer wraps from the end of memory to the wrap buffer, allowing the
+original file to extend up to the end of the memory, all the way to $ffff.
+You can get the compression program to tell you which memory areas it uses
+by specifying the "-s" option.  Normally the safety buffer needed is less
+than a dozen bytes.
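+The address arithmetic of the example above can be written out as a small
+sketch.  The function names are invented for illustration, and the wrap
+buffer case (data running past the top of memory) is not handled here.

```c
/* Address of the last compressed byte after the decompressor has moved
   the data up: the end of the original file plus the safety area. */
unsigned reloc_last(unsigned orig_last, unsigned safety)
{
    return orig_last + safety;
}

/* Address of the first compressed byte after the move. */
unsigned reloc_first(unsigned orig_last, unsigned safety, unsigned comp_len)
{
    return reloc_last(orig_last, safety) - comp_len + 1;
}
```

+For an original file ending at $1fff and a 16-byte safety area,
+reloc_last gives $200f, as in the figure.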
+To sum things up, Pucrunch operates in several steps:
+. Find RLE and LZ77 data, pre-select RLE byte table
+. Graph search, i.e.  which primaries to use
+. Primaries/Literal bytes ratio decides how many escape bits to use
+. Escape optimization, which escape codes to use
+. Update RLE ranks and the RLE byte table
+. Determine the safety area size and output the file.
+--------------------------------------------------------------------------
+String Match Speedup
+--------------------
+To be able to select the most efficient combination of primaries we of
+course first need to find out what kind of primaries are available for
+selection.  If the file doesn't have repeated bytes, we can't use RLE.  If
+the file doesn't have repeating byte strings, we can't use LZ77.  This
+string matching is the most time-consuming operation in LZ77 compression
+simply because of the amount of the comparison operations needed.  Any
+improvement in the match algorithm can decrease the compression time
+considerably.  Pucrunch is living proof of that.
+The RLE search is straightforward and fast:  loop from the current position
+(P) forwards in the file counting each step until a different-valued byte
+is found or the end of the file is reached.  This count can then be used as
+the RLE byte count (if the graph search decides to use RLE).  The code can
+also be optimized to initialize counts for all locations that belonged to
+the RLE, because by definition there are only one-valued bytes in each one.
+Let us mark the current file position by P.
+ unsigned char *a = indata + P, val = *a++;
+ int top = inlen - P;
+ int rlelen = 1;
+ /* Loop for the whole RLE */
+ while(rlelen<top && *a++ == val)
+     rlelen++;
+ for(i=0;i<rlelen-1;i++)
+     rle[P+i] = rlelen-i;
+With LZ77 we can't use the same technique as for RLE (i.e.  using the
+information about current match to skip subsequent file locations to speed
+up the search).  For LZ77 we need to find the longest possible, and
+-nearest- possible, string that matches the bytes starting from the current
+location.  The nearer the match, the fewer bits are needed to represent the
+offset from the current position.
+Naively, we could start comparing the strings starting from P-1 and P,
+remembering the length of the matching part and then doing the same at P-2
+and P, P-3 and P, ..  P-j and P (j is the maximum search offset).  The
+longest match and its location (offset from the current position) are then
+remembered.  If we find a match longer than or equal to the
+maximum length we can actually use, we can stop the search there.  (The
+code used to represent the length values may have an upper limit.)
+This may be the first implementation that comes to your (and my) mind, and
+might not seem so bad at first.  In reality, it is a very slow way to do
+the search: the -Brute Force- method.  It could take somewhere about (n^3)
+byte compares to process a file of the length n (a mathematically inclined
+person would probably give a better estimate).  However, using the already
+determined RLE value to our advantage permits us to rule out the worst-case
+projection, which happens when all bytes are the same value.  We only
+search LZ77 matches if the current file position has shorter RLE sequence
+than the maximum LZ77 copy length.
+The first thing I did to improve the speed is to remember the position
+where each byte has last been seen.  A simple 256-entry table handles that.
+Using this table, the search can directly start from the first potential
+match, and we don't need to search for it byte-by-byte anymore.  The table
+is continually updated as we move toward the end of the file.
+That didn't give much of an improvement, but then I increased the table to
+256*256 entries, making it possible to locate the latest occurrence of any
+byte -pair- instead.  The table is indexed with the two byte values, and its
+contents directly give the position in the file where these two bytes were
+last seen.  Because the shortest possible string that would offer any
+compression (for my encoding of LZ77) is two bytes long, this byte-pair
+history is very suitable indeed.  Also, the first (shortest possible, i.e.
+2-byte) match is found directly from the byte-pair history.  This gave a
+moderate 30% decrease in compression time for one of my test files (from 28
+minutes to 17 minutes on a 25 MHz 68030).
+The second idea was to quickly discard the strings that had no chance of
+being longer matches than the one already found.  A one-byte hash value
+(sort of a checksum here, it is never used to index a hash table in this
+algorithm, but I rather use "hash value" than "checksum") is calculated
+from each three bytes of data.  The values are calculated once and put into
+a table, so we only need two memory fetches to know if two 3-byte strings
+are different.  If the hash values are different, at least one of the data
+bytes differs.  If the hash values are equal, we have to compare the
+original bytes.  The hash values of the strategic positions of the strings
+to compare are then ..  compared.  This strategic position is the location
+two bytes earlier than the longest match so far.  If the hash values
+differ, there is no chance that the match is longer than the current one.
+It may not even be as long, because one of the two earlier bytes may be
+different.  If the hash values are equal, the brute-force byte-by-byte
+compare has to be done.  However, the hash value check already discards a
+huge number of candidates and more than generously pays back its own memory
+references.  Using the hash values the compression time shortens by 50%
+(from 17 minutes to 8 minutes).
+Okay, the byte-pair table tells us where the latest occurrence of any byte
+pair is located.  Still, for the latest occurrence before -that- one we
+have to do a brute force search.  The next improvement was to use the
+byte-pair table to generate a linked list of the byte pairs with the same
+value.  In fact, this linked list can be trivially represented as a table,
+using the same indexing as the file positions.  To locate the previous
+occurrence of a 2-byte string starting at location P, look at backSkip[P].
+ /* Update the two-byte history & backSkip */
+ if(P+1<inlen)
+ {
+     int index = (indata[P]<<8) | indata[P+1];
+     backSkip[P] = lastPair[index];
+     lastPair[index] = P+1;
+ }
+Actually the values in the table are one bigger than the real table
+indices.  This is because the values are of type unsigned short (can only
+represent non-negative values), and I wanted zero to mean "not occurred".
+This table makes the search of the next (previous) location to consider
+much faster, because it is a single table reference.  The compression time
+was reduced from 6 minutes to 1 minute 10 seconds.  Quite an improvement
+from the original 28 minutes!
+     backSkip[]   lastPair[]
+ ___  _______  ____
+    \/       \/    \
+ ...JOVE.....JOKA..JOKER
+    ^        ^     ^
+    |        |     |
+    next     |     position
+             current match (3)
+    C        B     A
+In this example we are looking at the string "JOKER" at location A.  Using
+the lastPair[] table (with the index "JO", the byte values at the current
+location A) we can jump directly to the latest match at B, which is "JO", 2
+bytes long.  The hash values for the string at B ("JOK") and at A ("JOK")
+are compared.  Because they are equal, we have a potential longer match (3
+bytes), and the strings "JOKE.." and "JOKA.." are compared.  A match of
+length 3 is found (the 4th byte differs).  The backSkip[] table with the
+index B gives the previous location where the 2-byte string "JO" can be
+found, i.e.  C.  The hash value for the strategic position of the string in
+the current position A ("OKE") is then compared to the hash value of the
+corresponding position in the next potential match starting at C ("OVE").
+They don't match, so the string starting at C ("JOVE..") can't include a
+longer match than the current longest match at B.
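+The walk through the JOKER example can be reproduced with the following
+sketch.  It is a simplification, not the pucrunch source: the tables are
+built for the whole input up front instead of being updated as the
+position advances, the search therefore starts from backSkip[] rather
+than lastPair[], and the hash formula, the name hash3 and the MAXIN limit
+are made up for the example.

```c
#include <string.h>

#define MAXIN 4096   /* hypothetical input size limit for this sketch */

static unsigned short lastPair[256 * 256]; /* pair -> last position+1, 0 = none */
static unsigned short backSkip[MAXIN];     /* position -> previous same pair+1 */
static unsigned char  hash3[MAXIN];        /* checksum of the 3 bytes at position */

void build_tables(const unsigned char *in, int n)
{
    memset(lastPair, 0, sizeof lastPair);
    for (int p = 0; p < n; p++) {
        backSkip[p] = 0;
        hash3[p] = 0;
        if (p + 1 < n) {
            int index = (in[p] << 8) | in[p + 1];
            backSkip[p] = lastPair[index];
            lastPair[index] = (unsigned short)(p + 1);
        }
        if (p + 2 < n)   /* made-up one-byte checksum, not pucrunch's */
            hash3[p] = (unsigned char)(in[p] * 3 + in[p + 1] * 5 + in[p + 2] * 7);
    }
}

/* Longest (and, among equals, nearest) earlier match for position p.
   Returns its length (0 if none) and stores its start in *where. */
int longest_match(const unsigned char *in, int n, int p, int *where)
{
    int best = 0;
    *where = -1;
    if (p + 1 >= n)
        return 0;
    for (int q = backSkip[p] - 1; q >= 0; q = backSkip[q] - 1) {
        int len = 2;   /* the chain guarantees the first two bytes match */
        if (best >= 2) {
            if (p + best >= n)
                break;     /* the match already reaches the end of input */
            /* Strategic position: two bytes before the end of the best match. */
            if (hash3[p + best - 2] != hash3[q + best - 2])
                continue;  /* cannot be longer than the best so far */
        }
        while (p + len < n && in[q + len] == in[p + len])
            len++;
        if (len > best) {
            best = len;
            *where = q;
        }
    }
    return best;
}
```

+For the string "...JOVE.....JOKA..JOKER" and the position of "JOKER" (18),
+this finds the 3-byte match at "JOKA" (12) and rejects "JOVE" on the hash
+check alone.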
+There is also another trick that takes advantage of the already determined
+RLE lengths.  If the RLE lengths for the positions to compare don't match,
+we can directly skip to the next potential match.  Note that the RLE bytes
+(the data bytes) are the same, and need not be compared, because the first
+byte (two bytes) are always equal on both positions (our backSkip[] table
+guarantees that).  The RLE length value can also be used to skip the start
+of the strings when comparing them.
+Another improvement to the search code made it dramatically faster than
+before on highly redundant files (such as pic from the Calgary Corpus
+Suite, which was the Achilles' heel until then).  Basically the new search
+method just skips over the RLE part (if any) in the search position and then
+checks if the located position has equal number (and value) of RLE bytes
+before it.
+backSkip[]      lastPair[]
+     _____  ________
+          \/        \
+       ...AB.....A..ABCD    rle[p] # of A's, B is something else
+          ^      ^  ^
+          |      |  |
+          i      p  p+rle[p]-1
+The algorithm searches for a two-byte string which starts at p + rle[p]-1,
+i.e.  the last rle byte ('A') and the non-matching one ('B').  When it
+finds such location (simple lastPair[] or backSkip[] lookup), it checks if
+the rle in the compare position (i-(rle[p]-1)) is long enough (i.e.  the
+same number of A's before the B in both places).  If there are, the normal
+hash value check is performed on the strings and if it succeeds, the
+brute-force byte-compare is done.
+The rationale behind this idea is quite simple.  There are dramatically
+fewer matches for "AB" than for "AA", so we get a huge speedup with this
+approach.  We are still guaranteed to find the most recent longest match
+there is.
+Note that a compression method similar to RLE can be realized using just
+LZ77.  You just emit the first byte as a literal byte, and output a LZ77
+code with offset 1 and the original RLE length minus 1.  You can thus
+consider RLE as a special case, which offers tighter encoding of the
+necessary information.  Also, as my LZ77 limits the copy size to 64/128/256
+bytes, a RLE version providing lengths up to 32 kilobytes is a big
+improvement, even if the code for it is somewhat longer.
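+A minimal sketch of why an LZ77 copy with offset 1 acts like RLE: as long
+as the copy is done byte by byte, each copied byte becomes the source of
+the next one.  The function name lz_copy is an assumption for this
+example only.

```c
/* Copy length bytes from offset bytes back in the output buffer.
   The copy goes byte by byte, so the source may overlap the bytes being
   written.  Returns the new output position. */
int lz_copy(unsigned char *out, int pos, int offset, int length)
{
    while (length-- > 0) {
        out[pos] = out[pos - offset];
        pos++;
    }
    return pos;
}
```

+Writing one literal 'A' and then calling lz_copy with offset 1 and
+length 4 produces five A's in a row.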
+--------------------------------------------------------------------------
+The Decompression Routine
+-------------------------
+Any lossless compression program is totally useless unless there exists a
+decompression program which takes in the compressed file and -- using only
+that information -- generates the original file exactly.  In this case the
+decompression program must run on C64's 6510 microprocessor, which had its
+impact on the algorithm development also.  Regardless of the algorithm,
+there are several requirements that the decompression code must satisfy:
+. Correctness - the decompression must behave accurately to
+    guarantee lossless decompression
+. Memory usage - the less memory is used the better
+. Speed - fast decompression is preferred to slower one
+The latter two requirements can be and are complementary.  A somewhat
+faster decompression for the same algorithm is possible if more memory can
+be used (although in this case the difference is quite small).  In any case
+the correctness of the result is the most important thing.
+A short pseudo-code of the decompression algorithm follows before I go to
+the actual C64 decompression code.
+     copy the decompression code to low memory
+     copy the compressed data forward in memory so that it isn't
+       overwritten before we have read it (setup safety & wrap buffers)
+     setup source and destination pointers
+     initialize RLE byte code table, the number of escape bits etc.
+     set initial escape code
+     do forever
+         get the number of escape bits "bits"
+         if "bits" do not match with the escape code
+             read more bits to complete a byte and output it
+         else
+             get Elias Gamma Code "value" and add 1 to it
+             if "value" is 2
+                 get 1 bit
+                 if bit is 0
+                     it is 2-byte LZ77
+                     get 8 bits for "offset"
+                     copy 2 bytes from "offset" bytes before current
+                         output position into current output position
+                 else
+                     get 1 bit
+                     if bit is 0
+                         it is an escaped literal byte
+                         get new escape code
+                         get more bits to complete a byte with the
+                             current escape code and output it
+                         use the new escape code
+                     else
+                         it is RLE
+                         get Elias Gamma Code "length"
+                         if "length" larger or equal than half the maximum
+                             it is long RLE
+                             get more bits to complete a byte "lo"
+                             get Elias Gamma Code "hi", subtract 1
+                             combine "lo" and "hi" into "length"
+                         endif
+                         get Elias Gamma Code "index"
+                         if "index" is larger than 31
+                             get 3 more bits to complete "byte"
+                         else
+                             get "byte" from RLE byte code table from
+                                 index "index"
+                         endif
+                         copy "byte" to the output "length" times
+                     endif
+                 endif
+             else
+                 it is LZ77
+                 get Elias Gamma Code "hi" and subtract 1 from it
+                 if "hi" is the maximum value - 1
+                     end decompression and start program
+                 endif
+                 get 8..12 bits "lo" (depending on settings)
+                 combine "hi" and "lo" into "offset"
+                 copy "value" number of bytes from "offset" bytes before
+                     current output position into current output position
+             endif
+         endif
+     end do
+The following routine is the pucrunch decompression code.  The code runs on
+the C64 or C128's C64-mode and a modified version is used for Vic20 and
+C16/Plus4.  It can be compiled by at least DASM V2.12.04.  Note that the
+compressor automatically attaches this code to the packet and sets the
+different parameters to match the compressed data.  I will insert
+additional commentary between strategic code points in addition to the
+comments that are already in the code.
+Note that at this point it is only possible to make the decompression code
+shorter by removing features.  At least I think that it is now so.  If I'm
+wrong, feel free to point it out to me.  Tim Rogers <timr@eurodltd.co.uk>
+did manage to snip off 2 bytes, thanks!  However, there are some features
+you may consider unnecessary.  The code can be shortened by:
+  * No basic end address set: 8 bytes
+  * No 2 MHz mode set/reset: 6 bytes
+  * No wrap option: 12 bytes
+Actually, if the wrap option is not used, the compressor automatically
+selects the shorter decompression code (only for the C64 version).
+        processor 6502
+BASEND  EQU $2d         ; start of basic variables (updated at EOF)
+LZPOS   EQU $2d         ; temporary, BASEND *MUST* *BE* *UPDATED* at EOF
+bitstr  EQU $f7         ; Hint the value beforehand
+xstore  EQU $c3         ; tape load temp
+WRAPBUF EQU $004b       ; 'wrap' buffer, 22 bytes ($02a7 for 89 bytes)
+        ORG $0801
+        DC.B $0b,8,$ef,0        ; '239 SYS2061'
+        DC.B $9e,$32,$30,$36
+        DC.B $31,0,0,0
+        sei             ; disable interrupts
+        inc $d030       ; or "bit $d030" if 2MHz mode is not enabled
+        inc 1           ; Select ALL-RAM configuration
+        ldx #0          ;** parameter - # of overlap bytes-1 off $ffff
+overlap lda $aaaa,x     ;** parameter start of off-end bytes
+        sta WRAPBUF,x   ; Copy to wrap/safety buffer
+        dex
+        bpl overlap
+        ldx #block200_end-block200+1    ; $54   ($59 max)
+packlp  lda block200-1,x
+        sta block200_-1,x
+        dex
+        bne packlp
+        ldx #block_stack_end-block_stack+1      ; $b3   (stack! ~$e8 max)
+packlp2 lda block_stack-1,x
+        dc.b $9d                ; sta $nnnn,x
+        dc.w block_stack_-1     ; (ZP addressing only addresses ZP!)
+        dex
+        bne packlp2
+        ldy #$aa        ;** parameter SIZE high + 1 (max 255 extra bytes)
+cploop  dex             ; ldx #$ff on the first round
+        lda $aaaa,x     ;** parameter DATAEND-0x100
+        sta $ff00,x     ;** parameter ORIG LEN-0x100+ reserved bytes
+        txa             ;cpx #0
+        bne cploop
+        dec cploop+6
+        dec cploop+3
+        dey
+        bne cploop
+        jmp main
+The first part of the code contains a sys command for the basic
+interpreter, two loops that copy the decompression code to zeropage/stack
+($f7-$1aa) and to the system input buffer ($200-$253).  The latter code
+segment contains byte, bit and Gamma Code input routines and the RLE byte
+code table, the former code segment contains the rest.
+This code also copies the compressed data forward in memory so that it
+won't be overwritten by the decompressed data before we have had a chance
+to read it.  The decompression starts at the beginning and proceeds upwards
+in both the compressed and decompressed data.  A safety area is calculated
+by the compression routine.  It finds out how many bytes we need for
+temporary data expansion, i.e.  for escaped bytes.  The wrap buffer is used
+for files that extend up to the end of memory, and would otherwise overwrite
+the compressed data with decompressed data before it has been read.
+This code fragment is not used during the decompression itself.  In fact
+the code will normally be overwritten when the actual decompression starts.
+The very start of the next code block is located inside the zero page and
+the rest fills the lowest portion of the microprocessor stack.  The zero
+page is used to make the references to different variables shorter and
+faster.  Also, the variables don't take extra code to initialize, because
+they are copied with the same copy loop as the rest of the code.
+block_stack
+#rorg $f7       ; $f7 - ~$1e0
+block_stack_
+bitstr  dc.b $80        ; ZP    $80 == Empty
+esc     dc.b $00        ; ** parameter (saves a byte when here)
+OUTPOS = *+1            ; ZP
+putch   sta $aaaa       ; ** parameter
+        inc OUTPOS      ; ZP
+        bne 0$          ; Note: beq 0$; rts; 0$: inc OUTPOS+1;rts would be
+; $0100                 ;       faster, but 1 byte longer
+        inc OUTPOS+1    ; ZP
+0$      rts
+putch is the subroutine that is used to output the decompressed bytes.  In
+this case the bytes are written to memory.  Because the subroutine call
+itself takes 12 cycles (6 for jsr and another 6 for rts), and the routine
+is called a lot of times during the decompression, the routine itself
+should be as fast as possible.  This is achieved by removing the need to
+save any registers.  This is done by using an absolute addressing mode
+instead of indirect indexed or absolute indexed addressing (sta $aaaa
+instead of sta ($zz),y or sta $aa00,y).  With indexed addressing you would
+need to save+clear+restore the index register value in the routine.
+Further improvement in code size and execution speed is done by storing the
+instruction that does the absolute addressing to zero page.  When the
+memory address is incremented we can use zero-page addressing for it too.
+On the other hand, most of the time is spent in the bit input routine, so
+further optimization of this routine would not gain much.
+newesc  ldy esc         ; remember the old code (top bits for escaped byte)
+        ldx #2          ; ** PARAMETER
+        jsr getchkf     ; get & save the new escape code
+        sta esc
+        tya             ; pre-set the bits
+        ; Fall through and get the rest of the bits.
+noesc   ldx #6          ; ** PARAMETER
+        jsr getchkf
+        jsr putch       ; output the escaped/normal byte
+        ; Fall through and check the escape bits again
+main    ldy #0          ; Reset to a defined state
+        tya             ; A = 0
+        ldx #2          ; ** PARAMETER
+        jsr getchkf     ; X=2 -> X=0
+        cmp esc
+        bne noesc       ; Not the escape code -> get the rest of the byte
+        ; Fall through to packed code
+The decompression code is first entered in main.  It first clears the
+accumulator and the Y register and then gets the escape bits (if any are
+used) from the input stream.  If they don't match with the current escape
+code, we get more bits to complete a byte and then output the result.  If
+the escape bits match, we have to do further checks to see what to do.
+        jsr getval      ; X = 0
+        sta xstore      ; save the length for a later time
+        cmp #1          ; LEN == 2 ?
+        bne lz77        ; LEN != 2      -> LZ77
+        tya             ; A = 0
+        jsr get1bit     ; X = 0
+        lsr             ; bit -> C, A = 0
+        bcc lz77-2      ; A=0 -> LZPOS+1
+        ;***FALL THRU***
+We first get the Elias Gamma Code value (or actually my independently
+developed version).  If it says the LZ77 match length is greater than 2, it
+means a LZ77 code and we jump to the proper routine.  Remember that the
+lengths are decremented before encoding, so the code value 1 means the
+length is 2.  If the length is two, we get a bit to decide if we have LZ77
+or something else.  We have to clear the accumulator, because get1bit does
+not do that automatically.
+If the bit we got (shifted to carry to clear the accumulator) was zero, it
+is LZ77 with an 8-bit offset.  If the bit was one, we get another bit which
+decides between RLE and an escaped byte.  A zero-bit means an escaped byte
+and the routine that is called also changes the escape bits to a new value.
+A one-bit means either a short or long RLE.
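+The decision tree of the two paragraphs above can be summarized in a few
+lines of Python.  This is only a sketch:  classify_packed and its return
+strings are names invented here, and len_code stands for the value that
+getval returns (the match length minus one).

```python
def classify_packed(len_code, bits):
    """Decide what follows an escape-code match.

    len_code is the gamma-coded value (match length - 1); bits is an
    iterator yielding the following input bits as 0/1.
    """
    if len_code != 1:
        return "lz77"              # length > 2: normal LZ77 match
    if next(bits) == 0:
        return "lz77-2byte"        # length 2 with an 8-bit offset
    if next(bits) == 0:
        return "escaped-literal"   # also selects a new escape code
    return "rle"                   # short or long run
```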
+        ; e..e01
+        jsr get1bit     ; X = 0
+        lsr             ; bit -> C, A = 0
+        bcc newesc      ; e..e010
+        ;***FALL THRU***
+        ; e..e011
+srle    iny             ; Y is 1 bigger than MSB loops
+        jsr getval      ; Y is 1, get len, X = 0
+        sta xstore      ; Save length LSB
+        cmp #64         ; ** PARAMETER 63-64 -> C clear, 64-64 -> C set..
+        bcc chrcode     ; short RLE, get bytecode
+longrle ldx #2          ; ** PARAMETER  111111xxxxxx
+        jsr getbits     ; get 3/2/1 more bits to get a full byte, X = 0
+        sta xstore      ; Save length LSB
+        jsr getval      ; length MSB, X = 0
+        tay             ; Y is 1 bigger than MSB loops
+The short RLE only uses half (or actually 1 value less than a half) of the
+gamma code range.  Larger values switch us into long RLE mode.  Because
+there are several values, we already know some bits of the length value.
+Depending on the gamma code maximum value we need to get from one to three
+bits more to assemble a full byte, which is then used as the less
+significant part for the run length count.  The upper part is encoded using
+the same gamma code we are using everywhere.  This limits the run length to
+16 kilobytes for the smallest maximum value (-m5) and to the full 64
+kilobytes for the largest value (-m7).
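+The limits follow from simple arithmetic, sketched below in Python.  The
+mapping of -m5 and -m7 to maximum gamma values of 63 and 255 is my reading
+of the text (lengths are stored decremented by one), not something taken
+from the compressor source.

```python
def max_rle_run(max_gamma_value):
    """Longest run that a long RLE code can express.

    The low part is a full byte (0..255, meaning 1..256 repeats) and
    the high part is a gamma value in 1..max_gamma_value, where each
    step above 1 adds another 256 repeats.
    """
    return (max_gamma_value - 1) * 256 + 256


print(max_rle_run(63))    # assumed maximum gamma value for -m5
print(max_rle_run(255))   # assumed maximum gamma value for -m7
```

+Under these assumptions the two calls give 16128 and 65280 bytes, i.e.
+just under 16 and 64 kilobytes.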
+Additional compression for RLE is gained using a table for the 31
+top-ranking RLE bytes.  We get an index from the input.  If it is from 1 to
+31, we use it to index the table.  If the value is larger, the lower 5 bits
+of the value give us the 5 most significant bits of the byte to repeat.
+In this case we read 3 additional bits to complete the byte.
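+A small Python sketch of this lookup follows.  The function name and the
+read_3_bits callback are made up for the example; the table is assumed to
+hold the ranked bytes in order, with rank 1 first.

```python
def decode_rle_byte(index, table, read_3_bits):
    """Turn the gamma-coded index into the byte to repeat.

    index       -- value read from the input (1 or larger)
    table       -- up to 31 top-ranking RLE bytes, rank 1 first
    read_3_bits -- callable returning the next 3 input bits as 0..7
    """
    if 1 <= index <= 31:
        return table[index - 1]            # lda table-1,x
    # The low 5 bits become the 5 most significant bits of the byte.
    return ((index & 0x1F) << 3) | read_3_bits()
```

+For example, index 53 (binary 110101) followed by the bits 011 gives the
+byte 10101011, or $AB.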
+chrcode jsr getval      ; Byte Code, X = 0
+        tax             ; this is executed most of the time anyway
+        lda table-1,x   ; Saves one jump if done here (loses one txa)
+        cpx #32         ; 31-32 -> C clear, 32-32 -> C set..
+        bcc 1$          ; 1..31 -> the byte to repeat is in A
+        ; Not ranks 1..31, -> 111110xxxxx (32..64), get byte..
+        txa             ; get back the value (5 valid bits)
+        jsr get3bit     ; get 3 more bits to get a full byte, X = 0
+1$      ldx xstore      ; get length LSB
+        inx             ; adjust for cpx#$ff;bne -> bne
+dorle   jsr putch
+        dex
+        bne dorle       ; xstore 0..255 -> 1..256
+        dey
+        bne dorle       ; Y was 1 bigger than wanted originally
+mainbeq beq main        ; reverse condition -> jump always
+After deciding the repeat count and decoding the value to repeat we simply
+have to output the value enough times.  The X register holds the lower part
+and the Y register holds the upper part of the count.  The X register value
+is first incremented by one to change the code sequence dex ; cpx #$ff ;
+bne dorle into simply dex ; bne dorle.  This may seem strange, but it saves
+one byte in the decompression code and two clock cycles for each byte that
+is output.  It's almost a ten percent improvement.  :-)
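+To check the loop counts, the dorle loop can be modelled in Python with
+8-bit wrapping registers.  This is an illustration only;  the function
+name is invented and putch is reduced to a counter.

```python
def rle_output_count(length_lsb, y):
    """Count how many bytes the dorle loop outputs.

    length_lsb is the value stored in xstore (0..255) and y is the
    Y register on entry, which is 1 bigger than the number of extra
    256-byte rounds.
    """
    x = (length_lsb + 1) & 0xFF     # inx (255 wraps to 0, meaning 256)
    count = 0
    while True:
        count += 1                  # jsr putch
        x = (x - 1) & 0xFF          # dex
        if x != 0:                  # bne dorle
            continue
        y = (y - 1) & 0xFF          # dey
        if y != 0:                  # bne dorle
            continue
        return count
```

+The result always equals (y - 1) * 256 + length_lsb + 1, which is why the
+increment before the loop is enough to drop the cpx #$ff.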
+The next code fragment is the LZ77 decode routine and it is used in the
+file parts that do not have equal byte runs (and even in some that have).
+The routine simply gets an offset value and copies a sequence of bytes from
+the already decompressed portion to the current output position.
+lz77    jsr getval      ; X=0 -> X=0
+        cmp #127        ; ** PARAMETER  Clears carry (is maximum value)
+        beq eof         ; EOF
+        sbc #0          ; C is clear -> subtract 1  (1..126 -> 0..125)
+        ldx #0          ; ** PARAMETER (more bits to get)
+        jsr getchkf     ; clears Carry, X=0 -> X=0
+lz77_2  sta LZPOS+1     ; offset MSB
+        ldx #8
+        jsr getbits     ; clears Carry, X=8 -> X=0
+                        ; Note: Already eor:ed in the compressor..
+        ;eor #255       ; offset LSB 2's complement -1 (i.e. -X = ~X+1)
+        adc OUTPOS      ; -offset -1 + curpos (C is clear)
+        sta LZPOS
+        lda OUTPOS+1
+        sbc LZPOS+1     ; takes C into account
+        sta LZPOS+1     ; copy X+1 number of chars from LZPOS to OUTPOS
+        ;ldy #0         ; Y was 0 originally, we don't change it
+        ldx xstore      ; LZLEN
+        inx             ; adjust for cpx#$ff;bne -> bne
+lzloop  lda (LZPOS),y
+        jsr putch
+        iny             ; Y does not wrap because X=0..255 and Y initially 0
+        dex
+        bne lzloop      ; X loops, (256,1..255)
+        beq mainbeq     ; jump through another beq (-1 byte, +3 cycles)
+There are two entry-points to the LZ77 decode routine.  The first one
+(lz77) is for copy lengths bigger than 2.  The second entry point (lz77_2)
+is for the length of 2 (8-bit offset value).
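+The address arithmetic in the listing is compact, so here is a Python
+model of it.  It assumes, as the commented-out eor suggests, that the
+compressor stores the low offset byte already complemented.  The function
+names are invented for this sketch.

```python
def lz_source_address(outpos, offset_hi, offset_lo_inverted):
    """Mimic the adc/sbc sequence that computes the 16-bit LZPOS."""
    total = (outpos & 0xFF) + offset_lo_inverted   # adc OUTPOS, C clear
    lzpos_lo = total & 0xFF
    carry = total >> 8                             # carry out of the add
    # sbc LZPOS+1 computes A - M - (1 - C)
    lzpos_hi = ((outpos >> 8) - offset_hi - (1 - carry)) & 0xFF
    return (lzpos_hi << 8) | lzpos_lo


def lz_copy(out, distance, length):
    """Append length bytes copied from distance bytes back, one by one."""
    for _ in range(length):
        out.append(out[-distance])
    return out
```

+For an encoded offset v (high byte times 256 plus the uncomplemented low
+byte) the model returns OUTPOS - v - 1, so the smallest offset points at
+the previously written byte.  Copying byte by byte also means that source
+and destination may overlap:  starting from "ab", a copy of 5 bytes at
+distance 2 produces "abababa".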
+        ; EOF
+eof     lda #$37        ; ** could be a PARAMETER
+        sta 1
+        dec $d030       ; or "bit $d030" if 2MHz mode is not enabled
+        lda OUTPOS      ; Set the basic prg end address
+        sta BASEND
+        lda OUTPOS+1
+        sta BASEND+1
+        cli             ; ** could be a PARAMETER
+        jmp $aaaa       ; ** PARAMETER
+#rend
+block_stack_end
+Some kind of end-of-file marker is necessary for all variable-length
+codes.  Otherwise we could not be certain when to stop decoding.  Sometimes
+the byte count of the original file is used instead, but here a special EOF
+condition is more convenient.  If the high part of a LZ77 offset is the
+maximum gamma code value, we have reached the end of file and must stop
+decoding.  The end of file code turns on BASIC and KERNAL, turns off 2 MHz
+mode (for C128) and updates the basic end addresses before allowing
+interrupts and jumping to the program start address.
+The next code fragment is put into the system input buffer.  The routines
+are for getting bits from the encoded message (getbits) and decoding the
+Elias Gamma Code (getval).  The table at the end contains the ranked RLE
+bytes.  The compressor automatically decreases the table size if not all of
+the values are used.
+block200
+#rorg   $200    ; $200-$258
+block200_
+getnew  pha             ; 1 Byte/3 cycles
+INPOS = *+1
+        lda $aaaa       ;** parameter
+        rol             ; Shift in C=1 (last bit marker)
+        sta bitstr      ; bitstr initial value = $80 == empty
+        inc INPOS       ; Does not change C!
+        bne 0$
+        inc INPOS+1     ; Does not change C!
+        bne 0$
+        ; This code does not change C!
+        lda #WRAPBUF    ; Wrap from $ffff->$0000 -> WRAPBUF
+        sta INPOS
+0$      pla             ; 1 Byte/4 cycles
+        rts
+; getval : Gets a 'static huffman coded' value
+; ** Scratches X, returns the value in A **
+getval  inx             ; X must be 0 when called!
+        txa             ; set the top bit (value is 1..255)
+0$      asl bitstr
+        bne 1$
+        jsr getnew
+1$      bcc getchk      ; got 0-bit
+        inx
+        cpx #7          ; ** parameter
+        bne 0$
+        beq getchk      ; inverse condition -> jump always
+; getbits: Gets X bits from the stream
+; ** Scratches X, returns the value in A **
+get1bit inx             ;2
+getbits asl bitstr
+        bne 1$
+        jsr getnew
+1$      rol             ;2
+getchk  dex             ;2              more bits to get ?
+getchkf bne getbits     ;2/3
+        clc             ;2              return carry cleared
+        rts             ;6+6
+table   dc.b 0,0,0,0,0,0,0
+        dc.b 0,0,0,0,0,0,0,0
+        dc.b 0,0,0,0,0,0,0,0
+        dc.b 0,0,0,0,0,0,0,0
+#rend
+block200_end
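+For readers who do not read 6510 code fluently, the following Python sketch
+models the bit input and the gamma decoding.  It is an illustration, not
+the author's code:  the class and function names are invented, the input
+is a plain bytes object instead of the INPOS pointer, and max_bits
+corresponds to the cpx #7 parameter in the listing.

```python
class BitReader:
    """MSB-first bit reader using the sentinel-bit trick of getbits/getnew."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.bitstr = 0x80          # only the sentinel bit: buffer is empty

    def get_bit(self):
        carry = self.bitstr >> 7
        self.bitstr = (self.bitstr << 1) & 0xFF      # asl bitstr
        if self.bitstr == 0:                         # sentinel shifted out
            byte = self.data[self.pos]               # getnew: fetch a byte
            self.pos += 1
            # rol: the sentinel enters bit 0, data bit 7 is the result
            self.bitstr = ((byte << 1) | 1) & 0xFF
            carry = byte >> 7
        return carry

    def get_bits(self, count, acc=0):
        for _ in range(count):                       # rol / dex / bne
            acc = ((acc << 1) | self.get_bit()) & 0xFF
        return acc


def get_val(reader, max_bits=7):
    """Decode one gamma-style value (1..127 when max_bits is 7)."""
    n = 1
    while n < max_bits and reader.get_bit() == 1:    # count the 1-bits
        n += 1
    return reader.get_bits(n - 1, acc=1)             # top bit is implied


reader = BitReader(bytes([0x5D, 0x00]))
print([get_val(reader) for _ in range(3)])
```

+The two input bytes hold the codes 0, 101 and 11010, so the example prints
+[1, 3, 6].  Twelve 1-bits in a row decode to 127, the maximum value that
+the LZ77 routine uses as the end of file marker.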
+--------------------------------------------------------------------------
+Target Application Compression Tests
+------------------------------------
+The following data compression tests are made on my four C64 test files:
+   bs.bin is a demo part, about 50% code and 50% graphics data
+   delenn.bin is a BFLI picture with a viewer, a lot of dithering
+   sheridan.bin is a BFLI picture with a viewer, dithering, black areas
+   ivanova.bin is a BFLI picture with a viewer, dithering, larger black areas
+   Packer                 Size     Left    Comment
+   ===============================================
+   bs.bin                 41537
+   -----------------------------------------------
+   ByteBonker 1.5         27326   65.8%    Mode 4
+   Cruelcrunch 2.2        27136   65.3%    Mode 1
+   The AB Cruncher        27020   65.1%
+   ByteBoiler (REU)       26745   64.4%
+   RLE + ByteBoiler (REU) 26654   64.2%
+   PuCrunch               26415   63.6%    -m5
+   ===============================================
+   delenn.bin             47105
+   -----------------------------------------------
+   The AB Cruncher        N/A       N/A    Crashes
+   ByteBonker 1.5         21029   44.6%    Mode 3
+   Cruelcrunch 2.2        20672   43.9%    Mode 1
+   ByteBoiler (REU)       20371   43.2%
+   RLE + ByteBoiler (REU) 19838   42.1%
+   PuCrunch               19734   41.9%    -p2
+   ===============================================
+   sheridan.bin           47105
+   -----------------------------------------------
+   ByteBonker 1.5         13661   29.0%    Mode 3
+   Cruelcrunch 2.2        13595   28.9%    Mode H
+   The AB Cruncher        13534   28.7%
+   ByteBoiler (REU)       13308   28.3%
+   PuCrunch               12526   26.6%    -p2
+   RLE + ByteBoiler (REU) 12478   26.5%
+   ===============================================
+   ivanova.bin            47105
+   -----------------------------------------------
+   ByteBonker 1.5         11016   23.4%    Mode 1
+   Cruelcrunch 2.2        10883   23.1%    Mode H
+   The AB Cruncher        10743   22.8%
+   ByteBoiler (REU)       10550   22.4%
+   PuCrunch                9844   20.9%    -p2
+   RLE + ByteBoiler (REU)  9813   20.8%
+   LhA                     9543   20.3%    Decompressor not included
+   gzip -9                 9474   20.1%    Decompressor not included
+   -----------------------------------------------
+--------------------------------------------------------------------------
+Calgary Corpus Suite
+--------------------
+The original compressor only allows files up to 63 kB.  To be able to
+compare my algorithm to others I modified the compressor to allow bigger
+files.  I then got some reference results using the Calgary Corpus test
+suite.
+Note that the decompression code is included in the compressed files,
+although it is not valid for files over 63k (compressed or uncompressed
+size).  About 34 bytes are decompression parameters, the rest (approx.  300
+bytes) is 6510 machine language.  Kolmogorov complexity, anyone ?:-)
+To tell you the truth, the results surprised me, because the compression
+algorithm -IS- developed with a very special case in mind.  It only has a
+fixed code for LZ77/RLE lengths, not even a static one (fixed != static !=
+adaptive)!  Also, it does not use arithmetic coding (or Huffman) to compress
+the literal bytes.  Because most of the big files are ASCII text, this
+somewhat handicaps my compressor, although the new tagging system is very
+happy with 7-bit ASCII input.  Also, decompression is relatively fast, and
+uses no extra memory.
+I'm getting relatively near LhA, and shorter than LhA for 8 files (300-byte
+decompressor included!), and relatively near or shorter than LhA in other
+cases if the decompressor is removed.
+The table contains the file name (file), compression options (options), the
+original file size (in) and the compressed file size (out) in bytes,
+average number of bits used to encode one byte (b/B), remaining size
+(ratio) and the reduction (gained), and the time used for compression.  For
+comparison, the last three columns show the compressed sizes for LhA, Zip
+and GZip (with the -9 option), respectively.
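+The derived columns can be recomputed from the in and out sizes.  The
+Python sketch below shows the formulas as I understand them from the
+column descriptions;  the function name is invented, and the last digit
+may differ from the table for some rows, depending on how the original
+figures were rounded.

```python
def table_metrics(size_in, size_out):
    """Return (bits per byte, remaining size %, reduction %)."""
    bits_per_byte = 8 * size_out / size_in
    ratio = 100 * size_out / size_in
    gained = 100 - ratio
    return round(bits_per_byte, 2), round(ratio, 2), round(gained, 2)


print(table_metrics(111261, 35457))   # the "bib" row
```

+For the bib row this gives 2.55 bits per byte, 31.87% remaining and 68.13%
+gained.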
+FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz
+Estimated decompression on a C64 (1MHz 6510) 6:47       LhA    Zip GZip-9
+file   options    in     out  b/B  ratio gained time    out    out    out
+=========================================================================
+bib    -p4    111261   35457 2.55 31.87% 68.13%  8.3  40740  35041  34900
+book1  -p4    768771  318919 3.32 41.49% 58.51% 65.1 339074 313352 312281
+book2  -p4    610856  208627 2.74 34.16% 65.84% 43.5 228442 206663 206158
+geo    -p2    102400   72812 5.69 71.11% 28.89% 11.4  68574  68471  68414
+news   -p3    377109  144566 3.07 38.34% 61.66% 15.2 155084 144817 144400
+obj1   -m6     21504   10750 4.00 50.00% 50.00%  0.1  10310  10300  10320
+obj2          246814   83046 2.70 33.65% 66.35% 13.5  84981  81608  81087
+paper1 -p2     53161   19536 2.94 36.75% 63.25%  1.5  19676  18552  18543
+paper2 -p3     82199   30676 2.99 37.32% 62.68%  4.3  32096  29728  29667
+paper3 -p2     46526   19234 3.31 41.35% 58.65%  1.4  18949  18072  18074
+paper4 -p1 -m5 13286    6095 3.68 45.88% 54.12%  0.2   5558   5511   5534
+paper5 -p1 -m5 11954    5494 3.68 45.96% 54.04%  0.1   4990   4970   4995
+paper6 -p2     38105   14159 2.98 37.16% 62.84%  0.8  13814  13207  13213
+pic    -p1    513216   57835 0.91 11.27% 88.73% 23.2  52221  56420  52381
+progc  -p1     39611   14221 2.88 35.91% 64.09%  0.7  13941  13251  13261
+progl  -p1     71646   17038 1.91 23.79% 76.21%  3.8  16914  16249  16164
+progp          49379   11820 1.92 23.94% 76.06%  1.3  11507  11222  11186
+trans  -p2     93695   19511 1.67 20.83% 79.17%  3.7  22578  18961  18862
+-------------------------------------------------------------------------
+total        3251493 1089796 2.68 33.52% 66.48% 3:18
+Canterbury Corpus Suite
+-----------------------
+The following shows the results on the Canterbury corpus.  Again, I am
+quite pleased with the results.  For example, pucrunch beats GZip -9 for
+lcet10.txt if you remove the decompression code.
+FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz
+Estimated decompression on a C64 (1MHz 6510) 6:00        LhA    Zip GZip-9
+file        opt     in    out  b/B  ratio gained time    out    out    out
+==========================================================================
+alice29.txt -p4 152089  55103 2.90 36.24% 63.76% 11.3  59160  54525  54191
+ptt5        -p1 513216  57835 0.91 11.27% 88.73% 23.2  52272  56526  52382
+fields.c         11150   3505 2.52 31.44% 68.56%  0.1   3180   3230   3136
+kennedy.xls    1029744 265887 2.07 25.83% 74.17%  571 198354 206869 209733
+sum              38240  13334 2.79 34.87% 65.13%  0.6  14016  13006  12772
+lcet10.txt  -p4 426754 144585 2.72 33.89% 66.11% 30.8 159689 144974 144429
+plrabn12.txt-p4 481861 199134 3.31 41.33% 58.67% 43.6 210132 195299 194277
+cp.html     -p1  24603   8679 2.83 35.28% 64.72%  0.4   8402   8085   7981
+grammar.lsp -m5   3721   1591 3.43 42.76% 57.24%  0.0   1280   1336   1246
+xargs.1     -m5   4227   2117 4.01 50.09% 49.91%  0.0   1790   1842   1756
+asyoulik.txt-p4 125179  50594 3.24 40.42% 59.58%  7.5  52377  49042  48829
+--------------------------------------------------------------------------
+total          2810784 802364 2.28 28.55% 71.45% 11:29
+--------------------------------------------------------------------------
+Conclusions
+-----------
+In this article I have presented a compression program which creates
+compressed executable files for C64, VIC20 and Plus4/C16.  The compression
+can be performed on an Amiga, an MS-DOS/Win machine or any other machine
+with a C-compiler.  A powerful machine allows asymmetric compression:  a lot
+of resources can be used to compress the data while needing minimal
+resources for decompression.  This was one of the design requirements.
+Two original ideas were presented:  a new literal byte tagging system and
+an algorithm using hybrid RLE and LZ77.  Also, a detailed explanation of
+the LZ77 string match routine and the optimal parsing scheme was presented.
+The compression ratio and decompression speed are comparable to other
+compression programs for Commodore 8-bit computers.
+But what, then, are the real advantages of pucrunch over traditional C64
+compression programs, apart from also being able to compress VIC20 and
+Plus4/C16 programs?  Because I'm lousy at praising my own work, I will let you
+see some actual user comments.  I have edited the correspondence a little,
+but I hope he doesn't mind.  My comments are marked with an asterisk.
+Maybe Steve has something to add also?
+---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<---
+A big advantage is that pucrunch does RLE and LZ in one pass.  For demos I
+only used a cruncher and did my own RLE routines as it is somewhat annoying
+to use an external program for this.  These programs require some memory
+and ZP-addresses like the cruncher does.  So it can easily happen that the
+decruncher or depacker interfere with your demo-part, if you didn't know
+what memory is used by the depacker.  At least you have more restrictions
+to care about.  With pucrunch you can do RLE and LZ without having too much
+of these restrictions.
+* Right, and because pucrunch is designed that way from the start, it can
+* get better results with one-pass RLE and LZ than doing them separately.
+* On the other hand it more or less requires that you _don't_ RLE-pack the
+* file first..
+This is true, we also found that out.  We did a part for our demo which had
+some tables using only the low-nybble.  Also the bitmap had to be filled
+with a specific pattern.  We did some small routines to shorten the part,
+but as we tried pucrunch, this became obsolete.  From 59xxx bytes to 12xxx
+or 50 blocks, with our own RLE and a different cruncher we got 60 blks!
+Not bad at all ;)
+Not to mention that you have the complete and commented source-code for the
+decruncher, so that you can easily change it to your own needs.  And it's
+not only very flexible, it is also very powerful.  In general pucrunch does
+a better job than ByteBoiler+Sledgehammer.
+In addition to that pucrunch is of course much faster than crunchers on my
+C64, this has not only to do with my 486/66 and the use of an HDD.  See, I
+use a cross-assembler-system, and with pucrunch I don't have to transfer
+the assembled code to my 64, crunch it, and transfer it back to my pc.
+Now, it's just a simple command-line and here we go...  And not only I can
+do this, my friend who has an amiga uses pucrunch as well.  This is the
+first time we use the same cruncher, since I used to take ByteBoiler, but
+my friend didn't have a REU so he had to try another cruncher.
+So, if I try to make a conclusion:  It's fast, powerful and extremely
+flexible (thanks to the source-code).
+---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<---
+Just for your info...
+We won the demo-competition at the Interjam'98 and everything that was
+crunched ran through the hands of pucrunch...  Of course, you have been
+mentioned in the credits.  If you want to take a look, search for
+KNOOPS/DREAMS, which should be on the ftp-servers in some time.
+So, again, THANKS!  :)
+       Ninja/DREAMS
+---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<---
+So, what can I possibly hope to add to that, right?:-)
+If you have any comments, questions, article suggestions or just a general
+hello brewing in your mind, send me mail or visit my homepage.
+See you all again in the next issue!
+-Pasi
+--------------------------------------------------------------------------
+Appendix: The Log Book
+----------------------
+.3.1997
+       Tried reverse LZ, i.e.  mirrored history buffer.  Gained some
+       bytes, but it's not really worth it, i.e.  the compress time
+       increases hugely and the decompressor gets bigger.
+.3.1997
+       Tried to have a code to use the last LZ copy position (offset
+       added to the lastly used LZ copy position).  On bs.run I gained
+       57 bytes, but in fact the net gain was only 2 bytes
+       (decompressor becomes ~25 bytes longer, and the lengthening of
+       the long rle codes takes away the rest 30).
+.3.1997
+       Discovered that my representation of integers 1-63 is in fact
+       an Elias Gamma Code.  Tried Fibonacci code instead, but it was
+       much worse (~500 bytes on bs.run, ~300 bytes on delenn.run)
+       without even counting the expansion of the decompression code.
+.3.1997
+       'huffman' coded RLE byte -> ~70 bytes gain for bs.run.  The RLE
+       bytes used are ranked, and top 15 are put into a table, which
+       is indexed by an Elias Gamma Code.  Other RLE bytes get a prefix
+       "1111".
+.3.1997
+       The number of escape bits used is again selectable.  Using only
+       one escape bit for delenn.run gains ~150 bytes.  If #-option is
+       not selected, automatically selects the number of escape bits
+       (is a bit slow).
+.3.1997
+       Changed some arrays to short.  17 x inlen + 64kB memory used.
+       opt_escape() only needs two 16-element arrays now and is
+       slightly faster.
+.3.1997
+       Tried to use BASIC ROM as a codebook, but the results were not
+       so good.  For mostly-graphics files there are no long matches ->
+       no net gain, for mostly-code files the file itself gives a
+       better codebook..  Not to mention that using the BASIC ROM as a
+       codebook is not 100% compatible.
+.4.1997
+       Tried maxlen 128, but it only gained 17 bytes on ivanova.run,
+       and lost ~15 bytes on bs.run.  This also increased the LZPOS
+       maximum value from ~16k to ~32k, but it also had little effect.
+.4.1997
+       Changed to coding so that LZ77 has the priority.  2-byte LZ
+       matches are coded in a special way without big loss in
+       efficiency, and codes also RLE/Escape.
+.4.1997
+       Tried histogram normalization on LZLEN, but it really did not
+       gain much of anything, not even counting the mapping table from
+       index to value that is needed.
+.4.1997
+..14 bit LZPOS base part.  Automatic selection.  Some more bytes
+       are gained if the proper selection is done before the LZ/RLELEN
+       optimization.  However, it can't really be done automatically
+       before that, because it is a recursive process and the original
+       LZ/RLE lengths are lost in the first optimization..
+.4.1997
+       Found a way to speed up the almost pathological cases by using
+       the RLE table to skip the matching beginnings.
+.5.1997
+       Switched to maximum length of 128 to get better results on the
+       Calgary Corpus test suite.
+.5.1997
+       Made the maximum length adjustable.  -m5, -m6, and -m7 select
+       64, 128 and 256 respectively.  The decompression code now allows
+       escape bits from 0 to 8.
+.6.1997
+       Optimized the escape optimization routine.  It now takes almost
+       no time at all.  It used a whole lot of time on large escape bit
+       values before.  The speedup came from a couple of generic data
+       structure optimizations and loop removals by informal
+       deductions.
+.6.1997
+       Figured out another, better way to speed up the pathological
+       cases.  Reduced the run time to a fraction of the original time.
+       All 64k files are compressed under one minute on my 25 MHz
+.  pic from the Calgary Corpus Suite is now compressed in
+seconds instead of 7 minutes (200 MHz Pentium w/ FreeBSD).
+       Compression of ivanova.run (one of my problem cases) was
+       reduced from about 15 minutes to 47 seconds.  The compression of
+       bs.run has been reduced from 28 minutes (the first version) to
+seconds.  An excellent example of how the changes in the
+       algorithm level give the most impressive speedups.
+.6.1997
+       Changed the command line switches to use the standard approach.
+.6.1997
+       Now determines the number of bytes needed for temporary data
+       expansion (i.e.  escaped bytes).  Warns if there is not enough
+       memory to allow successful decompression on a C64.
+       Also, now it's possible to decompress the files compressed with
+       the program (must be the same version).  (-u)
+.6.1997
+       Only checks the lengths that are powers of two in
+       OptimizeLength(), because it does not seem to be any (much)
+       worse than checking every length.  (Smaller than found maximum
+       lengths are checked because they may result in a shorter file.)
+       This version (compiled with optimizations on) only spends 27
+       seconds on ivanova.run.
+.6.1997
+       Removed 4 bytes from the decrunch code (begins to be quite
+       tight now unless some features are removed) and simultaneously
+       removed a not-yet-occurred hidden bug.
+.6.1997
+       Checked the theoretical gain from using the lastly outputted
+       byte (conditional probabilities) to set the probabilities for
+       normal/LZ77/RLE selection.  The number of bits needed to code
+       the selection is from 0.0 to 1.58, but even using arithmetic
+       code to encode it, the original escape system is only 82 bits
+       worse (ivanova.run), 7881/7963 bits total.  The former figure is
+       calculated from the entropy, the latter includes
+       LZ77/RLE/escape select bits and actual escapes.
+.7.1997
+       In LZ77 match we now check if a longer match (further away)
+       really gains more bits.  Increase in match length can make the
+       code 2 bits longer.  Increase in match offset can make the code
+       even longer (2 bits for each magnitude).  Also, if LZPOS low
+       part is longer than 8, the extra bits make the code longer if
+       the length becomes longer than two.
+       ivanova -5 bytes, sheridan -14, delenn -26, bs -29
+       When generating the output rescans the LZ77 matches.  This is
+       because the optimization can shorten the matches and a shorter
+       match may be found much nearer than the original longer match.
+       Because longer offsets usually use more bits than shorter ones,
+       we get some bits off for each match of this kind.  Actually, the
+       rescan should be done in OptimizeLength() to get the most out
+       of it, but it is too much work right now (and would make the
+       optimize even slower).
+.8.1997
+bytes removed from the decrunch code.  I have to thank Tim
+       Rogers (timr@eurodltd.co.uk) for helping with 2 of them.
+.9.1997
+       Because SuperCPU doesn't work correctly with inc/dec $d030, I
+       made the 2 MHz user-selectable and off by default.  (-f)
+.9.1997
+       Today I found out that most of my fast string matching
+       algorithm matches the one developed by [Fenwick and Gutmann,
+       1994]*.  It's quite frustrating to see that you are not a genius
+       after all and someone else has had the same idea.  :-) However,
+       using the RLE table to help still seems to be an original idea,
+       which helps immensely on the worst cases.  I still haven't read
+       their paper on this, so I'll just have to get it and see..
+       * [Fenwick and Gutmann, 1994].  P.M.  Fenwick and P.C.  Gutmann,
+       "Fast LZ77 String Matching", Dept of Computer Science, The
+       University of Auckland, Tech Report 102, Sep 1994
+.9.1997
+       The new decompression code can decompress files from $258 to
+       $ffff (or actually all the way up to $1002d :-).  The drawback
+       is:  the decompression code became 17 bytes longer.  However, the
+       old decompression code is used if the wrap option is not
+       needed.
+.9.1997
+       The backSkip table can now be fixed size (64 kWord) instead of
+       growing enormous for "BIG" files.  Unfortunately, if the
+       fixed-size table is used, the LZ77 rescan is impractical (well,
+       just a little slow, as we would need to recreate the backSkip
+       table again).  On the other hand the rescan did not gain so many
+       bytes in the first place (percentage).  The define BACKSKIP_FULL
+       enables the old behavior (default).  Note also, that for smaller
+       files than 64kB (the primary target files) the default consumes
+       less memory.
+       The hash value compare that is used to discard impossible
+       matches does not help much.  Although it halves the number of
+       strings to consider (compared to a direct one-byte compare),
+       speedwise the difference is negligible.  I suppose a mismatch is
+       found very quickly when the strings are compared starting from
+       the third character (the two first characters are equal, because
+       we have a full hash table).  According to one test file, on
+       average 3.8 byte-compares are done for each potential match.  A
+       define HASH_COMPARE enables (default) the hash version of the
+       compare, in which case "inlen" bytes more memory is used.
+       After removing the hash compare my algorithm quite closely
+       follows the [Fenwick and Gutmann, 1994] fast string matching
+       algorithm (except the RLE trick).  (Although I *still* haven't
+       read it.)
+x inlen + 256 kB of memory is used (with no HASH_COMPARE and
+       without BACKSKIP_FULL).
+.9.1997
+       One byte removed from the decompression code (both versions).
+.12.1997
+       Only records longer matches if they compress better than
+       shorter ones.  I.e.  a match of length N at offset L can be
+       better than a match of length N+1 at 4*L.  The old comparison
+       was "better or equal" (">=").  The new comparison "better" (">")
+       gives better results on all Calgary Corpus files except "geo",
+       which loses 101 bytes (0.14% of the compressed size).
+       An extra check/rescan for 2-byte matches in OptimizeLength()
+       increased the compression ratio for "geo" considerably, back to
+       the original and better.  It seems to help for the other files
+       also.  Unfortunately this only works with the full backskip
+       table (BACKSKIP_FULL defined).
+.2.1998
+       Compression/Decompression for VIC20 and C16/+4 incorporated
+       into the same program.
+.3.1998
+       Removed two bytes from the decompression codes.
+.8.1998
+       There was a small bug in pucrunch which caused the location
+       $2c30 to be decremented (dec $2c30 instead of bit $d030) when
+       run without the -f option.  The source is fixed and executables
+       are now updated.
+--------------------------------------------------------------------------
+References
+. http://www.cs.tut.fi/~albert/
+. http://www.cs.tut.fi/~albert/Dev/
+. http://www.cs.tut.fi/~albert/Dev/pucrunch/
+.......
+....
+..
+.                                    C=H #17
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+VIC-20 Kernal ROM Disassembly Project
+Richard Cini
+Introduction
+	In order to put this project into perspective, a little personal
+history is needed. I received my first Commodore as a gift from my parents
+back in 1982. I used Commodore PETs in the school's computer lab, and
+Radio Shack Model I's in the local R/S store. It was nice to have one
+of my own to hack on, though.  Back then, most of my work was with the
+built-in BASIC interpreter. My claim to fame (to my family, at least) was
+a BASIC/machine language mailing list management program, and an allophone
+speech synthesizer hardware-software hack. Both worked well, which surprised
+my mother, who claims that nothing I ever built worked right.  As I grew up,
+my computer-of-choice changed, but I never lost my love for the VIC. It is
+small, easy to program, has a very capable processor (by 1980's standards)
+and decent I/O capability. Its peripherals were varied, if not quirky
+(take the 1515 printer, for example), but at least everything worked well
+together.
+	Now, fast forward to 1994. Commodore International failed,
+crippled by years of a weak product strategy, squandered opportunities
+and the market's increased focus on mainstream PC-compatible or Macintosh
+machines as productivity tools. After Commodore's failure, I decided that
+I wanted to try to purchase the Commodore 8-bit intellectual property,
+including the rights to the Kernal and BASIC source code, primarily for
+preservation purposes. One of my hobbies is collecting and preserving
+obsolete and unsupported computers, accessories, documentation, etc.
+	Since Commodore's bankruptcy attorney would not return my calls
+(no surprise there), I embarked on decompiling the Kernal. This project,
+although time-consuming and rewarding from an informational perspective,
+was not truly trail-blazing. Many before me probably decompiled parts of
+the Kernal in order to gain some understanding as it related to another
+project. However, in my research, I don't recall ever seeing complete
+recompileable source code. Memory and ROM maps, yes; source code, no.
+	Marko Makela manages a great Commodore web site that contains lots
+of useful information, including these memory and ROM maps. These provided
+the starting point for my work. See http://www.hut.fi/misc/cbm/docs/ for
+this and much more information.  What I would like to accomplish in a
+series of articles is to explain the process and to discuss specific
+Kernal routines that may be of interest to C=Hacking readers. Also, where
+appropriate, I will make comparisons with the other Commodore machines
+that were the contemporaries of the VIC, the C64 and the PET, specifically.
+	One might ask, "Why is this project different from all of the other
+resources already available?" In short, here's why:
+1.	The end result is a fully modifiable and compilable source file.
+2.	Only the VIC memory map and ROM location map are available
+	*because* to date everyone has focused on the C64 as it is the
+	more functional (and hence, more popular) Commodore of the era.
+	The same goes for the PET, too.
+3.	As far as I know, there was no "Mapping the VIC" written, because
+	of #2, above.
+4.	Because of #3, the fact that I had some spare time, and that I
+	love my VIC (although I don't use it as much these days), I felt
+	that I had to do it.
+Brief VIC-20 History
+--------------------
+Although I'd like to provide a complete history lesson on the VIC-20, many
+others before me have done a better job. The March 1985 issue of the
+IEEE Spectrum has an article, as does Marko's web site. See
+http://www.hut.fi/misc/cbm/docs/peddle.english.html for a great article on
+Chuck Peddle, the well-known creator of the 6502 microprocessor. Nonetheless,
+I will provide a Reader's Digest version of the history of the VIC.
+	In the late-70s, engineers in the Advanced Systems Design Group
+(ASDG) at Commodore created a multi-function video/sound interface chip,
+the 6560 (a.k.a., the VIC). Al Charpentier ran the LSI section of the ASDG,
+and was the lead in designing the VIC-I, and later, the VIC-II. The ASDG was
+the old MOS Technology operation that Commodore bought in the mid-70s.
+The 6560 supported complete composite color video, 3-voice plus white noise
+sound, a volume control, two analog-to-digital converters that supported
+the use of a game paddle or joystick, and a light pen interface.
+Feature-rich as it was, no manufacturer wanted to commit a product line to
+it. Since Commodore could not find anyone to buy the chip, they decided to
+build a computer that featured the chip. Hence, the VIC-20 was born. Al's
+buddy, Bob Yannes, was a senior systems designer at Commodore who developed
+the VIC-20 (and later, the C64) prototype. The VIC was Commodore's first
+color computer, and the first designed for home use.
+Relationships with other Commodore Products
+-------------------------------------------
+	The structure of the VIC-20 ROM parallels that of Commodore's
+other machines of that era, the PET and the C64. All three machines share
+the concept of a standard, public, API accessed through a jump table
+located in the last page of the system ROM.
+	There's also another striking similarity, which comes somewhat as
+a surprise to this writer, but intuitively makes sense. The C64 and VIC
+Kernal ROMs are nearly identically laid-out and contain a lot of common
+code. The C64 ROM of course contains certain modifications relating to
+its unique hardware and capabilities, but otherwise is the same. Since
+the machines are so similar in design (the same engineer designed them),
+the similarity of the code isn't surprising. This recycling appears to
+have been a cost-effective way to develop a new computer in record time.
+The Process
+-----------
+I believe that the reverse engineering process for the purpose of creating
+source code that produces binary-identical object code is fairly standard:
+know thy hardware, get the object code, turn object code into assembly code,
+give everything names, test compile, fix errors, re-compile and call it done.
+However, I'm certain that C=Hacking readers have reverse engineering methods
+which differ from mine. I'd be interested in hearing them, as I'm always
+looking for a better way to do things.  I started with an image of the ROM
+from my VIC (although a ROM image from funet would work) and cranked it
+through a disassembler (I used SuperMon). Since I wasn't too up on
+transferring data from the VIC to the PC, I used a brute-force method:
+scanning disassembler listings into TIFF files and running them through
+an OCR program. This produced plain-text files for me to work with.  Then,
+I used various available information on the Web and in books (such as "VIC
+Revealed", "The Commodore Innerspace Anthology", "Mapping the 64")
+to break the code into subroutines. I inserted meaningful memory and
+program location labels, taking names from all of the above sources.
+My assembler allowed me to create conventional data and code "segments",
+so I took the time to create a "real" data segment that mirrored the
+VIC memory map. These segments are not to be confused with the segments
+supported under the various PeeCee memory models. I used TASM 3.1, a
+shareware 8-bit table-based assembler by Squak Valley Software of
+Issaquah, WA. TASM supports the 6502, 6800/05/11, 8048/51/85/96, Z80 and
+TMS7000/32010/32025 processors.
+Next, I read the code and added comments as I went along. I do this on a
+routine-by-routine basis, as time permits, but always beginning at the
+first instruction after POR. Finally, I did a recompilation and compared
+the recompiled output with the ROM image to make sure that no errors were
+introduced in the source code creation process. I wasn't so lucky
+the first time around :-). I like to make gross checks first: ROM image
+size, location of well-known routines (such as the jump table at $FF85,
+the POR vector at $FD22, and the individual jump table routines).
+This helps to narrow down the location of any errors.
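The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not the tool used for the project: the helper name `first_mismatch` and the sample bytes are made up for the example, and in practice the two byte strings would be read from the original and the rebuilt ROM image files.

```python
def first_mismatch(original: bytes, rebuilt: bytes, base: int = 0xE000):
    """Return None if the images match, else (address, original, rebuilt).

    `base` is the address of the first byte of the image; $E000 is assumed
    as the default because the Kernal ROM occupies $E000-$FFFF.
    """
    # Gross check first: a size difference means bytes were added or lost.
    if len(original) != len(rebuilt):
        raise ValueError("image sizes differ: %d vs %d"
                         % (len(original), len(rebuilt)))
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return (base + offset, a, b)
    return None


# Tiny made-up "images": the first bytes of the POR entry at $FD22
# (LDX #$FF, SEI, TXS, CLD), with one byte deliberately wrong in the copy.
good = bytes([0xA2, 0xFF, 0x78, 0x9A, 0xD8])
bad = bytes([0xA2, 0xFF, 0x78, 0x9B, 0xD8])

print(first_mismatch(good, good, base=0xFD22))
hit = first_mismatch(good, bad, base=0xFD22)
print("mismatch at $%04X: %02X vs %02X" % hit)
```

Reporting the first differing address, rather than just "different", is what makes it practical to trace an error back to a single routine in the source.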
+Once I was satisfied that the disassembly was right, I created a ROM image to
+be burned into a test EPROM. The ROMs used in the VIC are 2364
+mask-programmed ROMs. The 2364 is a 24-pin 8k x 8bit ROM device, and the
+closest commonly available EPROM is the 2764, a 28-pin 8k x 8bit
+device. Since the 2364 has four fewer pins (reflecting its
+non-programmability), an adapter board needs to be built. A piece of perf
+board and two 28-pin wire-wrap DIP sockets should do the trick.
+	If all checks out, the source is then ready for modification.
+Issues and Considerations
+-------------------------
+	There is one important thing that I discovered while hacking the
+Kernal -- there is no wasted space in the Kernal ROM as it presently exists.
+Actually, the Kernal occupies 1,280 bytes less than 8k, beginning at
+$E500 (and consequently, the BASIC ROM overhangs the $E000 boundary by
+1,280 bytes, since $E500 - $E000 = $0500). The Kernal developers maximized
+the 8k space, so any Kernal hack will have to use jumps to a patch area
+elsewhere in the processor address space. For example, let's say that I
+wanted to add BASIC 4.0 disk
+commands to the VIC Kernal by hacking it (recognizing that I could have
+used the easier wedge method). I could place jumps in the Kernal ROM to
+locations within my own non-autostart ROM located at $A000.  The other
+important consideration is backward compatibility. Certain programs may rely
+on the specific location of code within the Kernal ROM. For example, a
+game ROM may make direct calls to the internal Plot routine, as opposed
+to using the jump table at the end of the Kernal ROM (saving a few
+processor ticks in the process). Shifting code around would relocate that
+code, breaking that program.
+General Hardware Information
+----------------------------
+The Microprocessor
+Manufactured by MOS and second-sourced from Rockwell Semiconductors, the stock
+6502 is an 8-bit, 1MHz NMOS-process processor. It supports 56 instructions
+in 13 addressing modes (although six are combinations of the seven basic
+modes), three processor registers (.A, .X, and .Y), stack pointer, and a
+condition code (flags) register. The 6502 supports both maskable and
+non-maskable interrupts with fixed vectors at the top of its 64k address
+space.
+I/O Processors
+	VIC-20 I/O is managed by two memory-mapped I/O processors, the
+6522 versatile interface adapter (VIA). The I/O region occupies the
+4k-address space beginning at $9000.  The VIAs manage the keyboard,
+joystick, light pen, cassette deck, the IEEE serial interface and the
+user port. Each 6522 has 16 bi-directional I/O lines, four handshaking
+lines, two 8-bit shift registers and two clock generators (capable of
+generating free-running or triggered pulses). One of these clocks is
+responsible for RTC, RS-232, IEEE, and cassette tape timing. Of the eight
+handshaking lines, two are free for general use, while the other six are
+used for the IEEE serial port, the RESTORE key, and cassette control.
+-bits of the total 32-bits of I/O are used for keyboard scanning and
+joystick, light pen, and serial inputs. Truly available to the user is four
+handshaking lines and 8-bits of I/O.
+Video Interface
+The VIC-20 video interface is managed by the 6560/6561 VIC chip. The
+VIC screen is organized in 22 columns by 23 rows in text mode and
+176 by 184 pixels in "graphics" mode. Graphics mode is synthesized by
+mapping the character generator ROM to RAM and modifying the character
+glyphs.  The on-chip video sync generator is capable of generating video
+in NTSC (6560) or PAL (6561) formats in 16 colors. The VIC also contains
+three programmable tone generators, a white noise source, volume control,
+two A/D converters (for game paddle interfacing), a light pen input, screen
+centering, and independent control over background, foreground, and border
+colors.  The 6560/6561 performs its own DMA to separate 4-bit video RAM
+and 8-bit character generator ROM. The address buss is a private 2MHz buss,
+which is not shared with the 1MHz microprocessor buss. Shared ROM access
+is performed during the processor Phase 1 clock.
+Memory Map
+	Space limitations prohibit listing the complete VIC-20 memory map,
+but an abridged version may be helpful:
+HEX Offset	DESCRIPTION
+0000-00FF	Zero page: Kernal and BASIC system areas
+0100-01FF	Page 1: tape error log area, processor stack
+0200-02FF	Page 2: BASIC input buffer, file and device address tables,
+			keyboard and screen vars, RS232 vars
+0300-03FF	Page 3: BASIC vectors, processor register storage, Kernal
+			vectors cassette buffer area
+0400-0FFF	Pages 4-15: 3k expansion area
+1000-1DFF	User Basic area (unexpanded VIC)
+1E00-1FFF	Screen memory (unexpanded VIC)
+2000-3FFF	8K expansion RAM/ROM block 1
+4000-5FFF	8K expansion RAM/ROM block 2
+6000-7FFF	8K expansion RAM/ROM block 3
+NOTE: When additional memory is added to block 1, 2 or 3,
+the Kernal relocates the following things for BASIC:
+1000-11FF	Screen memory
+1200-?		User Basic area
+9400-95FF	Color RAM
+8000-8FFF	4K Character generator ROM
+8000-83FF	Upper case and graphics
+8400-87FF	Reversed upper case and graphics
+8800-8BFF	Upper and lower case
+8C00-8FFF	Reversed upper and lower case
+9000-93FF	I/O Block 0
+9000-900F	VIC chip registers
+9110-911F	6522 VIA#1 registers
+9120-912F	6522 VIA#2 registers
+9400-95FF	location of COLOR RAM with additional RAM at blk 1
+9600-97FF	Normal location of COLOR RAM
+9800-9BFF	I/O Block 2
+9C00-9FFF	I/O Block 3
+A000-BFFF	8K block for expansion ROM (autostart ROM)
+C000-DFFF	8K BASIC ROM
+E000-FFFF	8K Kernal ROM
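When reading a disassembly it is handy to be able to ask "which region does this address fall in?". The following Python sketch encodes the top-level rows of the abridged map for that purpose. It is an illustration only: the `REGIONS` table and the `region` helper are names invented here, the start addresses are written out in full, and the sub-ranges (character sets, chip registers) are left out for brevity.

```python
# (start, end, description), typed in from the abridged memory map.
REGIONS = [
    (0x0000, 0x00FF, "Zero page"),
    (0x0100, 0x01FF, "Page 1: tape error log, processor stack"),
    (0x0200, 0x02FF, "Page 2: BASIC input buffer, tables, vars"),
    (0x0300, 0x03FF, "Page 3: vectors, cassette buffer"),
    (0x0400, 0x0FFF, "3K expansion area"),
    (0x1000, 0x1DFF, "User BASIC area (unexpanded VIC)"),
    (0x1E00, 0x1FFF, "Screen memory (unexpanded VIC)"),
    (0x2000, 0x3FFF, "8K expansion block 1"),
    (0x4000, 0x5FFF, "8K expansion block 2"),
    (0x6000, 0x7FFF, "8K expansion block 3"),
    (0x8000, 0x8FFF, "Character generator ROM"),
    (0x9000, 0x93FF, "I/O block 0"),
    (0x9400, 0x95FF, "Color RAM (with RAM in block 1)"),
    (0x9600, 0x97FF, "Color RAM (normal location)"),
    (0x9800, 0x9BFF, "I/O block 2"),
    (0x9C00, 0x9FFF, "I/O block 3"),
    (0xA000, 0xBFFF, "Expansion ROM (autostart ROM)"),
    (0xC000, 0xDFFF, "BASIC ROM"),
    (0xE000, 0xFFFF, "Kernal ROM"),
]


def region(addr: int) -> str:
    """Return the description of the region that contains addr."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    raise ValueError("address out of range: %r" % (addr,))


for a in (0xFD22, 0x9005, 0xA004, 0x1E00):
    print("$%04X -> %s" % (a, region(a)))
```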
+Kernal Functions
+----------------
+System Startup
+	Let's first take a look at how a VIC-20 boots. The process is
+substantially similar for the C64 and the PET (through the commonality
+of the microprocessor upon which each machine is based), although the
+locations of various routines differ, as do the memory and I/O maps.
+When power is first applied to the microprocessor, the RESET pin is held
+low by a 555 timer for a period long enough for the power supply and clock
+generator to stabilize. The 6502 utilizes the last six bytes of the address
+space to store three critical vectors: the NMI, RESET, and IRQ vectors,
+respectively. On power-up, the PC (program counter) is initialized to the
+address stored at location $FFFC and execution begins at that location
+(the first column in each listing is the address, followed by the object bytes):
+   FFFA ;=================================================
+   FFFA ; - Power-on and hardware vectors
+   FFFA ;
+   FFFA A9 FE       	.dw NMI		;non-maskable interrupt
+   FFFC 22 FD       	.dw RESET	;POR
+   FFFE 72 FF       	.dw IRQ		;IRQ processor
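The 6502 stores addresses low byte first, which is why the bytes 22 FD in the listing mean $FD22. As a quick check of the three vectors, this Python sketch (an illustration, not part of the Kernal source) decodes the six bytes exactly as they appear in the listing above:

```python
import struct

# The last six bytes of the ROM image ($FFFA-$FFFF), copied from the listing.
tail = bytes([0xA9, 0xFE, 0x22, 0xFD, 0x72, 0xFF])

# "<3H" means three little-endian unsigned 16-bit words.
nmi, reset, irq = struct.unpack("<3H", tail)
print("NMI=$%04X RESET=$%04X IRQ=$%04X" % (nmi, reset, irq))
# prints: NMI=$FEA9 RESET=$FD22 IRQ=$FF72
```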
+Execution begins at $FD22, the POR (power-on reset) vector:
+   FD22  ;#################################################
+   FD22  ; Power-on RESET entry
+   FD22  ;#################################################
+   FD22             RESET
+   FD22 A2 FF       	LDX #$FF
+   FD24 78          	SEI		;kill interrupts
+   FD25 9A          	TXS		;set stack top
+   FD26 D8          	CLD
+   FD27 20 3F FD    	JSR SCNROM	;check for autostart ROM
+   FD2A D0 03       	BNE SKIPA0	;not there, skip ROM init
+   FD2C
+   FD2C 6C 00 A0    	JMP (A0BASE)	;jump to ROM init if present
+   FD2F
+   FD2F             SKIPA0
+   FD2F 20 8D FD    	JSR RAMTAS	;test RAM
+   FD32 20 52 FD    	JSR IRESTR	;init work memory
+   FD35 20 F9 FD    	JSR IOINIT	;setup hardware
+   FD38 20 18 E5    	JSR CINT1	;init video
+   FD3B 58          	CLI		;re-enable interrupts
+   FD3C 6C 00 C0    	JMP (BENTER)	;enter BASIC
+	The startup code sets up the processor stack and checks for the
+existence of an autostart ROM. Autostart ROMs are located in the $A000
+block and have a five-byte signature (A0CBM) at offset $04. If the
+signature is found, the Kernal jumps to the A0ROM initialization routine
+pointed to by offset $00 of the autostart ROM.
+	If no signature is found, the Kernal initialization continues by
+testing the RAM, initializing the system variables, system hardware, and
+the screen. Finally, the Kernal transfers control to the BASIC
+initialization entry point at $C000.
+Routine SCNROM
+	The first routine called from the POR code is the SCNROM routine.
+This routine looks for the special 5-byte signature that indicates the
+presence of an autostart ROM located in the $A000 block.
+   FD3F  ;====================================================
+   FD3F  ; SCNROM - Scan ROM areas for Autostart ROM signature
+   FD3F  ;
+   FD3F             SCNROM
+   FD3F A2 05       	LDX #$05	;5 chars to compare
+   FD41
+   FD41             SCNLOOP
+   FD41 BD 4C FD    	LDA SCANEX,X	;start at end of signature
+   FD44 DD 03 A0    	CMP A0BASE+3,X	;compare to ROM sig area
+   FD47 D0 03       	BNE SCANEX	;no match, exit loop
+   FD49
+   FD49 CA          	DEX		;match; check next char
+   FD4A D0 F5       	BNE SCNLOOP	;loop
+   FD4C
+   FD4C             SCANEX
+   FD4C 60          	RTS		;return; Z=0 if no match
+   FD4D             ;
+   FD4D             ; ROMSIG - Autostart ROM signature
+   FD4D             ;
+   FD4D             ROMSIG
+   FD4D 4130C3C2CD  	.db "A0", $C3, $C2, $CD  ;A0CBM
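The signature test is easy to model outside the machine. The Python sketch below mirrors the SCNROM loop on a byte string standing in for the $A000 block. The function name `has_autostart_rom` and the sample cartridge header are invented for the example; only the signature bytes and the offsets come from the listing.

```python
# ROMSIG from the listing: "A0" followed by $C3 $C2 $CD.
ROMSIG = bytes([0x41, 0x30, 0xC3, 0xC2, 0xCD])


def has_autostart_rom(block_a000: bytes) -> bool:
    """Model of SCNROM: compare the 5 bytes at offset $04 with ROMSIG."""
    # SCNROM counts X from 5 down to 1 and compares SCANEX+X (that is,
    # ROMSIG[X-1]) with A0BASE+3+X, so it checks the last byte first.
    for x in range(5, 0, -1):
        if block_a000[3 + x] != ROMSIG[x - 1]:
            return False   # BNE SCANEX: returns with Z=0, "no ROM"
    return True            # X reached 0: returns with Z=1, "ROM found"


# A made-up cartridge header: init vector $A009 at offset $00, two filler
# bytes, then the signature at offset $04.
cart = bytes([0x09, 0xA0, 0x00, 0x00]) + ROMSIG + bytes(7)
empty = bytes(16)

print(has_autostart_rom(cart), has_autostart_rom(empty))
if has_autostart_rom(cart):
    init = cart[0] | (cart[1] << 8)   # what JMP (A0BASE) would fetch
    print("init routine at $%04X" % init)
```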
+Routine RAMTAS
+	The RAMTAS routine is the second subroutine in the initialization
+process. It clears zero page and pages 2 and 3 (page 1, which holds the
+stack, is left alone), then searches for expansion
+memory. If any is found, the screen memory, color memory, and start of
+BASIC RAM pointers are adjusted to their documented alternates.
+   FD8D  ;===================================================
+   FD8D  ; RAMTAS - Initialize system contents
+   FD8D  ;
+   FD8D             RAMTAS
+   FD8D A9 00       	LDA #$00	;zero regs .A and .X
+   FD8F AA          	TAX
+   FD90
+   FD90             RAMTSLP1		;clear system memory areas
+   FD90 95 00       	STA USRPOK,X	;zero page
+   FD92 9D 00 02    	STA BUF,X	;clear page 2
+   FD95 9D 00 03    	STA ERRVPT,X	;clear page 3
+   FD98 E8          	INX
+   FD99 D0 F5       	BNE RAMTSLP1	;loop till done
+   FD9B
+   FD9B A2 3C       	LDX #$3C	;setup cassette buffer
+   FD9D A0 03       	LDY #$03	;area to $033c
+   FD9F 86 B2       	STX TAPE1
+   FDA1 84 B3       	STY TAPE1+1
+   FDA3 85 C1       	STA STAL	;clear I/O start address…
+   FDA5 85 97       	STA REGSAV	;…register save
+   FDA7 8D 81 02    	STA OSSTAR	;…and start of OS memory ptr
+   FDAA A8          	TAY		; .Y=0
+   FDAB A9 04       	LDA #$04	;check RAM from $0400
+   FDAD 85 C2       	STA STAL+1	;set I/O start to page 4
+   FDAF
+   FDAF             RAMTASLP2
+   FDAF E6 C1       	INC STAL	;increment LSB
+   FDB1 D0 02       	BNE RAMTAS1	;not done with page, cont.
+   FDB3
+   FDB3 E6 C2       	INC STAL+1	;inc. to new page
+   FDB5
+   FDB5             RAMTAS1
+   FDB5 20 91 FE    	JSR MEMTST	;test RAM
+   FDB8 A5 97       	LDA REGSAV
+   FDBA F0 22       	BEQ RAMTAS3
+   FDBC B0 F1       	BCS RAMTASLP2	;next address
+   FDBE
+   FDBE A4 C2       	LDY STAL+1	;done testing,get RAM top MSB…
+   FDC0 A6 C1       	LDX STAL	;…and LSB
+   FDC2 C0 20       	CPY #$20	; top at $2000
+   FDC4 90 25       	BCC I6561LP	;page below $2000, halt
+   FDC6
+   FDC6 C0 21       	CPY #$21	;RAM at $2000?
+   FDC8 B0 08       	BCS RAMTAS2	;yes, set params
+   FDCA
+   FDCA A0 1E       	LDY #$1E	;$1E00
+   FDCC 8C 88 02    	STY HIPAGE
+   FDCF
+   FDCF             RAMTAS1A
+   FDCF 4C 7B FE    	JMP STOTOP	;CLC and set RAM top
+   FDD2
+   FDD2             RAMTAS2
+   FDD2 A9 12       	LDA #$12	;With exp. RAM, BASIC starts
+   FDD4 8D 82 02    	STA OSSTAR+1	;at $1200…
+   FDD7 A9 10       	LDA #$10	;…and screen starts at $1000
+   FDD9 8D 88 02    	STA HIPAGE
+   FDDC D0 F1       	BNE RAMTAS1A	;set top of RAM and exit
+   FDDE
+   FDDE             RAMTAS3
+   FDDE 90 CF       	BCC RAMTASLP2	;loop to next address
+   FDE0
+   FDE0 A5 C2       	LDA STAL+1	;get MSB of I/O start
+   FDE2 8D 82 02    	STA OSSTAR+1	;save as start of OS
+   FDE5 85 97       	STA REGSAV	;save copy
+   FDE7 C9 11       	CMP #$11	;page $11
+   FDE9
+   FDE9             RATS3
+   FDE9 90 C4       	BCC RAMTASLP2
+   FDEB
+   FDEB             I6561LP
+   FDEB 20 C3 E5    	JSR V6561I-2	;$E5C3 init VIC regs
+   FDEE 4C EB FD    	JMP I6561LP
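The decisions RAMTAS makes once the scan is finished can be summarized in a small model. The Python sketch below is a reading of the listing, not a translation of it: the function name `ramtas_layout` is invented, the scan itself is replaced by two inputs (the page where working RAM was first found and the page where it stopped), and RAM is assumed to end on a page boundary.

```python
def ramtas_layout(first_ram_page: int, top_page: int):
    """Return (basic_start, screen_start, ram_top) as addresses.

    Returns None in the cases where the listing falls into the endless
    I6561LP loop instead of continuing the boot.
    """
    if first_ram_page >= 0x11:      # CMP #$11: BCC not taken
        return None
    if top_page < 0x20:             # CPY #$20 / BCC I6561LP
        return None
    if top_page >= 0x21:            # CPY #$21 / BCS RAMTAS2
        # RAM continues past $2000: BASIC at $1200, screen at $1000.
        return (0x1200, 0x1000, top_page << 8)
    # RAM stops at $2000: the screen goes to $1E00, and $1E00 is also
    # stored as the top of memory.
    return (first_ram_page << 8, 0x1E00, 0x1E00)


for label, first, top in (("unexpanded", 0x10, 0x20),
                          ("3K expansion", 0x04, 0x20),
                          ("8K in block 1", 0x10, 0x40)):
    basic, screen, ram_top = ramtas_layout(first, top)
    print("%-14s BASIC=$%04X screen=$%04X top=$%04X"
          % (label, basic, screen, ram_top))
```

The three sample cases reproduce the two layouts shown in the memory map: screen at $1E00 for the unexpanded and 3K machines, and screen at $1000 with BASIC at $1200 once block 1 holds RAM.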
+The MEMTST routine actually tests the RAM, and is called during the memory
+search loop at $FDB5. It writes and verifies two complementary bit patterns
+($55, then $AA) to test for memory defects, then restores the original byte:
+   FE91  ;===================================================
+   FE91  ; MEMTST - Test memory
+   FE91  ;		.Y is index in page
+   FE91             MEMTST
+   FE91 B1 C1       	LDA (STAL),Y	;get address
+   FE93 AA          	TAX		;save .A
+   FE94 A9 55       	LDA #%01010101	;set pattern
+   FE96 91 C1       	STA (STAL),Y	;save it…
+   FE98 D1 C1       	CMP (STAL),Y	;…and compare
+   FE9A D0 08       	BNE MEMTS1	;not equal
+   FE9C             			;pattern compares OK
+   FE9C 6A          	ROR A		;%10101010 invert pattern
+   FE9D 91 C1       	STA (STAL),Y	;save it…
+   FE9F D1 C1       	CMP (STAL),Y	;…and compare
+   FEA1 D0 01       	BNE MEMTS1	;not equal
+   FEA3 A9          	.db $A9		;LDA #$18 for OK, $55 or $AA
+   FEA4             			; for failed pattern
+   FEA4             MEMTS1
+   FEA4 18          	CLC		;CLC only on error
+   FEA5 8A          	TXA		;restore previous .A
+   FEA6 91 C1       	STA (STAL), Y	;save it
+   FEA8 60          	RTS
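The same test is easy to express in a high-level language. This Python sketch models MEMTST on any object that can be indexed like memory. The names `memtst` and `NoRam` are invented for the example, and `NoRam` is only a crude stand-in for an address with nothing behind it (real open-bus behavior is more complicated).

```python
def memtst(memory, addr: int) -> bool:
    """Model of MEMTST: True if the cell at addr holds both patterns.

    The original contents are written back afterwards, as in the listing
    (TXA / STA (STAL),Y).
    """
    saved = memory[addr]
    ok = True
    for pattern in (0x55, 0xAA):      # %01010101, then %10101010
        memory[addr] = pattern
        if memory[addr] != pattern:
            ok = False                # the listing clears carry here
            break
    memory[addr] = saved
    return ok


class NoRam:
    """Hypothetical empty address space: writes vanish, reads give $FF."""

    def __getitem__(self, addr):
        return 0xFF

    def __setitem__(self, addr, value):
        pass


ram = bytearray(16)
ram[3] = 0x42
print(memtst(ram, 3), ram[3] == 0x42)   # working RAM, contents preserved
print(memtst(NoRam(), 3))               # nothing there
```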
+	The RAMTAS routine also calls STOTOP to save the top of RAM pointer:
+   FE73  ;==================================================
+   FE73  ; IMEMTP - Set/read top of memory (internal)
+   FE73  ;	On entry, SEC to read, .X/.Y is LSB/MSB
+   FE73  ;	          CLC to set, .X/.Y is LSB/MSB
+   FE73  ;
+   FE73             IMEMTP
+   FE73 90 06       	BCC STOTOP	;set or read?
+   FE75 AE 83 02    	LDX OSTOP	;get top of memory
+   FE78 AC 84 02    	LDY OSTOP+1
+   FE7B
+   FE7B             STOTOP
+   FE7B 8E 83 02    	STX OSTOP	;set top of memory
+   FE7E 8C 84 02    	STY OSTOP+1
+   FE81 60          	RTS
+Routine IRESTR
+	This routine loads (or re-loads) the default Kernal vectors upon
+POR or Run-Stop/Restore sequences. The default vectors that are loaded
+include the links to IRQ, NMI, Open, Close, Channel In, Channel Out,
+Clear Channels, Character In, Character Out, Scan Stop Key, Get Keyboard
+Character, Close All, Load and Save routines within the Kernal ROM.
+   FD52  ;====================================================
+   FD52  ; IRESTR - Restore KERNAL hardware vectors (internal)
+   FD52  ;		 Called during POR and NMI sequences.
+   FD52  ;
+   FD52             IRESTR
+   FD52 A2 EA       	LDX #$EA 	;FIXUP2;#$6D points to list of
+   FD54 A0 EA       	LDY #$EA 	;FIXUP2+1;#$FD $FD6D KERNAL vecs
+   FD56 18          	CLC
+   FD57             ;
+   FD57             ; IVECTR - Change vectors for user
+   FD57             ;	On entry, SEC= read vector to .X/.Y LSB/MSB
+   FD57             ;	          CLC= set vector from .X/.Y LSB/MSB
+   FD57             ;
+   FD57             IVECTR
+   FD57 86 C3       	STX MEMUSS	;save vector to temp
+   FD59 84 C4       	STY MEMUSS+1
+   FD5B A0 1F       	LDY #$1F	;# of bytes to move
+   FD5D
+   FD5D             VECLOOP
+   FD5D B9 B6 02    	LDA IRQVP,Y	;get old vector address
+   FD60 B0 02       	BCS VECSK	;branch on CY=1/read
+   FD62
+   FD62 B1 C3       	LDA (MEMUSS),Y	;get new vector address
+   FD64
+   FD64             VECSK
+   FD64 91 C3       	STA (MEMUSS),Y	;save new address to temp
+   FD66 99 B6 02    	STA IRQVP,Y	;and to vector area
+   FD69 88          	DEY		;go to next one
+   FD6A 10 F1       	BPL VECLOOP	;loop
+   FD6C 60          	RTS
+   FD6D
+   FD6D             ;
+   FD6D             ;KERNAL Vectors
+   FD6D             ;
+   FD6D             KNRLSV
+   FD6D BFEAD2FEADFE	.dw IRQVEC, WARMST, LNKNMI, IOPEN
+   FD73 0AF4
+   FD75 4AF3C7F209F3	.dw ICLOSE, ICHKIN, ICHKOT, ICLRCH
+   FD7B F3F3
+   FD7D 0EF27AF270F7	.dw ICHRIN, ICHROT, ISTOP,  IGETIN
+   FD83 F5F1
+   FD85 EFF3D2FE49F5	.dw ICLALL, WARMST, LNKLOD, LNKSAV
+   FD8B 85F6
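IVECTR does double duty: the carry flag decides whether the 32 bytes are copied into the vector area or out of it. The Python sketch below models that behavior. The function name `ivectr` and the `read` parameter are stand-ins chosen here for the routine and the carry flag; the default table bytes are copied from the KNRLSV listing.

```python
def ivectr(vectors: bytearray, user: bytearray, read: bool) -> None:
    """Model of IVECTR.

    vectors: the 32-byte RAM vector area (IRQVP onward).
    user:    the table that .X/.Y point to.
    read=True  (carry set):   copy the current vectors out to `user`.
    read=False (carry clear): install the vectors from `user`.
    """
    for y in range(0x1F, -1, -1):         # LDY #$1F ... DEY / BPL
        value = vectors[y] if read else user[y]
        user[y] = value                    # STA (MEMUSS),Y
        vectors[y] = value                 # STA IRQVP,Y


# The default table, byte for byte from the listing at $FD6D.
KNRLSV = bytes.fromhex("BFEAD2FEADFE0AF4" "4AF3C7F209F3F3F3"
                       "0EF27AF270F7F5F1" "EFF3D2FE49F585F6")

vectors = bytearray(32)
ivectr(vectors, bytearray(KNRLSV), read=False)    # what IRESTR does
print("IRQ vector = $%04X" % (vectors[0] | (vectors[1] << 8)))

snapshot = bytearray(32)
ivectr(vectors, snapshot, read=True)              # read them back out
print(snapshot == vectors)
```

Note that in "set" mode the loop also stores each byte back to the source table. When IRESTR points the routine at the table in ROM, that store simply has no effect.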
+Routine IOINIT
+	This routine initializes the VIAs. Lots of bit twiddling goes on
+here to set-up the various ports. This routine also starts the system
+IRQ timer.
+   FDF9  ;===================================================
+   FDF9  ; IOINIT - Initialize I/O registers
+   FDF9  ;
+   FDF9             IOINIT
+   FDF9 A9 7F       	LDA #%01111111	;disable HW interrupts
+   FDFB 8D 1E 91    	STA D1IER	;interrupt enable reg VIA1
+   FDFE 8D 2E 91    	STA D2IER	;interrupt enable reg VIA2
+   FE01 A9 40       	LDA #%01000000	;Sets tmr1/VIA2 to free-
+						;running; used for IRQ
+   FE03 8D 2B 91    	STA D2ACR	;VIA2 aux ctrl reg
+   FE06 A9 40       	LDA #%01000000	;same for tmr1/VIA1. Used for
+						; RS-232 timing
+   FE08 8D 1B 91    	STA D1ACR	;VIA1 aux ctrl reg
+   FE0B A9 FE       	LDA #%11111110	;sets CA1/2, CB1/2 modes
+						; CA2/CB2 manual H, CB1 pos
+						; trig. CA1 negative trig.
+						;CA1=Restore
+						;CA2=cassette motor
+						;CB1=user port
+						;CB2=user port
+   FE0D 8D 1C 91    	STA D1PCR	;VIA1 periph ctrl reg
+   FE10 A9 DE       	LDA #%11011110	;sets CA1/2, CB1/2 modes
+						; CB2 manual L, CB1 pos
+						; trig. CA1 negative trig.
+						; CA2 manual H
+						;CA1=cassette read
+						;CA2=*SCLK
+						;CB1=*SRQIN
+						;CB2=*SDATAOUT
+   FE12 8D 2C 91    	STA D2PCR	;VIA2 periph ctrl reg
+   FE15 A2 00       	LDX #$00	;DDR all bits IN
+   FE17 8E 12 91    	STX D1DDRB	;VIA1/B data dir reg
+   FE1A A2 FF       	LDX #%11111111	;DDR all bits OUT
+   FE1C 8E 22 91    	STX D2DDRB	;VIA2/B data dir reg
+   FE1F A2 00       	LDX #$00	;DDR all bits IN
+   FE21 8E 23 91    	STX D2DDRA	;VIA2/A data dir reg
+   FE24 A2 80       	LDX #%10000000	;BIT7=OUT, BITS6-0 IN
+   FE26 8E 13 91    	STX D1DDRA	;VIA1/A data dir reg
+   FE29 A2 00       	LDX #$00
+   FE2B 8E 1F 91    	STX D1ORAH	;VIA1 output reg A MSB
+   FE2E 20 84 EF    	JSR SCLK1	;set IEEE clock line=1
+   FE31 A9 82       	LDA #%10000010	;enable IER CA1/VIA1 RESTOR
+   FE33 8D 1E 91    	STA D1IER	;VIA1 IER BIT7=1
+   FE36 20 8D EF    	JSR SCLK0	;set IEEE clock line=0
+   FE39             ;
+   FE39             ; ENABTM - Enable timers
+   FE39             ;
+   FE39             ENABTM
+   FE39 A9 C0       	LDA #%11000000	;enable tmr1/VIA2 (IRQ)
+   FE3B 8D 2E 91    	STA D2IER	;VIA2 IER BIT7-6=1
+   FE3E A9 89       	LDA #%10001001	;$89 IRQ tic divisor LSB
+   FE40 8D 24 91    	STA D2TM1L	;VIA2 tmr1 LSB
+   FE43 A9 42       	LDA #%01000010	;$42 IRQ tic divisor MSB
+   FE45 8D 25 91    	STA D2TM1L+1	;VIA2 tmr1 MSB
+   FE48 60          	RTS
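The two bytes loaded in ENABTM form a 16-bit timer value, and it is worth working out what interrupt rate that gives. The Python sketch below does the arithmetic. Two figures in it are assumptions rather than facts taken from the listing: the NTSC system clock of roughly 1,022,727 Hz, and the 6522 behavior that a free-running timer 1 times out every N+2 cycles for a latch value of N.

```python
# Timer 1 latch value from ENABTM: MSB $42, LSB $89.
LATCH = (0x42 << 8) | 0x89

# Assumptions (see the text above), not values from the listing:
CLOCK_HZ = 1_022_727            # approximate NTSC VIC-20 system clock
period_cycles = LATCH + 2       # free-running timer 1 period on a 6522

irq_hz = CLOCK_HZ / period_cycles
print("latch = $%04X = %d cycles" % (LATCH, LATCH))
print("IRQ rate is about %.2f Hz (%.2f ms per tick)"
      % (irq_hz, 1000.0 / irq_hz))
```

Under those assumptions the system IRQ fires very close to 60 times a second, which is the tick that the keyboard scan and the cursor blink counters are driven from.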
+Routine CINT1
+	This final routine initializes the character generator, sets the
+initial screen colors, clears the screen and "homes" the cursor, and
+updates screen and cursor pointers.
+   E518  ;==================================================
+   E518  ; CINT1 - Initialize I/O
+   E518  ;
+   E518
+   E518             ;
+   E518             ;Screen reset
+   E518             ;
+   E518             CINT1
+   E518 20 BB E5    	JSR IODEF1	;set deflt I/O and init VIC
+   E51B AD 88 02    	LDA HIPAGE	;get screen memory page
+   E51E 29 FD       	AND #%11111101	;$FD MS nibble is ChrROM and
+   E520 0A          	ASL A		;LS nibble is ChrRAM
+   E521 0A          	ASL A
+   E522 09 80       	ORA #%10000000	;$80
+   E524 8D 05 90    	STA VRSTRT	;set chargen ROM to $8000
+   E527 AD 88 02    	LDA HIPAGE	;get screen mem page
+   E52A 29 02       	AND #%00000010	;$02 check for screen RAM at
+   E52C F0 08       	BEQ CINT1A	;$E536 $1E page
+   E52E
+   E52E A9 80       	LDA #%10000000	;$80 screen RAM is at $1E page
+   E530 0D 02 90    	ORA VRCOLS	;set Bit7
+   E533 8D 02 90    	STA VRCOLS
+   E536
+   E536             CINT1A
+   E536 A9 00       	LDA #$00
+   E538 8D 91 02    	STA SHMODE	;enable shift-C=
+   E53B 85 CF       	STA BLNON	;start at no blink
+   E53D
+   E53D A9 EA       	LDA #$EA 	;FIXUP1+34;#$DC
+   E53F 8D 8F 02    	STA FCEVAL
+   E542 A9 EA       	LDA #$EA 	;FIXUP1+35;#$EB
+   E544 8D 90 02    	STA FCEVAL+1	;shift mode evaluation
+   E547
+   E547 A9 0A       	LDA #$0A
+   E549 8D 89 02    	STA KBMAXL	;key buffer=10
+   E54C 8D 8C 02    	STA KRPTDL	;repeat delay=10 ticks
+   E54F A9 06       	LDA #$06
+   E551 8D 86 02    	STA CLCODE	;color=6(blue)
+   E554 A9 04       	LDA #$04
+   E556 8D 8B 02    	STA KRPTSP	;repeat speed
+   E559 A9 0C       	LDA #$0C
+   E55B 85 CD       	STA BLNCT	;blink timer=12ms
+   E55D 85 CC       	STA BLNSW	;set for solid cursor
+   E55F             ;
+   E55F             ; Clear screen
+   E55F             ;
+   E55F             CLRSCN
+   E55F AD 88 02    	LDA HIPAGE	;mem page for screen RAM
+   E562 09 80       	ORA #%10000000	;$80
+   E564 A8          	TAY
+   E565 A9 00       	LDA #$00
+   E567 AA          	TAX
+   E568
+   E568             CLRLP1
+   E568 94 D9       	STY SLLTBL,X	;address of screen line
+   E56A 18          	CLC
+   E56B 69 16       	ADC #$16	;add 22
+   E56D 90 01       	BCC CLRSC1
+   E56F
+   E56F C8          	INY
+   E570
+   E570             CLRSC1
+   E570 E8          	INX
+   E571 E0 18       	CPX #$18	;all rows done?
+   E573 D0 F3       	BNE CLRLP1
+   E575
+   E575 A9 FF       	LDA #$FF
+   E577 95 D9       	STA SLLTBL,X
+   E579 A2 16       	LDX #$16
+   E57B
+   E57B             CLRLP2
+   E57B 20 8D EA    	JSR CLRLIN	;clear line
+   E57E CA          	DEX
+   E57F 10 FA       	BPL CLRLP2
+   E581             ;
+   E581             ; "Home" cursor
+   E581             ;
+   E581             HOME
+   E581 A0 00       	LDY #$00
+   E583 84 D3       	STY CSRIDX	;set column to 0
+   E585 84 D6       	STY CURROW	;and row to 0, too
+   E587             ;
+   E587             ; Set screen pointers
+   E587             ;
+   E587             SCNPTR
+   E587 A6 D6       	LDX CURROW
+   E589 A5 D3       	LDA CSRIDX
+   E58B
+   E58B             SCNPLP
+   E58B B4 D9       	LDY SLLTBL,X
+   E58D 30 08       	BMI SCNPT1
+   E58F
+   E58F 18          	CLC
+   E590 69 16       	ADC #$16
+   E592 85 D3       	STA CSRIDX
+   E594 CA          	DEX
+   E595 10 F4       	BPL SCNPLP
+   E597
+   E597             SCNPT1
+   E597 B5 D9       	LDA SLLTBL,X
+   E599 29 03       	AND #$03
+   E59B 0D 88 02    	ORA HIPAGE
+   E59E 85 D2       	STA LINPTR+1
+   E5A0 BD FD ED    	LDA LBSCAD,X
+   E5A3 85 D1       	STA LINPTR
+   E5A5 A9 15       	LDA #$15
+   E5A7 E8          	INX
+   E5A8
+   E5A8             SCNLP1
+   E5A8 B4 D9       	LDY SLLTBL,X
+   E5AA 30 06       	BMI SCNEXIT
+   E5AC
+   E5AC 18          	CLC
+   E5AD 69 16       	ADC #$16
+   E5AF E8          	INX
+   E5B0 10 F6       	BPL SCNLP1
+   E5B2
+   E5B2             SCNEXIT
+   E5B2 85 D5       	STA LINLEN
+   E5B4 60          	RTS		; return to init routine
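Two pieces of arithmetic in CINT1 are easier to follow with concrete numbers: the value built from HIPAGE for the VIC register at VRSTRT, and the screen line table filled in by CLRLP1. The Python sketch below models both. The function names are invented for the example, and the code is a reading of the listing rather than a substitute for it.

```python
def vic_setup_for(hipage: int):
    """Return (value stored in VRSTRT, bit 7 to be ORed into VRCOLS)."""
    # AND #$FD, ASL A, ASL A, ORA #$80
    vrstrt = (((hipage & 0xFD) << 2) | 0x80) & 0xFF
    # AND #$02 / BEQ CINT1A: bit 7 is set only when bit 1 of HIPAGE is set
    bit7 = 0x80 if hipage & 0x02 else 0x00
    return vrstrt, bit7


def line_table(hipage: int, rows: int = 0x18, cols: int = 0x16):
    """Model of CLRLP1: one high byte per table entry, with bit 7 set."""
    table = []
    y = hipage | 0x80            # ORA #$80 / TAY
    a = 0
    for _ in range(rows):        # CPX #$18
        table.append(y)          # STY SLLTBL,X
        a += cols                # CLC / ADC #$16
        if a > 0xFF:             # carry: the next line starts in a new page
            a &= 0xFF
            y += 1               # INY
    return table


for page in (0x1E, 0x10):
    vrstrt, bit7 = vic_setup_for(page)
    print("HIPAGE=$%02X -> VRSTRT=$%02X, VRCOLS bit7=$%02X"
          % (page, vrstrt, bit7))

table = line_table(0x1E)
print(" ".join("%02X" % b for b in table))
```

For a screen at $1E00 the table switches from $9E to $9F at entry 12, because 12 lines of 22 characters is 264 bytes, which crosses into the next page.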
+	CINT1 calls one routine, IODEF1, which resets the default input
+and output devices to the keyboard and screen, respectively, and then
+resets the VIC chip's registers to their default. The one interesting
+thing to note is the PANIC entry at $E5B5. The Kernal does not call the
+PANIC entry! The Kernal bypasses the extra JSR by calling the IODEF1 code
+directly. Not that there is any magic in the PANIC entry; it only calls
+the IODEF1 code and homes the cursor.  I haven't yet examined the
+BASIC ROM, so it is possible that BASIC calls or vectors the PANIC entry.
+   E5B5		;================================================
+   E5B5		; PANIC - Set I/O defaults (unused entry point??)
+   E5B5		;
+   E5B5             PANIC
+   E5B5 20 BB E5    	JSR IODEF1	;reset devices and VIC regs
+   E5B8 4C 81 E5    	JMP HOME	;home cursor
+   E5BB           ;
+   E5BB           ; Real PANIC entry; reset default devices
+   E5BB           ;
+   E5BB             IODEF1
+   E5BB A9 03       	LDA #$03
+   E5BD 85 9A       	STA OUTDEV	;reset output to screen
+   E5BF A9 00       	LDA #$00
+   E5C1 85 99       	STA INDEV	;reset input to kbrd
+   E5C3             ;
+   E5C3             ; Initialize 6561 VIC
+   E5C3             ;
+   E5C3 A2 10       	LDX #$10	;move 16 VIC registers
+   E5C5
+   E5C5
+   E5C5             V6561I
+   E5C5 BD E3 ED    	LDA VICSUP-1,X	;start at end of tbl
+   E5C8 9D FF 8F    	STA $8FFF,X	;start at end of regs
+   E5CB CA          	DEX		;decrement index
+   E5CC D0 F7       	BNE V6561I	;do next register
+   E5CE
+   E5CE 60          	RTS
+   EDE4             ;
+   EDE4             ;VIC chip video control constants
+   EDE4             ;
+   EDE4             VICSUP
+   EDE4			.db $05	;bit7 interlace, 6-0 HCenter
+			.db $19	;VCenter
+			.db $16	;bit7=video address, 6-0 #cols
+			.db $2E	;bit6-1=#rows, bit0=8x8 or 16x8 chars
+			.db $00	;current TV raster beam line
+			.db $C0	;bit0-3 start of char memory
+				;bit4-7 is rest of video address
+				;BITS 3,2,1,0 CM starting address
+				;             HEX   DEC
+				;0000   ROM  8000  32768 *default
+				;0001        8400  33792
+				;0010        8800  34816
+				;0011        8C00  35840
+				;1000   RAM  0000  0000
+				;1001        xxxx  }
+				;1010        xxxx  }unavail.
+				;1011        xxxx  }
+				;1100        1000  4096
+				;1101        1400  5120
+				;1110        1800  6144
+				;1111        1C00  7168
+			.db $00	;Hpos of light pen
+			.db $00	;Vpos of light pen
+   EDEC			.db $00	;Digitized value of paddle X
+			.db $00	;Digitized value of paddle Y
+			.db $00	;Frequency for oscillator 1 (low)
+			.db $00	;Frequency for oscillator 2 (medium)
+			.db $00	;Frequency for oscillator 3 (high)
+			.db $00	;Frequency of noise source
+			.db $00	;bit0-3 sets volume of all sound
+				;bit4-7 are auxiliary color information
+			.db $1B	;Screen and border color register
+				;bits 4-7 select background color
+				;bits 0-2 select border color
+				;bit 3 selects inverted or normal mode
+Conclusion
+	Next time, we'll examine more routines and answer any questions
+that you may have.
+.......
+....
+..
+.                                    C=H #17
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+NTSC-PAL fixing, part 1
+=======================
+Russel Reed <rreed@egypt.org>, Robin Harbron <macbeth@tbaytel.net>, S. Judd
+Introduction
+============
+	Just about everyone knows that there are differences between
+NTSC and PAL machines.  Most people are also familiar with at least a
+few of the technical issues, such as extra raster lines, and know that
+these differences can cause certain types of programs to fail, in particular
+games and demos.  But how does one actually go about fixing one of
+these programs?  It is that with which this series of articles is
+concerned.
+  	"Fixing" has been a job which a number of 64 hackers have taken on,
+in order to enjoy programs written in another country.  Some of the major
+game companies imported software from other countries and sometimes had
+to reprogram the software so it would run correctly.  Fixing is not
+always a simple task, but there are some techniques and strategies which
+are useful in most cases.  There are only a few different classes of
+problems overall.
+	We decided to approach this problem in a novel way: a pair of
+yay-hoos (Robin and Steve) would place themselves under the tutelage
+of an experienced fixer (Russ, a.k.a. Decomp/Style), fix up a program,
+and write up the results and experiences.  The first step was deciding
+on which program (or programs) to fix.  It had to first of all be
+fixable!  Since time is always in short supply it couldn't be a big,
+complicated project.  And finally, since we were just getting our fixing
+feet wet, the actual fixing job needed to be fairly straightforward (so
+the big custom track-loading demo will have to wait for a future article).
+	But demos are the natural fixing candidate, and after viewing
+several different ones, and examining the code to see what would need
+fixing, we finally decided on the demo "Slow Ideas", written by a couple
+of crazy Finns way back in 1989.  It's a cool demo, was challenging and
+fun to fix, and turned out to be pedagogically a good choice.  It's
+two pages, and each page had a number of effects in need of fixing.
+These pages, and the practical side of fixing, will be discussed in
+some detail below, but first we need to review the differences between
+PAL and NTSC, and discuss the implications of those differences, and
+how they manifest themselves in programs.
+NTSC/PAL Differences
+====================
+	There is really just one primary difference between NTSC and PAL
+machines: the graphics, which means VIC.  But since VIC generates the
+machine clock cycles, that means the computers run at different speeds.
+And since they run at different speeds, the CIA timers run at different
+speeds, the SIDs run at different speeds, and the CPUs run at different
+speeds.  So just having a slightly different graphics format affects
+nearly every aspect of the machine's operation.
+VIC and graphics
+----------------
+	The PAL television standard is different from the NTSC standard.
+The primary difference is actually the way color is encoded, but
+the main issue for the 64 is the frame rate, the number of raster
+lines, and the number of cycles per raster line:
+VIC chip        Frame Rate  Raster lines  Cycles per line
+R56A (NTSC):    60Hz        262           64
+R8+  (NTSC):    60Hz        263           65
+     (PAL) :    50Hz        312           63
+As the video raster beam sweeps across the television tube, VIC tells
+it what to display, one pixel at a time.  The CPU clock is exactly
+1/8th of the "pixel clock", so that one CPU cycle corresponds to
+eight pixels on the screen.  Thus, it is clear from the above table
+that a PAL machine has 63*8 = 504 pixels per raster line.  320 of
+those pixels comprise the visible display, while the other 184 make
+up the left and right borders.
+	What is important here are the three numbers in the table, though.
+First consider the number of cycles per line.  If a program is exactly
+synchronized with the raster, then it can make precise changes to the
+screen merely by letting a certain number of cycles elapse.  A simple
+example is making raster bars; a more involved example is opening the side
+borders, or generating an FLI display (which you can read about in previous
+issues of C=Hacking).  Needless to say, programs which require exact cycle
+timings will fail when run on a machine with a different number of cycles
+per line.  (Note that on older computers, like the old Atari 2600, the CPU
+actually built the screen display, and so _all_ the screen code had to
+be exactly timed).
+	Next observe the different number of raster lines.  The visible
+display begins on raster line 50, and there are 200 visible raster lines
+(320x200, you know).  That leaves 13 raster lines for the NTSC border, but
+62 raster lines for the PAL border.  This causes two problems.  Code which
+waits for a raster line greater than 263 will have problems on an NTSC
+machine.  A busy loop such as
+	LDA #$10 	;Wait for line $0110 (272)
+	CMP $D012
+	BNE *-5
+	LDA $D011
+	BPL *-12
+will never exit, while a raster IRQ will never occur.  The code must be
+adjusted to use a different raster line or another method of timing.
+Sprite graphics on lines greater than 263 will wrap around on an NTSC
+screen and in some cases the image will be displayed twice.  The sprites
+must either be moved or their images truncated to correct this.
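+	As a minimal sketch of the "different raster line" approach (the
+label WAIT and the choice of line $FA are ours, purely for illustration),
+a wait on a line below 256 behaves the same on both machines:
+	WAIT	LDA #$FA	;line 250 exists on PAL and NTSC alike
+		CMP $D012
+		BNE WAIT	;no $D011 test needed: line $01FA never occurs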
+	Moreover, the extra raster lines mean extra cycles per PAL frame.
+Using the table above, an NTSC machine has 263*65 = 17095 cycles per frame,
+and a PAL machine has 312*63 = 19656 cycles per frame.  In many demos and
+games those extra 2500 cycles are used to perform needed calculations and
+operations before the next frame begins, which leads to all kinds of
+trouble on an NTSC machine.  Note that NTSC machines have more cycles
+available in the _visible_ display, while PAL machines have many more
+cycles available in the borders.
+	Finally, consider the frame rate.  On an NTSC machine, there are
+17095 cycles per frame, and 60 frames per second, giving 17095*60 = 1025700
+cycles per second, or 1.02 MHz.  On a PAL machine, there are 19656*50 = 982800
+cycles per second, or 0.98 MHz.  So although a PAL machine has more cycles
+per frame, the CPU runs slightly slower than on an NTSC machine.  Thus a
+game like Elite, which involves raw computation, runs a little faster
+on an NTSC machine.  But for most games and demos it is the cycles per
+frame which is important -- as long as all game calculations can get done
+before the next frame, the game can run at the full frame rate.  Also
+note that most tunes are synced to the screen, so when a PAL tune,
+designed to play at 50 calls per second, is suddenly called 60 times
+each second, it will play noticeably faster.
+	By the way, there actually aren't _exactly_ 50 or 60 frames per
+second.  The frames per second is actually determined by the machine
+clock rate, not the other way around!  The actual system clock rates are
+14318181 / 14 = 1022727Hz for NTSC and 17734472 / 18 = 985248Hz for PAL.
+Dividing by 17095 (PAL=19656) cycles per frame gives 59.826 frames/second
+NTSC and 50.124 fps PAL.
+	The important thing to remember here is that PAL machines run
+slightly slower than NTSC machines, but have many more cycles per video
+frame.
+	Just as a side note, the above calculation should indicate to you
+that although the AC electricity lines are 60Hz in the US and 50Hz in Europe,
+that has nothing to do with the 50/60Hz PAL/NTSC frame rates.  Not only can
+an NTSC monitor easily display a 50Hz signal (let alone 59.826Hz), but the
+actual power frequency fluctuates around that 50/60Hz anyways, so that
+the AC line frequency is only 50/60Hz on _average_ -- good enough to
+run a clock, but not nearly precise enough to generate a video signal.
+	Also note that there are two different NTSC VIC chips in the table.
+The 64 cycles/line VIC was present in the earliest 64s shipped.  This is
+actually a bug in the chip, though, and 65 cycles/line is the "correct",
+not to mention most common, NTSC VIC chip, and the one which we will
+refer to in this article.  Oh?  You don't believe me?  Well, from the
+March 1985 IEEE Spectrum article:
+	In addition to the difficulty with the ROM [sparklies], "I made
+	a logic error," Charpentier recalled.  The error, which was
+	corrected sometime after Charpentier left Commodore, caused
+	the early C-64s to generate the wrong number of clock cycles
+	on each horizontal video line.  "It was off by one," he said.
+	"Instead of 65 cycles per line, I had 64."
+	As a result, the 180-degree phase shift between the black-and-white
+	and color information, which would have eliminated color transition
+	problems, didn't occur.  Depending on their color and the color of
+	the background, the edges of some objects on the screen would appear
+	slightly out of line...
+There ya go!
+Machine cycles
+--------------
+	The tiny CPU speed difference has a number of important ramifications.
+It means that the CIA timers on a PAL machine run slightly slower than on
+an NTSC machine, so timer values may have to be recalibrated; note that the
+system CIA interrupt has a different setting for PAL and NTSC.  Moreover,
+a disk drive runs at exactly 1MHz -- there are no "PAL disk drives" -- which
+means cycle-exact fastloaders will not synchronize correctly on
+different-speed machines.  And finally, it means that SID works differently.
+	Not only will the tempo of any interrupt-based tune (raster _or_ CIA)
+change, but the actual pitch will change as well.  SID generates its waveforms
+by simply updating an internal counter every cycle, so an NTSC SID is
+essentially playing a digital sample at 1.02 MHz.  When that sample is
+played at 0.98 MHz, it's like slowing a record player down a little -- the
+pitch decreases.  To be specific, the _absolute_ pitch changes, but the
+_relative_ pitch between notes does not; the tune plays the same, but
+the pitches are all a little over a quarter-step lower.
+	Practically speaking, this is totally irrelevant to fixing.
+Only the tune speed is of significant interest.
+Fixing the Problems
+===================
+	The previous section described different classes of problems
+which can occur from the different cycles per line, lines per frame,
+cycles per frame, and cycles per second.  These problems include screen
+syncs, bad interrupts and infinite busy-loops, too many cycles per frame,
+mis-timed timers and fastloaders, and different music tempos.
+	Fixing tunes is easy.  For programs in which the interrupt frequency
+is otherwise unimportant, the interrupt timer source can be adjusted to
+the correct frequency.  In other cases, if the interrupt frequency
+cannot be changed, the music speed may be adjusted by calling the play
+routine twice on every sixth interrupt or not calling it on the sixth
+interrupt.  If perfection is needed, the music data might be adjusted or
+the music routine rewritten to work at the different frequency.  Much of
+the time none of these are necessary, as the music sounds fine at the
+different speed anyway.
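+	As a rough sketch of the skip-one-call-in-six approach for a PAL
+tune on an NTSC machine: COUNT (a spare byte, preset to 6) and PLAY (the
+tune's play routine) are placeholder names, not taken from any particular
+program.
+		DEC COUNT
+		BNE PLAYIT	;interrupts 1-5: play as usual
+		LDA #$06	;sixth interrupt: reload the counter
+		STA COUNT
+		BNE SKIP	;always taken (A=6); skip the tune
+	PLAYIT	JSR PLAY
+	SKIP			;rest of the interrupt routine
+Five calls for every six 60Hz interrupts works out to 50 calls per second.
+If the tune instead hangs off the CIA 1 timer A interrupt, the timer can
+be reloaded so that it fires at roughly the PAL frame rate; 19656 PAL
+cycles last about as long as 20404 NTSC cycles, so something like
+		LDA #$B3
+		STA $DC04	;timer A low byte
+		LDA #$4F
+		STA $DC05	;timer A high byte ($4FB3 = 20403)
+should land close.  Treat the exact value as a starting point to be
+checked by ear, not as gospel.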
+	Next consider too many cycles per frame.  In order to fix this
+class of problems, we must improve the efficiency of the code.  There are
+three cases here.  Sometimes busy waits are used to synchronize the code
+with a raster line on the screen.  This wastes cycles and imposes some
+restrictions which result in additional wasted cycles.  These busy waits
+may sometimes be replaced by raster interrupts which make better use of
+the available cycles.
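+	For readers who have not set one up before, a bare-bones raster
+interrupt through $0314 looks something like the following.  The label
+IRQ and the choice of line $FA are placeholders; a real fix would reuse
+whatever setup the program already has.
+		SEI
+		LDA #$7F
+		STA $DC0D	;turn off the CIA 1 interrupts
+		LDA $DC0D	;and clear any pending one
+		LDA #$FA
+		STA $D012	;raster line, low 8 bits
+		LDA $D011
+		AND #$7F
+		STA $D011	;raster line, bit 8 = 0
+		LDA #<IRQ
+		STA $0314
+		LDA #>IRQ
+		STA $0315
+		LDA #$01
+		STA $D01A	;enable the raster interrupt
+		CLI
+		RTS
+	IRQ	LDA #$01
+		STA $D019	;acknowledge the raster interrupt
+		...		;work that used to follow the busy wait
+		JMP $EA81	;restore registers and RTI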
+	At other times, programmers will set up all graphic updates so that
+the code executes in the vertical borders.  NTSC machines have a big
+disadvantage here, as we've seen earlier.  Usually this code can be
+rearranged to take advantage of the available cycles during the screen
+display.  This can be tricky, as you don't want to be updating the screen
+at the same time it is being displayed, but by splitting updates up it can
+usually be pulled off.
+	Finally there are some cases in which neither of these techniques
+is of use.  For a perfect fix, the code must be optimized, often by
+sacrificing memory.  If it can't be optimized, then something has to go;
+effects can be truncated or updates slowed down so that less is updated
+each frame.
+	Next consider the different cycles per raster line.  In most
+cases, this difference is not significant.  The routines for which it is
+significant are raster routines, where synchronization with the video
+chip is established by using exactly the right number of processor
+cycles.  FLI, VSP, and color bars are all affected by this.  Color bars
+will have flicker and look crooked; VSP effects will often be shifted
+the wrong amount; FLI routines usually either repeat the top row of
+graphics all the way down the screen or else don't display at all.
+These problems can all be corrected by adding or subtracting the correct
+number of cycles per raster line.  In most cases, this class of problem is
+actually the easiest to fix.
+	There are several approaches to putting the right number of cycles
+in for the fix.  If the source is available, inserting a NOP instruction
+may be all that is required for an NTSC fix.  Without the source, a
+modified routine may be inserted into some empty memory and the original
+routine bypassed.  Sometimes the code may be shuffled around enough to
+insert a NOP with a machine language monitor.  You can sometimes change
+the opcodes used to use a different number of cycles.  Consider the
+delays that can be added with just two bytes:
+	CMP #$EA	;2 cycles
+	BIT $EA		;3 cycles
+	NOP NOP		;4 cycles
+	INC $EA		;5 cycles
+	CMP ($EA,X)	;6 cycles
+This gives lots of room to work with, assuming the flags aren't important.
+If .X or .Y is unused, then
+	STA $1234
+can be changed to
+	STA $1234,X or STA $1234,Y
+to gain an extra cycle (if the index register is 0).  If .Y is known to contain a fixed
+value like $FF, the target can be adjusted so that STA $1234 becomes
+STA $1135,Y.  Usually in an FLI routine, you'll see STA $D018, STA $D011,
+and STA $D016 together.  Two of these can be replaced with indexed opcodes
+to add the two extra cycles needed for an NTSC fix.
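+	A sketch of that last trick, assuming .Y=$FF throughout the routine
+(TAB1-TAB3 are stand-in table names):
+		PAL			NTSC
+		LDA TAB1,X		LDA TAB1,X
+		STA $D018		STA $CF19,Y	;$CF19+$FF=$D018, +1 cycle
+		LDA TAB2,X		LDA TAB2,X
+		STA $D011		STA $CF12,Y	;$CF12+$FF=$D011, +1 cycle
+		LDA TAB3,X		LDA TAB3,X
+		STA $D016		STA $D016	;left alone
+Each instruction keeps its three-byte length, so the change can be made
+in place with a monitor.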
+	Also note that when sprites are active, a different number
+of cycles may get stolen depending on which instruction is executing
+when the sprite data is read.  C=Hacking #3 discusses this in detail,
+and explains how it may be used to synchronize code with the raster beam.
+From a fixing standpoint, it means that you don't always want to add
+two instruction cycles to convert a piece of PAL code to NTSC; sometimes,
+as in the demo below, only _one_ cycle must be added to fix certain routines,
+with the other cycle being eaten by VIC.
+	Sprites have a few differences between PAL and NTSC as well; like
+the other differences they are not evident in many programs.  The horizontal
+sprite positions start at zero on the left side of the screen and
+increase as you move to the right.  At some point past position 300, the
+positions wrap back around to the left side of the screen.  For these
+high horizontal positions, the sprites are 8 pixels farther right on a
+PAL screen than they would be on an NTSC screen.  When these high
+positions are used to create sprites that extend all the way to the left
+edge of the visible screen, problems show up.  A PAL routine will have
+an eight pixel wide gap in the sprites on an NTSC machine while an NTSC
+routine will have eight pixels of overlap on a PAL machine.  Often these
+positions are stored in a table or loaded into a register with immediate
+addressing.  In this case it is trivial to adjust the value by eight in
+the appropriate direction.  The vertical sprite registers present similar
+behavior, which is seen even less often.
+	As was stated earlier, the 65xx processors in the 64 and 1541 run
+at close to the same frequency, but the ratio of the 64's processor
+frequency to the 1541's processor frequency is not exactly 1.0, and the
+ratio is different on PAL and NTSC systems.  This means that although the
+6510 in the 64 and the 6502 in the 1541 may be synchronized at the start of
+a section of code, after executing the same number of cycles they will no
+longer be synchronized.  Cycles may be added or subtracted on either
+processor to bring them back into sync, but this adjustment varies
+depending on whether the system is PAL or NTSC.  This is most often seen in
+fastloaders, where the code depends on the two processors being in sync
+in order to transmit 8 bits one at a time or four pairs of two bits
+without the need for handshaking.  As with raster routines, a few cycles
+need to be added or subtracted for a fix.  Finding the right place to
+insert or remove these cycles can be challenging.  Instead of trying to
+remove cycles from the 64 routine, which may be impossible, you can instead
+e.g. add cycles to the complementary drive routine.
+  	Finally, understanding what problems may arise and what should be
+done to correct them is only part of the skillset needed by a good fixer.
+In the best case, you'll be working with source code which you've written
+and understand.  In other cases, someone else may have written the code,
+in which case you must study and understand it before you can start
+trying to fix any NTSC/PAL problems.  Often the source code isn't even
+available, leaving you to work in a machine language monitor.  This
+imposes some new constraints.  In a monitor, it can be difficult to shift
+blocks of code around.  Several techniques are available to work around
+this.  Code can easily be left out by replacing it with NOP instructions
+or inserting JMPs to branch around sections which aren't needed.  Extra
+code can be patched in with a JMP instruction to the new code and another
+JMP instruction at its end to return to the old routines.  Where cycles
+need to be added, opcodes may be changed; for example, a LDA $ABCD might
+be replaced with LDA $ABCD,Y to add an extra cycle.  Finally, you may
+simply have to resort to a symbolic disassembly (i.e. disassemble into
+source code).  This may become an absolute necessity in some cases.
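+	To make the JMP patch concrete, here is a made-up example (all
+addresses are hypothetical).  Suppose two extra cycles are wanted in
+front of the store at $2003:
+	$2000	LDA $ABCD	;original
+	$2003	STA $D011
+becomes
+	$2000	JMP $3000	;overwrites the LDA
+	$2003	STA $D011
+	...
+	$3000	NOP		;the extra two cycles
+	$3001	LDA $ABCD	;the displaced instruction
+	$3004	JMP $2003	;back to the old routine
+Keep in mind that the two JMPs cost three cycles apiece, so this patch
+really adds eight cycles, not two.  In cycle-exact code that overhead has
+to be counted as part of the fix.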
+	Then there are the cases where the object code isn't even readily
+accessible.  Some of the same skills used by crackers to remove copy
+protection are valuable.  Most commonly, games, tools, and demos will be
+compressed to make the files shorter, adding a single BASIC SYS command
+line as well.  These routines rarely try to be deceptive and are fairly
+short, so they can be undone, often just by modifying the existing
+decompression code a bit.  It is common practice to have a sequence
+cruncher on top of an RLE (or equal-byte) packer or linker.  Within the
+"scene", there will be intros to wade past, while commercial software
+will often have disk or tape copy protection and purposefully obtuse code.
+	Just remember that fixing is often about creativity and diligence.
+There is a common set of problems encountered and many creative solutions
+to correct them.  With practice, it becomes easier, just like anything else.
+You'll have an easier time of it if you pick your battles carefully.
+Remember that some problems just can't be fixed perfectly.  If you get
+stuck, try a different challenge and perhaps come back to the problem
+later.  And most of all, remember to have some fun!
+	With all this in mind, let's have a look at that demo!
+Slow Ideas, page 1
+------------------
+Reconnaissance:
+	Page 1 is divided into basically three parts.  The upper part
+of the screen contains a stretching sprite tech-tech extending into
+the borders, overlaid on top of raster bars.  Then the rest of the
+screen features a large Pu-239 picture/logo.  Finally, a bouncing
+sprite scroll takes place in the lower portion of the screen, extending
+into the lower (but not side) borders.
+	The screen is built using two interrupt routines, one located
+at $1240 and the other located at $1280.  The $1240 routine is short
+and occurs at the bottom of the screen (raster $F8 or so).  It removes
+the upper/lower borders, performs some calculations, and calls the
+music.  The $1280 routine basically controls the screen, and occurs
+at raster line $xx, at the top of the screen.  It generates the tech-tech
+sideborder display, and performs most of the calculations.  Both routines
+are vectored through $0314, not $FFFE.
+	Fortunately, there is a lot of distance between different routines,
+that is, there is a lot of empty memory between routines -- probably it was
+coded in an ML monitor.  This means that adding patches, or shifting
+routines around, is much easier.  What a forward-thinking guy that
+Pasi was, to realize that it would need to be fixed by C=Hacking one
+day.
+	When run, the demo is a mess.  The most prominent defects are
+that the music plays very slowly, the screen flashes, with the main
+picture flickering between the top of the screen and the lower portion
+of the screen, the side borders are not open in the tech-tech, and
+sometimes the scrolling sprites are off the screen.
+Too many cycles:
+	The half-speed music and flickering screen indicate that interrupts
+are getting skipped, which points a finger straight at too many cycles -- if
+the next interrupt is set *after* it is supposed to occur, then a whole
+frame will of course pass by before it actually occurs.
+	The first question is, where are the cycles getting eaten?  That is,
+of the two interrupt routines, which is using too many cycles?  The answer
+is immediately obvious: any interrupts that take place totally on the screen
+have *extra* cycles available -- 2 cycles per raster line.  The loss of cycles
+comes in the lower border, which means the $1240 interrupt:
+	jsr tune
+	jsr blah1
+ 	jsr blah2
+	LDA $D012
+	CMP #$1E
+	BCC *-7
+First note that curious $D012 code.  It surely is meant to compare with
+line $011E, not line $1E.  Line $011E doesn't ever occur on an NTSC machine,
+though, so it actually waits until line $1E, well past the $1280 interrupt.
+So the first order of business is to BIT that BCC out of existence.
+	Alas, this still does not fix up the flickering.  The next step
+is to BIT ($2C) out the JSR tune call, to see if the problem really is
+cycles.  And sure enough, ditching the tune gives a suddenly stable, or
+at least mostly stable, screen.  BITting out the other two subroutines,
+but keeping the music, gives an unstable screen; the tune simply has to
+go somewhere else.
+	Finding extra cycles takes a bit of work; for now, it is enough to
+$2C-BIT out the JSR TUNE and focus on the other problems.
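+	For anyone who hasn't seen the trick before: "BITting out" a JSR
+means changing just its opcode byte from $20 (JSR) to $2C (BIT absolute),
+leaving the two address bytes alone.  With a made-up address:
+		20 00 30	JSR $3000	;before
+		2C 00 30	BIT $3000	;after
+The BIT reads $3000 and changes a few flags but is otherwise harmless,
+and putting the $20 back restores the call.  (Some care is needed if the
+address happens to be an I/O register which reacts to being read.)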
+Tech-tech:
+	Let's have a look at the tech-tech routine:
+	$1280	LDA #$96
+		STA $DD00
+		LDY #$FF
+		BIT $EA
+		NOP
+		NOP
+		LDX #$09
+		DEX
+		BNE *-3
+	$1290	LDX #$5F
+	$1292	LDA $1000,X
+		STA $D018
+		DEC $D016	;Open border
+		STA $D021
+		INC $D016
+		LDA $1060,X
+	$12A4	STA $D017
+		STY $D017
+		LDA $1100,X
+		STA $D011
+		DEX
+	$12B1	LDA $1000,X
+		STA $D018
+		DEC $D016	;Open border
+		STA $D021
+		INC $D016
+		LDA $1060,X
+		STA $D017
+		STY $D017
+	$12C9	BIT $EA
+		NOP
+		DEX
+		BPL $1292
+The routine has three parts: the initial delay at $1280, and a two-part
+loop, the first part at $1292 and the second at $12B1.  The difference
+between the two is the LDA $1100,X STA $D011 at $12AA; without two
+parts to the loop, the branch would take too many cycles.  Each part will
+need fixing, since each part uses at least one raster.
+	Opening the side borders requires exact timing, yet the above
+routine is entered through $0314, which will always have some cycle
+variance.  The trick here is that all eight sprites are active; Pasi
+himself wrote a nice article in C=Hacking #3 on using sprites to
+synchronize the raster.  The basic idea is to get the CPU to wait on
+a specific instruction; VIC then frees up the bus on a specific cycle,
+and you know exactly where you are on the raster line.
+	First to fix is the initial line delay.  Since there are +2 extra
+NTSC cycles per raster line, it seems reasonable to first try adding
++2 cycles to the delay:
+	$1287	BIT $EA	to	AND ($00),Y	;+2 cycles
+		NOP		NOP
+		NOP		NOP
+Really, CMP ($EA),Y is in general a better choice since it doesn't
+affect .A, but it doesn't matter here and we were young and naive,
+besides.  If the above +2 cycles aren't enough, it will be clear
+soon enough (but it turns out they are enough).
+	Next up is the borders.  Although the immediate impulse is
+to add +2 cycles per raster line -- +2 cycles to both loop parts --
+it's not obvious where the raster sync is taking place, and +2 cycles
+might cause the sync to fail.  In fact, what's needed is just +1 cycle
+per part:
+	$12A4	STA $D017	to	STA $CF18,Y	;+1 cycle
+	$12C9	BIT $EA		to	INC $00EA	;+1 cycle
+		NOP
+The $12A4 fix uses the fact that .Y=$FF.  A NOP NOP NOP would have done
+the trick at $12C9 and be safer, but we used INC in our reckless youth.
+	And suddenly -- poof!  The borders open up and the screen stops
+flickering.  Why should the screen flicker at all?  Remember that the
+loop is split in two because of an STA $D011 instruction.  This STA
+pushes badlines off, so that the timing stays precise; since badlines
+never occur, the graphics data is never fetched.  It's only after the
+rasterbars that VIC starts fetching graphics data; without the $D011
+push, this data will appear on the first visible rasterline (hence the
+earlier flicker with too many cycles).  When an imperfect (from incorrect
+cycle timing) push takes place, the picture can get the jitters.  So
+now we know why the demo behaved as it did, earlier.
+Sprite scroll:
+	And yet... the screen still flickers, when sprites bounce down
+too low.  Clearly the sprites are eating into the cycles needed by the
+lower border routines.  The simplest way to fix this is of course to
+change the y-coordinates of all the sprites.  One option is to re-do
+all the coordinate tables.  But a much easier option exists: figure
+out which code stores the sprite coordinates, and subtract a fixed
+amount from each of the y-coordinates.  This routine just happens
+to be located at $1590, and the simple insertion of code
+	$15AF	STA $D001,Y	to	CLC
+		...			SBC #$0E
+					STA $D001,Y
+fixes things up just dandy.
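+Note that with the carry clear, SBC subtracts one more than its operand,
+so the patch moves the sprites up by $0F lines:
+		CLC
+		SBC #$0E	;A = A - $0E - 1
+does the same thing as
+		SEC
+		SBC #$0F	;A = A - $0F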
+Still too many cycles:
+	We still have the problem of what to do about the tune.  Since
+there aren't enough cycles in the $1240 interrupt, they need to be
+found somewhere else, and the only somewhere else is the $1280 interrupt,
+during the logo display.  The first thing to figure out is how many
+lines are needed, and how many lines are free.  This is easy enough,
+by simply moving the JSR TUNE to the end of the $1280 routine,
+sandwiched between an INC $D020 and a DEC $D020.  The border will
+then indicate the end of the $1280 interrupt as well as the size
+of the tune.
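+In other words, something along these lines (TUNE stands for the address
+of the play routine):
+		INC $D020	;border color changes here...
+		JSR TUNE
+		DEC $D020	;...and changes back when the tune is done
+Each raster line of changed border color is 65 cycles' worth of tune on
+an NTSC machine.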
+	The good news is that $1280 has a fair amount of extra
+cycles available.  The bad news is that the music is fairly inefficient,
+and needs even more cycles.  But not many more -- just a good 8-12 raster
+lines.  $1280 is pretty large, so maybe by rewriting some code enough
+cycles can be gained to make it all work.  Note that if $1280 only
+had a few raster lines to spare, this task would be much more difficult
+(if not impossible).
+	Towards the end of the $1280 routine, there are a series of
+subroutine calls.  One of them is a JSR $1E50.  This code has two loops,
+one which copies values from a table at $1900 to a table at $1000, and
+one which ORAs a value into the $1000 table.  Instead of copying
+and then ORAing, why not just combine the two loops?
+$1E50	LDY $1FC6
+$1E53	LDX #$5C
+$1E55	LDA $1900,Y
+$1E58	STA $1001,Y
+$1E5B	DEY
+$1E5C	DEX
+$1E5D	BPL $1E55		LDY $1FC6
+$1E5F	NOP
+$1E60	LDA $1FC1
+$1E63	STA $1E70
+$1E66	LDA $1FC2
+$1E69	STA $1E6F
+$1E6C	LDX #$5F		LDX #$5C
+$1E6E	LDA $32B6
+$1E71	ORA $1000,X		ORA $1900,Y
+$1E74	STA $1000,X		STA $1001,X
+$1E77	INC $1E6F
+$1E7A	BNE $1E8B
+$1E7C	INC $1E70
+$1E7F	LDA $1E70
+$1E82	CMP #$34
+$1E84	BNE $1E8B
+$1E86	LDA #$30
+$1E88	STA $1E70
+$1E8B	DEX			DEY
+$1E8C	BPL $1E6E		DEX
+$1E8E	RTS			BPL $1E6E
+$1E8F	BRK 			RTS
+On the right are the patches we added, along with replacing the JSR $1E50
+with JSR $1E5D.  Instead of copying from $1900 to $1000 and then ORAing
+into $1000, it simply ORAs with $1900 and stores that in $1000 ($1001 actually,
+since that's where the first loop stored stuff).  Sharp-eyed readers may
+have noticed that the patch affects $1001-$105D, whereas the second loop
+affected $1000-$105F; doesn't the patch above lose some bytes?  Of course
+it does, what's your point? :).  The ancient hacker technique applies
+here: try it, and if it works, don't touch it and don't ask questions!
+Better than a huge rewrite of self-modifying code.
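+	For reference, here is how the patched routine reads from its new
+entry point at $1E5D, pieced together from the two columns of the listing
+(the comments are ours):
+	$1E5D	LDY $1FC6
+	$1E60	LDA $1FC1	;set up the self-modified LDA below
+	$1E63	STA $1E70
+	$1E66	LDA $1FC2
+	$1E69	STA $1E6F
+	$1E6C	LDX #$5C
+	$1E6E	LDA $32B6	;operand gets modified as the loop runs
+	$1E71	ORA $1900,Y
+	$1E74	STA $1001,X
+	$1E77	INC $1E6F
+	$1E7A	BNE $1E8B
+	$1E7C	INC $1E70
+	$1E7F	LDA $1E70
+	$1E82	CMP #$34
+	$1E84	BNE $1E8B
+	$1E86	LDA #$30
+	$1E88	STA $1E70
+	$1E8B	DEY
+	$1E8C	DEX
+	$1E8D	BPL $1E6E
+	$1E8F	RTS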
+Wrapping up:
+	With the above fix in place, and the tune moved to the $1280
+interrupt, the demo finally seems to work great.  All that remains is
+to save it and crunch.  Figuring out which areas of memory are used
+is easy enough, by looking at the disassembler and the initialization
+code (which unpacks the code further).  But after crunching -- uh oh,
+lockup.  A program freeze shows that it is still running, but that the
+interrupts are not occurring.  A glance at the setup code shows that $D012
+is set, but $D011 is never set -- presumably the high bit is set, so the
+interrupt occurs on a nonexistent raster line.  Adding a simple
+	LDA #$1B
+	STA $D011
+to the initialization routine at $1443 fixes that up just fine, and
+the program decrunches successfully.  Woo-hoo!  One page down and
+one to go.
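+If the other $D011 bits had needed preserving, an alternative sketch (not
+what we used) would be to clear only the high raster bit:
+		LDA $D011
+		AND #$7F	;clear bit 7 = raster compare bit 8
+		STA $D011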
+Slow Ideas, page 2
+------------------
+Reconnaissance:
+	Page 2 has essentially four visible parts: a tech-tech at the top of
+the screen, followed by a swinging FALSTAFF sprite on top of rasters and
+open borders, followed by the ubiquitous Pu-239 pic/logo, followed finally
+by a sprite scroll on top of rasters with open side and bottom borders.
+	All this is done with two interrupts, one ($11FE) occurring at
+raster line $31, the other ($1350) occurring at line $D1.  The $31 raster
+performs the tech-tech, the FALSTAFF rasters, and also performs some
+calculations for various sprite scroll effects.  The $D1 raster handles the
+lower rasters and scrolling sprites, and also plays the music, scrolls the
+sprites, and does the calculations for the FALSTAFF sprite.
+	When run, the screen is quite a jumble -- too many rasters -- and
+needless to say, the screen effects need retiming.  The music really sounds
+better at 50Hz, so it needs to be retimed as well.
+Top to bottom:
+	Before fixing the timing problems, the extra cycles need to be
+addressed.  In the $D1 interrupt, after the lower rasters are displayed
+there are a series of subroutine calls at $13B3:
+	$13B3	JSR $211C	;Tune
+		JSR $0F80	;Scroll sprites
+		JSR $1300	;Clear $7Fxx tables (used by JSR $0E00
+				;  in $31 interrupt)
+		JSR $1D00	;FALSTAFF sprite
+Deducing the function of each routine is easy enough -- just BIT it out
+and see what happens.  BITting out the first three subroutines frees up
+enough cycles to make the screen stabilize.  Finding cycles is usually
+more work than fixing up timing -- especially three subroutines worth! --
+so it is enough to $2C-BIT out the three subroutine calls for now and fix
+up the timing first.
+Tech-tech:
+	At the top of the screen is a normal tech-tech, controlled by
+the routine at $11FE.  The code flows roughly as follows:
+	$11FE	Set up $D020/$D021, waste a few cycles to get timing right
+		LDX #$08
+	$1214	LDA $xxxx,Y
+		STA $D018
+		LDA $xxxx,Y
+		STA $D016
+		INY
+		DEX
+		NOP NOP NOP
+		Change VIC bank ($DD00)
+		CPX #$00
+		BEQ $123F
+		NOP
+		NOP
+		NOP
+		...
+		CPY #$2F
+		BCC $1214
+		JMP $125E
+The thing to recognize here is that there are two loops -- on every eighth
+line the BEQ $123F branch is taken.  A simple cycle count shows that the
+first loop takes 63 cycles, and the BEQ branch adds an extra 20 cycles:
+obviously, these are simply timed for normal lines and badlines.  So all
+that is needed here is to add 2 cycles to each loop: I changed the
+CPX #$00 above to INX DEX for the first loop, and changed a NOP NOP
+to a CMP ($EA,X) in the $123F branch.  With the raster timing correct,
+the next step is the initial timing at $11FE.  By doing a little rearranging
+of code two bytes can be freed up, giving 2-6 cycles to fiddle with:
+	$11FE	LDY #$0B	LDX #$0B
+		LDA #$01	LDA #$01
+		STA $D019	STA $D019
+		LDX #$02	LDY #$02
+	$1207	DEX		DEY
+		BNE $1207	BNE $1207
+		STY $D020	STX $D020
+		STY $D021	STX $D021
+		LDY #$00	NOP NOP
+		LDX #$08	LDX #$08
+	$1214	tech-tech loop
+As you can see, by using .Y in the delay loop instead of .X, the LDY #$00
+instruction becomes redundant.  As it turns out, just +2 cycles are needed.
+Finally, there is the matter of the last raster line.  When the JMP $125E
+is taken, there is some delay before changing the border and background
+registers, to get a nice solid line.  It turns out that four extra cycles
+are needed here, and fortunately there just happen to be several padding
+bytes before $125E.  By changing the JMP $125E to a JMP $125C, two
+extra NOPs are easily inserted.  That takes care of the tech-tech.
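+	For anyone who wants to double-check the bookkeeping, here is a small
+Python sketch (not part of the original fix) that tallies the cycles each
+tech-tech patch adds.  The instruction timings are the standard documented
+6502 values; the 63- and 65-cycle line lengths for PAL and NTSC are an
+assumption of the sketch, and the names CYCLES, cost and delta are made up
+for illustration.

```python
# Base cycle counts for the few 6502 instructions used in these patches
# (standard documented timings; page-crossing penalties do not apply here).
CYCLES = {
    "NOP": 2,
    "INX": 2,
    "DEX": 2,
    "CPX #imm": 2,
    "LDY #imm": 2,
    "CMP (zp,X)": 6,
}


def cost(instructions):
    """Total cycles for a straight-line list of instructions."""
    return sum(CYCLES[name] for name in instructions)


def delta(old, new):
    """Cycles gained by replacing the `old` sequence with `new`."""
    return cost(new) - cost(old)


# Assumed CPU cycles per raster line (not stated in the article text).
PAL_CYCLES_PER_LINE = 63
NTSC_CYCLES_PER_LINE = 65

patches = {
    "normal-line loop: CPX #$00 -> INX DEX": delta(["CPX #imm"], ["INX", "DEX"]),
    "badline branch: NOP NOP -> CMP ($EA,X)": delta(["NOP", "NOP"], ["CMP (zp,X)"]),
    "init: LDY #$00 -> NOP NOP": delta(["LDY #imm"], ["NOP", "NOP"]),
    "last line: two NOPs added before $125E": delta([], ["NOP", "NOP"]),
}

for name, gained in patches.items():
    print(f"{name}: +{gained} cycles")
```

+Under those assumptions each loop patch is worth exactly the two cycles
+by which an NTSC line is longer than a PAL one.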
+Falstaff rasters:
+	Immediately after the tech-tech are the rasters and open borders
+behind the FALSTAFF sprite.  JSR $0EA0 handles this part.  We already
+fixed a sideborder routine in the first page, and this one is similar:
+	$0EA0	LDX #$05		;Initial delay
+		DEX
+		BNE *-3
+		LDX #$15
+		CMP #$EA
+	$0EA9	BIT $EA			;Change to NOP NOP for +1 cycle
+		LDA $xxxx,X
+		DEC $D016
+		STA $D021
+		INC $D016
+		STA $D020
+		LDA $xxxx,X
+		STA $D011
+		NOP
+		NOP
+		NOP
+		DEX
+		BNE $0EA9
+		LDX #$02		;Change to LDX #$03 for last line
+		delay loop
+As before, +1 cycle needs to be added to the border loop, which is
+easy enough to do by changing the $0EA9 instruction.  The end delay
+also needs a little change, to make the last line nice and solid
+(there are no sprites active on the last lines).  Finally, the
+initial delay needs to be changed, to line up the routine correctly.
+I changed a CMP #$EA at $1277, just before the JSR $0EA0 call, to
+NOP NOP.  As usual in this type of thing, it was easy to simply
+experiment with the initial timing until the borders opened up.
+Bottom rasters and sprite scroll:
+	Finally, the bottom rasters need fixing up.  Once again, the
+loop looks familiar:
+	  $135D LDY #$FF
+		LDX #$55
+		CMP #$EA
+		NOP
+		NOP
+		NOP
+	  $1366	LDA $7F02,X
+		STA $CF18,Y			;Stretch if necessary
+		STY $D017
+		LDA $1504,X
+		STA $D011
+		LDA $1C32,X
+		DEC $D016			;Open border
+		STA $D021
+		INC $D016
+		DEX
+		BNE $1366
+And once again, we need to add a cycle.  As with page 1, .Y=$FF means
+that the STA $D011 can be changed to STA $CF12,Y.  Changing the CMP #$EA
+to NOP NOP ($EA $EA) adds the two cycles needed in the init to get
+everything aligned,
+and poof -- instant open borders (as long as all the sprites are being
+displayed correctly).
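+	The indexed-store trick is easy to verify by hand, but a few lines of
+Python make the arithmetic explicit.  This is only an illustrative sketch:
+indexed_address is a hypothetical helper, and the 4- and 5-cycle figures
+are the standard 6502 timings for STA absolute and STA absolute,Y.

```python
def indexed_address(base, index):
    """Effective address of an absolute,indexed access (16-bit wraparound)."""
    return (base + index) & 0xFFFF


Y = 0xFF  # the raster loop keeps .Y at $FF throughout

# Standard 6502 store timings.
STA_ABS_CYCLES = 4
STA_ABS_Y_CYCLES = 5

d011 = indexed_address(0xCF12, Y)   # target of STA $CF12,Y
d017 = indexed_address(0xCF18, Y)   # target of STA $CF18,Y
extra = STA_ABS_Y_CYCLES - STA_ABS_CYCLES

print(f"STA $CF12,Y -> ${d011:04X}, STA $CF18,Y -> ${d017:04X}, +{extra} cycle")
```

+The store still lands on the same register; only the addressing mode,
+and therefore the timing, changes.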
+	When the scroll is activated, something else becomes apparent:
+as the sprites scroll into the left border, they "pop" to the left by
+several extra pixels, i.e. the screen coordinates change by too much.
+To fix the sprite popping, all that was needed was to add 8 to $0F00
+(the sprite coordinate).
+Too many cycles:
+	Alas, the moment we've been dreading has arrived.  Where in the
+world do we find enough cycles for three subroutines?  One option is
+to get rid of some effects -- for example, the sprite scroller has
+many time-consuming effects that can occur; by getting rid of those,
+the tune can be placed in the $31 raster.  Working through the code
+doesn't reveal subroutines that can easily be rewritten.  Still, it
+sure would be nice to get all the effects going; but where to get the
+cycles?
+	A little thinking suggests something, though -- the PAL
+routine can't have THAT many extra cycles, simply because it doesn't
+get cycles until the raster routine is finished, at the bottom of
+the screen... oh duh.  A glance at the raster routine shows that
+it covers $55 rasters.  The interrupt takes place on line $D1,
+which means that the rasterbars extend all the way down to line
+$0126 = 294 or so, which is over 30 rasters past the last NTSC line.
+So suddenly there's a great big chunk of cycles, for free:
+		(Orig PAL)	(Fixes)
+	  $135D LDY #$FF
+		LDX #$55	LDX #$35	;Only $35 raster lines
+		CMP #$EA	NOP NOP		;+2 cycles
+		NOP
+		NOP
+		NOP
+	  $1366	LDA $7F02,X	LDA $7F22,X	;Add $20 to compensate for .X
+		STA $CF18,Y
+		STY $D017
+		LDA $1504,X	LDA $1524,X	;Add $20 ...
+		STA $D011	STA $CF12,Y	;+1 cycle to open borders
+		LDA $1C32,X	LDA $1C52,X	;Fix probably not needed
+		DEC $D016			;Open border
+		STA $D021
+		INC $D016
+		DEX
+		BNE $1366
+Note that the table offsets need to be increased; without that, the sprites
+start stretching all over the place and everything goes weird.  With $35
+raster lines instead of $55, the rasters go all the way down to the last
+NTSC lines, and free up just enough cycles for two of the $13B3 subroutines
+to be re-enabled:
+	  $13B3 JSR $211C	;Tune
+		JSR $0F80	;Scroll sprites
+		BIT $1300	;Clears $7Fxx tables (sprite scroll)
+		JSR $1D00	;FALSTAFF swing
+That pesky $1300 subroutine just pushes it over the edge, though:
+	$1300	LDX #$2F
+		LDA #$00
+		STA $7F01,X
+		STA $7F2D,X
+		DEX
+		BNE *-9
+It doesn't take *that* many cycles, so one option is to use even fewer
+than $35 rasters.  But this makes the bottom part look awfully small, and
+obscures parts of the scroll.
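+	The raster arithmetic behind the $55 -> $35 change can be sketched in
+Python as follows.  The only outside assumption is that NTSC raster lines
+are numbered 0 through 262; the interrupt line, loop counts and table
+addresses come from the listings above, and the variable names are made up
+for illustration.

```python
IRQ_LINE = 0xD1          # raster line where the lower interrupt fires
PAL_BAR_LINES = 0x55     # loop count in the original PAL routine
NTSC_BAR_LINES = 0x35    # loop count after the fix

# Assumption (not stated in the article): NTSC lines are numbered 0..262.
NTSC_LAST_LINE = 262


def last_bar_line(count):
    """Raster line reached after `count` lines of bars starting at the IRQ."""
    return IRQ_LINE + count


overshoot = last_bar_line(PAL_BAR_LINES) - NTSC_LAST_LINE

# .X now counts down from $35 instead of $55, so each table base moves up
# by the difference to keep the first line on its original table entry.
table_shift = PAL_BAR_LINES - NTSC_BAR_LINES
new_bases = [base + table_shift for base in (0x7F02, 0x1504, 0x1C32)]

print(hex(last_bar_line(PAL_BAR_LINES)), overshoot)
print(hex(last_bar_line(NTSC_BAR_LINES)), hex(table_shift))
print([hex(base) for base in new_bases])
```

+Under that assumption the PAL loop runs 32 lines past the NTSC frame,
+while $35 lines end exactly on the last one.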
+	If you think about it for a moment, there are some free cycles
+still hanging around elsewhere in the code: in the tech-tech routine!
+There are lots of NOPs inside of the tech-tech loops; enough to piggy-back
+the $1300 routine into it without much effort:
+	$11FE	LDX #$0B	LDX #$0B
+		LDA #$01	LSR $D019	;Still 6 cycles, -2 bytes
+		STA $D019	LDY #$02
+		LDY #$02	DEY
+	$1207	DEY		BNE $1205
+		BNE $1207	STX $D020
+		STX $D020	STX $D021
+		STX $D021	LDX #$08
+	$1210	NOP		NOP
+		NOP		NOP
+	$1212	LDX #$08	LDA $1887,Y
+		LDA $1887,Y	STA $D018
+		STA $D018	LDA $1B87,Y
+		LDA $1B87,Y	STA $D016
+		STA $D016	INY
+		INY		ROL		;Replaces LSR LSR ...
+		DEX		ROL
+		NOP		ROL
+		NOP		AND #$03
+		NOP		STA $DD00
+		LSR		LDA #$00
+		LSR		STA $7F01,Y
+		LSR		STA $7F2A,Y
+		LSR		DEX
+		LSR		BEQ $123F	;Every 8th raster
+		LSR
+		STA $DD00
+		INX
+		DEX
+		BEQ $123F
+Using LSR $D019 instead of LDA #$01 STA $D019 saves 2 bytes (6502
+instructions like LSR use a "read-modify-write" cycle, and write the
+unmodified value back to the register before writing the result; writing
+the set flag bits back acknowledges them, which is why LSR $D019 clears
+$D019!) while still using 6 cycles.  Replacing the LSR LSR ... stuff with the
+ROL code saves another byte.  Moving the DEX down saves two more bytes,
+and getting rid of the NOPs saves three more, for a total of 8 bytes --
+just enough for the LDA STA STA which replaces the JSR $1300 code.
+The new code uses two fewer cycles, however -- but by rearranging the
+initialization code we can just branch to a NOP at $1211 above.
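+	Two claims in that paragraph are easy to test mechanically: that
+ROL ROL ROL AND #$03 yields the same bank bits as six LSRs, and that the
+bytes and cycles add up.  The Python below is a rough model, not an
+emulator; lsr, rol and the budget names are hypothetical, and the cycle
+counts are the standard 6502 timings.

```python
def lsr(a):
    """LSR A: shift right one bit (the carry out is not needed here)."""
    return a >> 1


def rol(a, carry):
    """ROL A: rotate left through carry; returns (new A, new carry)."""
    return ((a << 1) | carry) & 0xFF, a >> 7


def old_bank_bits(a):
    """Original code: six LSRs move bits 7-6 down to bits 1-0."""
    for _ in range(6):
        a = lsr(a)
    return a


def new_bank_bits(a, carry):
    """Replacement: three ROLs through carry, then AND #$03."""
    for _ in range(3):
        a, carry = rol(a, carry)
    return a & 0x03


# The sequences agree for every accumulator value and either carry state.
equivalent = all(
    old_bank_bits(a) == new_bank_bits(a, c)
    for a in range(256) for c in (0, 1)
)

# Byte budget, using the savings quoted in the text.
bytes_saved = {"LSR $D019": 2, "ROL/AND": 1, "moved DEX": 2, "three NOPs": 3}
bytes_needed = 2 + 3 + 3   # LDA #$00, STA abs,Y, STA abs,Y

# Cycles for the part of the loop after INY, branch not taken.
old_cycles = 2 + 3 * 2 + 6 * 2 + 4 + 2 + 2 + 2   # DEX, 3 NOP, 6 LSR, STA, INX, DEX, BEQ
new_cycles = 3 * 2 + 2 + 4 + 2 + 5 + 5 + 2 + 2   # 3 ROL, AND, STA, LDA, 2 STA abs,Y, DEX, BEQ

print(equivalent, sum(bytes_saved.values()), bytes_needed, old_cycles - new_cycles)
```

+The stray carry that the first ROL pulls in ends up in bit 2, which is
+why the AND #$03 mask is needed.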
+	Every eighth raster the branch is made to $123F.  This routine
+performs another INY, which has the effect of skipping every eighth entry
+in the $7Fxx tables.  Thus extra STAs need to be added:
+	  $123F	BIT $EA		BIT $EA
+		CMP ($EA,X)	STA $7F02,Y
+		NOP		STA $7F2B,Y
+		NOP
+		...
+	The two STA xxxx,Y each take 5 cycles, so their 10 cycles exactly replace
+the 10 cycles used by CMP ($EA,X) NOP NOP.  But since the STAs take two more
+bytes, the rest of the routine needs to be moved forwards; luckily, there
+are empty padding bytes immediately following the routine, so it moves
+forwards without a hitch...
+	Well, maybe with one hitch.  The whole tech-tech routine uses
+self-modifying code all over the place.  The modifying instructions
+immediately follow the routine, and so need to be adjusted to the
+new locations.
+	Finally, for some strange reason, the above fixes also affect
+the FALSTAFF rasters.  Adding an extra +2 cycles to the initial FALSTAFF
+delay fixes that up, though.
+	So much for extra cycles!
+The Tune:
+	Finally, the tune really does sound better at 50Hz, so a 5/6
+delay needs to be added.  There is some free memory at $09xx which is
+ideal:
+	$09E0	DEC $09FF
+		BEQ *+3
+		JMP $211C	;Play tune
+		LDA #$06	;Reload 6: skip tune every 6th frame
+		STA $09FF
+		RTS
+As the comments say, it simply skips every sixth frame of the tune, to
+give it an effective play rate of 50Hz.  It works!
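+	The counter logic can be modelled in a few lines of Python to see what
+play rate a given reload value produces.  This is only a sketch of the
+DEC / BEQ / reload pattern: played_calls is a hypothetical helper, and the
+60 calls per second figure is the nominal NTSC frame rate, an assumption.

```python
def played_calls(reload_value, frames, counter=None):
    """Model the DEC / BEQ / reload wrapper; return how often the tune plays."""
    counter = reload_value if counter is None else counter
    played = 0
    for _ in range(frames):
        counter = (counter - 1) & 0xFF   # DEC $09FF
        if counter == 0:                 # BEQ taken: skip the tune and reload
            counter = reload_value
        else:                            # JMP $211C: play the tune
            played += 1
    return played


NTSC_FRAMES_PER_SECOND = 60   # nominal rate, an assumption of this sketch

for reload_value in (5, 6):
    print(reload_value, played_calls(reload_value, NTSC_FRAMES_PER_SECOND))
```

+Because the counter is decremented before it is tested, the reload value
+equals the period: 6 skips one call in six (50 of 60), whereas 5 would
+skip one in five (48 of 60).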
+Glitches:
+	Unfortunately, there are still a few glitches in the fix.  The
+bottom rasters have some flicker in the upper right (and occasionally upper-
+left) corners.  Pretty minor stuff, but still annoying.  Fixing means
+rewriting the raster loop.
+	Much worse, though, is that the screen will glitch badly when
+certain sprite scroll effects are activated.  This glitch seems to involve
+quite a lot of rasters, since even with the tune disabled glitches are
+still evident.  One choice is to reduce the number of bottom rasters,
+but that doesn't look very good.  Another is to eliminate effects, but
+that's no good.  A third is to do a big code rewrite -- bleah.  But the
+easiest is to just call it a 95% fix and say "good enough".  Good enough.
+	And there it is -- Our First Fix.  Hope u lik3 1t, d00dZ!  Next
+time, we'll have a look at some of the routines which weren't covered this
+time, such as fastload and FLI routines.  Until then, have fun fixin!
+.......
+....
+..
+.                                    C=H #17
+begin 644 slowideas-ntsc
+M`0@+".\`GC(P-C$```!X+##0Y@&B4[WP")W_`<K0]Z*QO4`(G?8`RM#WH%;*
+MO9I=G8CXBM#VSC$(SBX(B-#M3!4!@`*-``CF^M`"YOM@I/BB`R`O`H7XF*(%
+M("\"(/D`H`"8H@,@+P+%^-#L(!$"A2U*T$`@)0)*D$@@)0)*D,[((!$"A2W)
+M@)`+H@$@)@*%+2`1`J@@$0*JO3("X""0!HJB`R`F`J8MZ"#Y`,K0^HC0]_"L
+M(!$"R?_P).D`H@`@+P*%+B`C`F7ZIBV%+:7[Y2Z%+NBQ+2#Y`,C*T/?PTZDW
+MA0$L,-"E^H4MI?N%+EA,0Q1(K3&D[@("T`/N`P(JA?=H8.B*!O?0`R```I`2
+MZ.`(T/+P"Z('Z`;WT`,@``(JRM#U&&``_R`#`00,$`(?,*#P"`]`@`8'L`HD
+M8"[@&3]*<+;`3V_UA_)G\!>0DGS^DDZ/S]-W3I-]^)NJ8#A@0$H_UAG0"VQE
+MA\`/P&)M)PGCPIN1\?/CXG?OA&4_/R+Y-$\^M-A/)/?WXG`"!@X>/IB>?>GO
+MX)43(8#@"C]P&`/QF9FQTXQV!G;%7Q#YB/_G3C/&1\8ST[J$A$VD]^%-E-A-
+MXT*"%9-Z462:(>$>B/!-$<R/)'<CJ3WZ_S]3D_D0">R:<<"GA@>'PJ:`,//W
+MY5\)H!@*P7,-//7S]U64H__Y%2MSD5?Z.]9("=`8,&!"=?R,7X5>@2<ING@F
+MX>$!P)J'J"0Y34Y!TYKT%!3T5)G]!@+PG0>,6%.G7@,!G(F\0>GQ,$!#IZ&R
+M>])HMDA9EP@33HP<$]]U3DM2P"`P&`P",!(SX0=_P!<IWZ)"44G`GH$K#?0:
+MIS7(F\W.`9D(J_T2-/04]/83$>`<@9\?`0,>```IBQ/CHB(A6/@Q,2$@<)TF
+MRLWD3=5DW)N2R^$%7GYK,X%LK(WDZ?C\/`TZ0Z++H^/#`3P`0=9>'AP8:PX%
+M9#)TI$):>"-9/0!6#N/OZ<C^?P'PGF)P@3@%50G,IP$9Z<[;J<!TC/R&<M,=
+M.83I2^2U,14V*P-_#Y82A^^."RZ/@\6#)_#X.G@!@\.LNC/6F``K!+@.#P>$
+MZ!Q\^>%ET+4]/Q`VGI`'ZE?TWC38_OY4?((DP?4,XZN1G2I2@0*$2R:'!7/"
+MP`%>\5[ZAUK#KF+L?5Y7<K/C\MHU3\<'P>'J0E)4_^[`#7A'9V9F9V=CX]KQ
+MLQV9]-RU95O(<#@Y7C;A:Q['_30KRN%TX5N=TP0&M,$`/K).W$JZ,5H0M(]H
+MH'(*>JHA^N&TPK5PNR/<E*M+,08<??GA3*KT'*];KJC"<&%ZD*G%=S6\\.NX
+MIAW<'<.ZJ!__EL2J+5%"E;!P='BB"<>%@T.+"D\'UD8:\=[$37OF`5YY4JJ]
+M0!^:?])C^KS'\^H,9]/^/H[O]UE_^*]P)N\>]H2LR6T%1R\LDM/ZNZ#/@!_Z
+M]>$`KWNQ&G!^/UE<V.4LJ\G__HIAP\>7&.ZS)(>6'JE9!T)+8V?EEO+9.40:
+MWDKO6ZCB.6V$MC=A^?V=VY3+1HX61;MRA`]@`KLMMY2%KRLH*?VMBS4_<9%:
+M[BZAI`UI=:LK&H!W5[[,@%'`_K+F+599+BW$+:V."EI=P,K+;F8<!<W,M>%B
+MEN'131>*3-CHMLI;$P?H063$MK=1&_/"ZQ!UQ;AL'[,#MR:J==$.Y/]QJ2^&
+M"O0/!*\LA`KRN[I,!!7D8"%QHX!JV_B,#!CQL>E<4];ELGS&_8#Y<:T1\!/Z
+MXPP:'C+8D<!CP@>'<-B,&!BML^&*W-@.*X/V!^!ORL!JX#+G.%6\0'8L_"FU
+M3.L;C'C8T9#UQQF9FQVL`S=ORJX,*ZM/E;SX&/8L*2E++S9F?,QZXIF<8-:V
+M/`9\:>U<?D!\#'@"6:+PZN!FZP1>,13_W0!\28^0CH'>H00)X14_W<`CI$X1
+M7U!M-?A7.%MT1`(!$0YK#&N>&,S;#Y`P4)$[]9:F*0M9`.9F8T77FL-C;S-F
+M:Q%-C9CL##8XKJ$(!)R$_95&LDA+R`OZNB"Y?L!"@=72P0H+"@]2`0H%"@E2
+M"0H-4F4*)$$(%U)1"B=`!]#\!/R$F)P(/#6(,8PL$"P0J!2DFJ">GJ"=XAGD
+M5&A0;=`NSS/)]\LVPM#@$N#"EC^"\?WV&?87^8_`S'Y_X_`__'.\?[R'+_+]
+M$_P4&Q0;]1KV"2>H!VCG-]<WYV8.++6V+BVM\_?<7W[/W\WG]OCO[(""@\'#
+MPH"``8.`PT##@8`"PSQX!@6`!(*!@X&"A/?O"@F)"(@+BPK_R[XK0#<+0'<*
+M1#0Z=FGBI@8T-W*D`C1K0J>J-%0.I,8T4`W'B,:C/-G<-QHBS["(T69]BK.Y
+MGV`QHSH+%$/WO`(SH`"90A[IC/INFG@@212B`+T`5&=6!T:'OW6PCW7(C6L_
+M<G`T.IG5"3K8ZZLQG5GLLN\@3A18:8&11I"@+K-T&<80ZXPP$[@D$9QU)NSI
+M6$P-"&<`"Z;VG]TR_%$@4`E&B6A',:(QPP?[IH8P$&@00`F0``JVHH0*`\02
+M`\S):8!%O@J1%PCLXG'**EA@*5K,/)&B3Y9X#R9`]3>!`SK=>K?H5,M&@&[0
+M?YB`=751!.5H?M$OL51!R?V<+:$:0Z'<SZO6F`0F1C/C!=GN`.I7*?A<,:SD
+M<</<R;RA##>1XQ-'Z($<6=3+N*.I86C#S*!B(P[T!-P?$"ZS]`!-AR)-G=A#
+M^@']V"/UK/W)--`%J3"-9[]V,S]RF=(16E(FD@,:S[KF>;(H6C<!G5BN/<TC
+M<<?BAAI)'*I5ZY"XO7)1>98_&YX_0`]R0#0P\E`T4O\RCWB!#QK3!'BVYP^C
+MQC4>[`GW43_>P`VI!Q%\.3L`CZ@$?0`"H]!U'HCZ@$?0`#(+-^BH+T_&IW_U
+MT[[)T_V;W?1YS^PL//?V4#?&NJMH@:M0I%)_47(`,SO!L3H1LSI@C]2>&?L`
+MS]J&:0AT/N?ZQAG18!!H#U8PVHLAU\,.:+0<?#VV*DVA0&I<S75SMSVH"9>!
+M`#3;RC""N$2!3%$P`$I;,,$PZ9B2B8^#\?Q`$`?W[]_7Q[>OGY>'?V]G5T\_
+M-R<?#P;^[N;>UL:^MJZFEHZ&?G9N9EY>5DY&/CXV+BXF)AX>%#_C[?8,CNAT
+MPY8<.-#;R,G*AHYF;G9^AHZ6IJZVOL;6WN;N_P</'R<W/T]79V]_AY>?K[?'
+MU]_O\`24?$16I-8HT2%9D'5H\:]T._0`^Z'?D@$CM(!`5,(%N3_Z`-)C]3"`
+M52"E@2I7_5-!2/Q+N6/J.Y/J4@$$#1QVYX_,H^;`K&&T'&E'NFHJ,-,!,B`U
+M6YB.@8&EB]^:LWO4N[6YR.LH4NOC*GZB!XH*J+U1>](78`:/L$"BX0T)D`T(
+M3_O$:]S`"B.X.I/^9`="]1:3OP#WJ+2=)]#*$,Q0="^A&CFA'#N@P)_-)?J\
+MP,3(S-#2UMK>XN;H[/#T]OK_YM\AH>)BHR.CY&2E)66F)F;G)V>H*&BHZ2EI
+MZBIJJNKK*VNK^WDL;*RL[.TM+6U[ON[Z_6_KNYT<Y.<'-RQL;"OKZZMK*RKJ
+MJFHIZ:EI*2CHJ"?GIV<FYJ9F)>6E).2D9".C8R+BHB'AH6$@H&`?WY\>WIY>
+M'=U='-R<7!O;FUL:VII:&=F961C8F'MYU]>75U<6UI:65E85U=65E555.?ZG
+MS_4N?_>OE%0^W_]/#N_*7C?\;/-15YGJW,I8Y_\PN7WFV#G_RS7EY?7^5+#S
+M_8F)CY_LG/]EY_LW/_N_,_>OFAI^W_]/#QS>-_QL\U.9[%\&OPU<M%OX-?&W
+MCAQT4$].34Q+2DE(1T9%1$)!0#\^/3LZ.3@W-30S,C$O+BTL*RDH)R8E)"(A
+M(!\>'1P;&AGS>E\[R0\.#0T,"^WE">W;MX^WB[?`O;7=QWUX]W';7.3G!S<\
+M\#>%!88&AP<'B`B)"@J+"XP,C8X.CQ`0D9(2DY05%9:7F!D9FIN<'1X?'Z"A
+MHJ,D)28G*"BIJJNLK?B]]?O/[]'[\/S\7SX@/M]^/P#^7]]/WX/'IB[Q@"^7
+MX!?;^^G[Z_/7UW[>('^KV^V%]K/Z?OTEOQ0WV=?DR?9:[Z:]''Y"4%_%]]P%
+M\_@L,\?O/SN;#_'[S\[G<!]?O=S\O;X=?PX?6S\,_QC]T!=NW;R]L\_.?`%W
+M0-S^!@[G]S]`PX#[N?'SZ\N?WCG_C:'FOW@Z%>/WGYW/*#^P,\?H.`^OWNY^
+M7MZZ_7(%?#9^&?XQ_\1(>J'UF<P?E8<^?7E]+\9,DMD?9:OR*G"$RQ%7G>0[
+MWT[WXN_>ZW\_+]?Y[F>_E_-,_#]A9W['SW,]_>N:9^'[RSLWA;&?A^^O_._>
+M>>?A,?W%L9^'Y67\[\NYY^'=R;&?A^]\\3/7+<6QGY?OC>%L9^7[YWA'&?Q<
+M@0'FH("Y^F@P/G>P2%9L9J%AF:AH<'YZH@(Y\Z)">SGAT;`UIG=(!X@'R$BT
+MSM$Q.4%)`5%986D!<7F!B98V"`S-#4@-M6P%FYN&:-C%@'G!N<X^CH[(#P]/
+MC\@0$%"0T31L(J,C[#82$G-4I+3$U.3U!.45)3(%15SS597H%A9T;"TMKFPV
+M%U>7V#/586)C9&5F9VAI:FO.5LH&W-<;#<WN#CJV'(@<W1U=G9WSE>'EZ>WQ
+M]?G\Z@('.6"@X35L0L-#Q&RV(F*<HMSC(V.CY"1DI/3:A+S!!,D$UKM,FY_7
+M:7C_I:;5L81]%93U$=4U6SL_L;(@LR"TH+6VMZ"XSW<V+FZNR"\O;Z_SER0(
+M[!PL/$4XUV1DIKE99!F9MLN+83\USL\@T-$@TEQ;&FL<7[94XOI!N[RG%[XN
+M/DY>;GZ.GJZ^SM[N]?`6":^'CY>9`OA-CT]?;W7%L?"?Q<_-`#T0=<_`^,-N
+M`/J<9:=`-"I&)T!T,K*J0V9)]"($.2B_U.-(.Z$<IQI!='("QI*KU[29'V+T
+M4!0"219H#Y?<7T^ZJS&-Z5$:X#U;A$^F\>HE^M4"C`9,YLX"(=Q/(SI$F.<G
+MZ==(DGO$&<Q1@)140?.L`A.N!/V`)^R!/V@)^V`@3FZZW%)>WY+T4K,,?J?^
+M9)O97@&3!$I,&*CN)U)EH/1K"(&/U@`([;)N!%R`"!13X$2Q"9)\9!,7Q8"_
+M4H@`8?L!!?L$G[A$_8`G[`D_8(GZ@$_4$GZA$_0`GZ`D_0(G[@(!-PG[A$_8
+M`G[`D_8(GZ6\!-PGZA$_0`GZ`D_0)`Z"?_`4?Z%@22-[2*!.D_2?2"3Z0<K0
+M\:U;'S.1K&D^JUA.'N3Z.#;4OACMJ7TQXT`>4P*@^0`#K]B-#TZNISLX40^R
+MYBJ`'F?@_&&FD!B!#W9]!%56T3PP6?62R8XTS@=D'`F?B`FR`=I:<,R($#-@
+M$^/9!=[WK2`KS(/\_H@>IH)T0&D.`U.A`UE"'DJ@A@P22&!XD*@`_@!3P^@?
+M_HL__P/__X4?K\^?CX</G\C^315^^?_/*/UY_SQ1VGP`C?`?A1K*-!1E*,3X
+M1<)AXL/U^6'[_JOT!1R*/U&]@*/T'Y^!1^A^'X*/U&,C#53GXH_4:C\H_87,
+M1L(_<!'[P(_?!<G[P++KP`%3QKCPC]/$?HXN3]4>/PFG@FG!"KH0.=%)A_#_
+MS^#^C#31#Q^(_?!8?2/11W/ZV'US(`H//\`>(\T$/P@;_$?7XCT_D=W\>)H\
+MIIYK#Z4>B.Y5]/*L]/Y8;GXCN^%AE0"NQ<`17IB=B@(0YIV)BJ)`?Q5]`:KZ
+M3I5T(B``'3L4;(`?IV4JTZEE@`Z\K``%P7J$T/@%7T#H_1]8>]`#\_\/\?\?
+M\/\_\"C]?D-9\C]'D?H<C]!D?J-Y'$NS]?^?\4?JDUA_G\$8F`0(J#]>-^J%
+MKQ0%(U4!Z%G^CU)+KRMD_/`.<X`<YP`YS@!SG`%E^J/11W*.@!SG``YSA9?J
+MCT7D:U2@@CD28XCU1MCB-(XX%I^H]%7<CH5<0#CJ/UQP<?[XX[\]QR.+CA'2
+MBT^T47O@`''`8;P`,'@`<#P`X"':`'@<`/`X`>$97@C$>$7QXBX'\!5]*/15
+MW#BKB5;B-91H*LI&(&#*8=IC^F?R9V)F2F3ZP\?=,YDS<20`"YP%9,*Q>CG[
+MOG/[?/?R?>_P__G@_^/A_X?`@1Q`:)\'^$2RD\T;::0*:<Z?RHDE(P`]&`/(
+MP#Q&`<(P#Q$J//_`'C_@#A_X'L?\/</_.<#^A5/_!X'_@\!_X<`?\\1.(24>
+M")!1%(^@.H6L0#\M8B5<1_^B_S_`?A_@_`_Q_@?X_P/P?X?@/\T5JM#,`!61
+MF_+#*1B;#FCJ0H?"/K\.?]%"C]4?2/3^1W?RP[OA'=P'"8X)G`L/I1Z#B8N;
+MN)`#\5IHL8%>`A__`Q1^O&?\/$?\?@/^?Q.^?[DC]\#Q_^4?OW.'__/(_=8?
+M_^,/_\-&F`HQTZ`T3'@B2_'-5#?XXFBCZ<UB7RKNS_P<A_`X?$3!\B2'T?0?
+M`'`/@#@&AN`?!$-\(@'!'T0/RG;?#X^.']_>/?W\>/CX3OM3#!,/U<-/`@,^
+M%C&*&)@<H_5'^FY#Q_?WCW]_#CX^#`F]-2K$]&[_##_^#!_\#`_X*1Q&$30P
+M!P`P`IT,+;]3%Y)+!2N$M?!?H>L/T>?T82F0937Z'J&50!ID:IB>6,3:L\A^
+M!^%'ZLZU5G62LZO5=6*U7Q;U,GBU$"E3X1./E&#^H_7S\/WY'Z?/R)5'Z/GZ
+M)*__`^?15[_P#YZ*OO\`-15^_``\>`H_0#!0=(O<`\`>!%=@(OX!_@0`_PD/
+M7(A4R$)<@AA)$Q).5QX`)_``1_@`@_P!`?X"`/\"`'\#`#X(@>!&(D@$R<BI
+M`6A4N-FUI^WZ/V;1^S*ZOVY1^_JU,!']P_'CY_'WY^%_#\`^'\`<'X"N28_4
+M?JP&/Y77T`_"'11^^"&O@<$;J)G=1)S^^?Q\>?IO&"+,:GX\3OS\C[/#S_/!
+MC_&!G_F%\;<`<&`P&/!X/(\C]X_QXQ_XQ?/Q@`:/E<HVI(2^AL`($X`<.`\^
+M?/4?OCAQZE@%Y'Z@!#UC,X/HS#Y_@%J2"ANL$^P!@%7ZI4('@#@/6$DH^E'H
+MK3!5RI]Q.G]XA`?$S^7L?KXMOTX5?J='"XNYF`.'`4``'#H3,='USX"/U5XI
+M_)4`/GP"C&1F+B_5F3:C]7J=RK]4Q^^?`*@,&`++]3@?5V'_CRCT84?KPF(]
+M%8GHY3Q'Z&++]3#Q'7\*.A<&4P+`=.`\`<!^`X#G!P#!%.<XZ+>.``\<6+W/
+M"K]5]X+XHM^.<1^ABKZX4>G!X*TQ^`!6V:S8Q=[0K&8]7C$*/V933#33G5?3
+M_A_RM:X'\'P#X!D0,`KI[^%A^O"Y:9D_D`I7C:AZ8/+!2.%5TJO?!>-^K(T?
+M%AK(T%64C$`U&"!X``@P6&\X+9Z/%XWZLL!0K>"K01E*H4^3[`.#@`''3.Y5
+M,`ZLZ%7ZJC$=R.A5Q)]E4GW'RP_1Q5^@*,99?JR\5IRI]42G\HN%23Q<9+?R
+M?G*D9;?J?J1J?TRY9C6A;-[PNDJ587+W.E7G4JP`5RL<".3`#P'P#@/P'@)3
+MELG"\[]1AO`<@(1=+CKT9%>%*.*GL#6>:J>+Y#]6.%NB`?U8L*GV!15`"P_6
+M#P`^'@`<-><;F`/@/"(#X6'[X!2,962H<CF0\<`<P(NL%8"BO3C6!`/IBX5>
+M5JWP`RC9`4":L:$`_\`!*QQ0U;2`"P_5><FIET4?J'8'1^AR/T<5?HP/K&H]
+M':F+;]5Z.P_@%@^L9VT+*&T`<$?K`M.5%(CKX!Q<<=KPL<VEX/%A^GP`N/5]
+MRD`:!)7ADH_4%*O%=3%97K#`'X'P0">"C)7-^CRC#`5KA+*2`2IKI`63SIR4
+M)'"F'8&,"S?_=3#'[B8!]<2]?+U?O];U42"Q>6[#Z2MD)[E^T]+I2`\#3H`+
+M719`F\'[9F)IOWB@`RZ8%AZ@*?MK5.+P?MDZZ:ML6*;AQ673]#+9Z4`[H`8`
+M#7Y5X84AR^YUHM^FP!ZL#KKF;(-8,7C@RNIC7BX;K5CX7&4E\N&&/W$!@>XZ
+M!`#X*=?D7AQ`+Y'C7S(EZK&^%,0%\ANN2N!U@'>.,#_\+K%ZP\`'9LDQ[UQA
+MN5W/&]9Q"$`X4_I!@JX%"P0.?V`8:!!S^ZX!#G]QW_C?WKF];W-CUI<S.9'K
+M"Y^L$@H4'COL##`\<>AXV^+_\B_]<WK>YL>M+F9S(\8,!_WJ`\?\AS_[=.YY
+M.Y[^[]H=WT]^3KZY_6_XV/&ES-\9'C"Y`=SSJ"AW/!W.?=O=X=^1ZY^[?&QX
+MTN9OC([GM_[Y`>('Q">(?]$7ZE_$WR=\4'*/Q2^JG]%KXN?%YZO__>L#_OX`
+M>,!P]7OJ[]?_>?!WGGWJ]\.'?R'X,[O5^5C\J7X37PD/A#%A<8&1H;>^WF0$
+M)#14=(2DOWT'WGH/_/=IJWV_U7\/:7):4D(Z*AH2`?'AT<&QH9&!<6%).2D9
+M"/CHV,BXJ*"0@'AH9Y[4D'!@4$`^W'OAGGJ!@@+YZ5T)"@P-#Q#SVKPN,C8Z
+M/D)&2DY25_]\@(*FP.F!OX%$PP/%R`)9>@(F6$`@`LP"`Q;S:0##P0)#@<ZX
+MB`!#`(%<=#BUI0<2`08)&#H@?-UTX:*B82!NNP*9W!X*'$`H"ACKHJF?`P*"
+M;XM0L!F?`F*,P`9G=GMYM,*,2A-D()O:S%Q()%'$:6-J$`(L"D`R'BI`"L^!
+M)BZ"O"J`P@M=?9`"!X<SC%V:K/83$'&<"B9^R&8^LP6&X>%LM0(/9FD)"VH"
+M'`A`)`H:S^<4I"]IX/9!DP@*EID9G),0!H"P+9!L[&%XDS#=9P2\KEF6P:!6
+M?+*$!%'7"S(`P%ETR5#R`#8!MN`RQB`A44>#(%ASLQF4H:0,]R`'9!LD&(9`
+M\S>;>AP>!H:S4HWQE#7ECTK&"`*,`(&*!@RP[=>`N"O(19?.#T%!A1U=,T`T
+M/+1</$@P$9;@+KI`7#.5.IPJ>5FSJ8N/$174L\&YER\NP4"6VV2XD,N2W<<C
+M+UZ3;%!EH\$H3`1D_/VU&043(7E]T_CQMY9+=M0Q<!7%RQ;#B=R:(JS!!A5Q
+M8.3N!78RLB`QG,\SL%1-XF$RC$W7B'O,W6-[-QCRP\!7$2\BX!#0D'`2UG`'
+M"C@-$WF8;TGF9[K*$&>;/@.`0QQU-MA[ET""#<M8`4CAX9IN@@99\AXRSO(&
+M8@!2B;)$$8&,@Q1Y>%&&LF&4`FXD6>#S`X-9I51.]I(GHF\7'Y*AR+(0'B#S
+M:UA\%L]WBPSLFBQ]QF`D`<"O,3BYQE/9B#!EUFAP6\P]=K4,O,#75X\,@4#W
+MF'+HK=:H.ZN'74Q,8G.QX90/-H'0.$./,H>9+!H"Y?AP"N'AX?!:\4A=;SS-
+M-X:C"49%V;#;T%+&P3<)H2\I4"=QT"(JYC]=?3E3`QC(R^<J\R:%01G@]7;=
+M-`D"/*S>I\L^6!KE505U3A[IY?`&/!!!'4JVKS;9F&Z=QEF6WHWJX7"VK\>*
+MH%&'K!XZD60("".-MTFO%>GO/\]S/$WE+MZL'X\3?"U3VU2]Y4[EI@=B@\8G
+MC(V+C$R9^G)P<LJC$U+C`S;M@FU=F&SN>\UKVP'O-67.8<`AG"]+(7=$KAB<
+MS9.V8`64KH[K79#,%@$29`O-/`%Q5@H6[$-AZR_!!42=A2ZGGG.6\M1YARZ+
+MGRC@`N$]ZS>/>"!V%#+NE,3$K?"P;BX<4@&>FQHR.*`(9>Y8/%-V9Y0Z^&7+
+M,4*#QEX:YB7F;I9`7AB%)2VM*Z3P=W+"^2X9,"H$&`K'P48I.&T9<6Q8^$\G
+M689/DA7C#$\H038N!+F!!F\Z3Q,5;8!1-W`*SG$A5A72FCF'L89W.-7D>M&3
+MFFH#9S`@J)ECV7'20[K;#@OB>5T\:GF2!49&IL0!82S:9-.:4W1\S<#01C&%
+MN+)[7;P<F/X&6F1@ZN7N>SV*7E8"[LEV?H@S]9DAPH\WRI0P!"!1RF`>(#):
+M[M!8]&Y<:/%XB"N?A*A`^4K7HMN9B7$B[LE8!"!:&O.1AU3#8V'`H6'L6@FJ
+M=T^B8J&LY!)R-@H[N=%7>&RP[>=!X*&AC/@G92"#'4#!6YZ"EO%P^1RACE\*
+M-Z&M4ZA:=[.]4X)7'NP\>;8O"#+'JVKP&0X/>8QLY'4<VV7/2)!@ERVK.<.%
+M7):E!^4O&`UC32/0L94L^V<C*IT,/:R<7KMV7F"K.-W16\I!S-HSQ8G$%:]0
+M%9`5CT,ZIX8%4\0T9\O2M?(QFCV9&'KF=MBH[MY>^!XP->:?N%('"'7J](84
+M;=W)R&KVQGU.[K/-R!YX*8F7/6#5ZOB[K0>LNK@&#W,1+`-FLXR&=`:Y@_<,
+MCJS><$[J?,2\R=LXXT@TN?#@\!+YZ07LJO&?L5=.[N3QU`AXD,N-1GJ[:]P2
+MLEW$<*N9VP9BJZ`2QW5D[8$3>5N)B#D#G2&[Q774<[FL#'C%`ZA8/AP6U?CR
+M;NV>=1;?"Y'#1.($2<W5LDW,)(HPG>:JA#O&.=TQO694/=V?.RU+'APE3P="
+MIY/!8['@A;Q6\0`>OJI3:/_V?#?WQ.\KOW\`#Y:W`#UG`#V7\>[/CWE\>ZWC
+MWQQ^@.`&PWL.Z?QW+FK<Y_#=$73YQH[K\#Y5O*ODF7.0);-85G-EJ>ZS+Y[*
+M_@>LSX'_)8?^@9\FPW#E6Z<K^'K=GDS[T8&`?__A\OKYRC\YJ!ZYLEY9ZU7X
+M93N'``];*0E&\]?+'%3N_-)?7,!O;.&?G/9]_*=U?!5;UP!UH'D^L&`WR'#/
+MJ&LOL'R_S3N-*PVP``YI^FM/WVP`/9SZT#6&6#8!9#E%E#ZTOK]6!X5E1@6.
+M!8YEH49R2RYH#*YDQK,M'`[]%-@%\D.7P8!V_3$KG+?+FL^L&R^P'+?`&H^H
+M&D@`Y+Y0TUW;)`'E`RT/1@%IX;%[?\!1\MZCZR[+[+<M\,ZSX"D]Q<GN_ZS_
+MX;)(OA(X8!/K1)^\I/?6)]$T2^+G+___ZSX]3"M6N3)+'[,5U2`43?.\=]EE
+M"'U4_\:(W\:)8_1._T=8%HZYWFQ^]./WUT<7C&FE^P)%2`Z/EHJKV*M8U,KK
+MV,Q])M=0`MS'6LS-6L0(>][#8[\O^^O8CGT46)`$UU$+\T9T`U.6GC[Y?]&=
+M2/QHP8\XT7U#$N+:C!@%NA9_O`9_@,```0,P?]2!&IVM)^9(%X`^2`Z`(F"=
+M<P=Q&[N(W=Q&XCAG%.=^(`U:R@C6?$-?WO`SX&+^HK6>F:SNT!2]9`1O2QOG
+M6?757;S$XP!DSIL$P8L;QA[<\UMD_W@)9/_H"SAW9RO(J3.6N>$BF-,==&D]
+M^4P'@'5KCY'YT4O_[C=<;KR0'EUG\Z;/BZPJHN:"UA;VYH6?=\XL02_O')O,
+M7;&]A;@I\,GPT!^E9\,1A!P$(#,MG+D.N=,"[`@I[!P$N/IQF,YZ5UJ<&@2.
+M-H$>FU.H[KW=>CCS&&GUCS.HE_$.L-QJ<&3@:`4X^&,7Q?.+UX837AH#GCX#
+MZJ]/K:-7;&-:>:J)SW7N>[58TU_\MS`R8#0)O'P?G'1229[#6.<.T.-=I)\G
+M><NPGDF*-?>]AL&B>IJ(,@H0=S+2=3+](G0&_N2BFUG4\G9]DX/Z^K!B_F0'
+M4:)D`U+TA[H(UZ1S+GEJHX7O[U\MID%)UOI/H&)]/ULD7KT8_'NV([SO>3(*
+M65+:S(%J3H/?M$Z&WZ$Z9`T>@5?J`8O4=Y8J"U)?(0E(PB2DY$>MZ;7_@"[R
+M/L0`U9&5UHLSA1P^'#"W#!`EG]&D7_`EO_X^N7$'\0?1!U$'-1,&__C2`W("
+MT0)Q`C02`G_HYDF.(07'(>2!TSQ\#O_O6$O@1/)283,82GP2F?\:$6+@+./U
+MG+DB%@"WK%C0=?93_.%&]<+0AX![C*T`IRT3E>`-T4/3`:?[:%LKF`9]V=[Z
+ML)/*^N8%GK:.-$#3%BRI6>X)X%&\1NU;4K:=9\4U_>,^\BY9Z##3ZZTWN5#S
+M*RELO!+O01R7%XRFOMJVC@]F!9%(?&C+CG*6G!N^QR8(U?NMXDO'$QFVAQ\L
+MPHT_9A%5K\LOXT^O\YFJ8!B<SX:;F=B9@6+B`O1UD,%;Q`S/&?YQ["G\SVO6
+M7^ZSZL`F=9][S.UQRNK9Q='CI\C3UUP)51$&`[>B+!D$D#*M9T'':RR<<-:S
+MD[64SC>3`BRO&9+-T6<<@PUV_;D;?ER%=EQ<\='@MG0'UT_-UJ;X)8,F(MFF
+M$]G>H_)[V2,)91NG9>//@]Y7(8?>FF^W>R^L-QII!''S(;%YZWUZB7]4<.&7
+M]<,0?9N&+]<+XU9T1Q0=-@,T0Q0--*3_N/&M9RN/AGWW,_=`#[F>$8',_1FS
+MC@VSK-]S#F,,L8@0^XU*8KT0@SR+F;A<S>*K`%S6=\@"=K/LXRM#K>76%.NT
+M(>UVX4&(3,0F8>.C:]8?M1(!+1"!I*/O"!\B3ACME;\LQIEPS&`*EL_#;*0>
+M2\W2#PW6*V2),VX%)K=V[-HDAI>V>1([(FW7>'R4M/^Z(LB@-,E&>3]#Q"GA
+MH"#(9E2S^*?!Q[7I??Z`U>@[D:&>CF+V]"K_I9$L^\D(L#!DSH%+.9R-3VYF
+MFEX<P;(.9&?`-<=M:R`>,2V@!.%4O<KDY@_<9[F'-QIN8>7*H`/@=,Z8XU<<
+MCE_:`)R?^0!J`$J8#0+8C9B*ZP#\`&T/:@`+EGJF)5HF*%I$.O^]*61K1R<.
+MXF7V$QM`(J?^=)VF:,:4W'PL&T[,SU7OY:QL9:1KY:7OZ3WC2>[Z3W_B>\F3
+MZ_R?4I:5H(0:AS&`;8@/L"N&HN>>'F(>(N17G24>EKRU)1Y&T-^9-)E_YGY^
+M2%S;F03B%SO]R^)_N7+*,`-,O:Y@,%%F>Y&IP]WH:"&#A:?:/,]B-3AKO@,'
+M"T^T>8[4>G"K'?I8>%H]X\PVH].&._2P\/1[Q]@J]1Z<,=^EAX:BWS[!:SPY
+M8[Y+#PQ%OWV#UGESQWR7'AB+?NMH&K//FC_@NSX9SUQSQC.Z!=G^>?`<(V/Y
+MYZW#^>>1,5%A<9&AS9_@9DJ+"\R-3@[/T-'2T]465YD:G!W?H:.EI^HL[W)H
+MU.'N_0(%;9CI2UM>;3:^K:WN+2WNNYY/EZ05=IW`%<GMSK7('=#ETHW3-=/E
+MU(W2E=N5W*7=K<85Q7W>]\/Y6O%^\8[QSO'>\>[R-O)*\E[R<O*"\N;PNO>?
+MV985][TZ7O[>A%\27QO?--5],WUY?7MALV`);"ML2VQE;!\"@H+?3P&'Y]_/
+MNWR^]@MQOM\/"2?P"0$@X.$CK\2DY25EI>;S[@(``P=GX"./P%"''X$N/`&<
+M?`.#!0%0`D`#(`.SX4`1`!C`(#0+`!*`04&]A1!`)EL@0069\;)@`(0`#((5
+MG>"D)@($@@F^`*;/@(9X9G@`"3KHS(!#$V5R"$9GBAH3KH9L(5&SX$0C8$&P
+M0>'P`N/AG1FD"%!@+<-JQP;<&P.@!06%XQ%`01F4"`,<`N/MU^!:$$!+R_7W
+M?T@^[OC`V.#7ZWY(#0\-CPXX_G\=\1+/>+9^Q@['#KXO9^!=S(<'R@=+A[P^
+M0^5>7L]/F&3R:&0R:!0WXGU_P`J+#`+&-WC(@SCY!I$"C`:.9\R8%'`SR^D-
+MU_`3/R0=?+=X-O5+L_9![?46_&^HEQ]Q@S'/@^)!\F9)$A+&DD@2O#Y)8YY>
+MST^6+!.AD[&LGAGX'TQ9!6+:&`@<<!@`6V\85(!0F)PCPEGE]2`^0OF\88CD
+MBB&*2HI*W7Y_Y,$`83<WE.OIZ>>2S[>7@SO\<3!A&E`D@DWFA%'W".A(J^E7
+MBN3K<4U@B"$<A^"3<EA@0@9A00QE0B/>""@=1;0@5!!BP\A887-Y!ZJ:A!A'
+M4HEQ$31]J);#L"2,)D5Y-8<"2`3$\85P2&-1[%99$%).FZ(]U'D20Q85PA2C
+MK($>D,8&4(9PR/8S-?`S\#!5\0BKW79W&_@<02K[4?AJO0^$:1DC0\"[P*_`
+MJ\"GP)_`D\"+P'_`>\!WP&_`6\!C_P4$BX!1U5G^MO!>)U_^3$)]4(F02=+,
+MIA99G0#A62N+S!59MA08LO,,6W,CZ1T!2D[,2P,(@R,Z@P0$E9<4P2/Q`2J<
+M"M'L5NI7"%:/=1[(]U'^HZG,G)X(GA">**<9#OX$R@W1UX:D\$82YOR$,C!2
+M4Z-I'Y*(\2HKAE)M&<0CV,5)94B/DV-T9Z2(='AXL/N$/C]'<>'T$CW1\H\8
+M(ZE$<:B]59\,9%R(2@1Z(<EI0(:EM\02*$N4*:'=5^+"=A$=9F@\6&.7+"=1
+M;&IN>(_5E]*)N"/CPZA$=1JJH$:<$L,!#`3$0_1$G_XX&%$)Y^&(06$1,66G
+MHI;481^C8T+?ZEDPP/@A&$$P=258L%BZV\8)1H00F)"(?_Y,(X@&"292/Q2,
+M2/15\JO-6)",.&"!2"`P5-VA!0*#(8."H@%13HNT>H=#!X9"!`>'!@1#(\5*
+MGIN\(CT"O_#B"Q`2J"S)29Q(9J3.,H@N*80K+#`S.#<U,Z1!`<(%SI'Z>'""
+M4\1O)(2":$4$T">*.I1^EBC])U'Z2*/T@4?HVH_1A1^BB-,R+E'Z4J/TF4?I
+M$H_1U1^C2C]%E'Z(__CAX"$C&H#!#^H_%,>984INA70(?#<S1D_]!.([*!1T
+M>/HK/48!X(*1\C"D]`=2Q[DK8IU5?;@J^W-0$J86\I6?:;EDH^A]'CE+#T48
+M6'4<"3`5%<020A/$%<8P;)RF()X0F9^4G`8$A,\N4PA&")G&.P_B&`I@P&@C
+M`ND"N$9Q,U7U<#DXC)QZF#D%"TNJ'AX&\(:NN40<8IRX]3)XOS_Y##1RE182
+M,E67)88%D@SK+"YF24NW0I';HP)P:F9@7ES!%U^$Y(1D3K\(!T;&'7X+"@B'
+MB(HY?L=(")U^$C'DL=?A<8,QW7X<'1X?PR"RY/T&$=?;K\/SPZ@F=)PQ_@C.
+M4=4#H*AC=B2;FK/YS-J2,R"Z4+G7>>'Q^?'D,?';E;V-$(S7,V-\,[21G"\4
+M-9WN,UGF\X_WBOT$W-O@]WK5T,##S@N9\Y+YO-FS!#Y.3C\(3,^6WF'R9GNP
+MK</&0Y#UGZ74!9(%+*F@),.0&!J(&Q\:&WA\$`T+/;P'R`?8".7K>"+P&#($
+MP<CLZ^')'L/V++%-X?!A&&$H\17-723.&?_!,42UB>6NM@W.3J&=P8,_F?3/
+M5@H=,D'<#3PW2E^1NG)JZW34W_P(`!PYMQ`6W80%Q,#/S\'04$1*RL`<?H,!
+MA(>'`(<S]"[<P;K```D)"&WB9L-74,'5PD`'^"`M/RL`P&O`%]@!U`(#P$*Y
+MFN+7Y7/`%AX`.5\U+4`I6`8`/N>0/.N:30&V;RGDHU2(DZ:YW0`S;JGU0)O_
+M8M9)S5#D5>!O'<R[JMH)>.B82#@H6(CY:AK+O*W?$&'SA54,I.TV?\F47RKV
+MPQ!=J_I)&.BY!:ENM@%,F^L\D><^F?52LA5XWD>2Q')(O[;.!7STC$0\-$R$
+M_78.QZ"YTPQE_YE3+2=!F_9Q"5\:AO-@3;NJ&0CXZ56I+O;`C%HK_=&6G?UD
+MU$O+4N)Z&<%Q*.BP@%@X(!`.?\`\(+".\`GC(P-C'.'@L,,G#F`:)3O?`(G?
+M\!RM#WHK&]0&>OL`./6@*,J]ER^=:>8^*T/;.,0C.+@B(T.U,%0&``8W@">;
+MZT`+F^VF"D^*("("\"A?B8H@9Q\$'R`4`!,/#MQ?C0["`19S%J5H(!`2@25(
+M)#P^L[(>'7DP$@%T0"0$SCU<>%0X_*J],@+@()`&BJ(#<=-,6]#BT8]_3(WW
+M\*QQPY/_X$G2`40`ZDRX@(P)J%^H;L,2E^^4NAX]%B6R-/(D)OAIR_WKA(+*
+M7ZE<<.583``02*TD:>Z-QD]P6!@15%2^[0P=$4#>]/(@``*0$NC@"-#R\*.H
+M/T73M*J77J,,%)G_+@'J`P80#`X/@`(_B.`'#1CI\`4*"XG`H5?&=_A/@!JS
+M!PAJ06-SV8"U3764XMWS^/07:`R4(#!`7=AF6("0H+V@8&[D#MX@>SL<]O+O
+M$%-WV"NTY!87&!D:&QP='A\A(B,D)28G*"DJ*RPM+B_>H89LF)EN0S--H:FQ
+MN<&Y`<G1V>'I\?H""A(:(BHR.D)*5M>Q+3$U.WB)Z@HJ2FJ*BJK*ZPLK2VN+
+MJ\OL#"Q,;(RLS.T;Q&D#:MXC6V-OO]M7.7O]M9N_W_;6\`00/O]M=0N1<-#Q
+M$3PNQ1!%UAD:01T?(2)!)2<I*RTO,3,U-SD[/$$_0T1X04>;])2TU/GU45)!
+M55=96UU?86-E9VA!:VUO<7)!=7=Y>WV\2_Y>%.@S4!-,&?&81!AXD^LQ;`4,
+M,;'GN9&3E9>9FD&=5GS_U4B3@_FIJ.JIH&J)J/H!JO^)JO>L?O['OOQ^KIJN
+MT?J]'NO&B_O&`__1]-,>-'[(E(_1'_]1*CC^''K2ZQ%'[K2ZT1^[0<.V->]7
+MU>U5U:3]H'[JSX>DO_ZJ^'NNBE=5\!=5]BCZW8],EDK<UM,".']E%+QT+NBZ
+MKN`ZNNB[>NJ[/Q)(>C]#K#Z7=5WM/W4=U_0/U_4?[*+Q_?_WM,!I=94M'[#]
+M*7H_8?EV<H.O'"I^KL%#=U%_=U5?^M57^O55^`F;R_J+JOQ.*_C'7Z^\UW;3
+M`UO]:U]]:@3:8*_&GU@GU4_G50&%]%A95=WH3.0U/\?KO'[OQ^[OI&WX#ZZZ
+M+K;'HU;O_[*GMAGE.M'[#]K.LCB?4U^H#7VP$%V!!=%X!+(!L/T=5=0-CZ'H
+M_5Z:_UI@2)1K2_J_*SAPRZJNKXI7]7`?"D+_V6JJKUE^_JO^KJNTMJ/+E;0.
+MEK4>7K5>7G5==X_<JG7TP%K'5;Q(9>U3&Q0':U4U?QI:]&9&,NV7U[1:/@/K
+M;HNM;JNM46]-5HKR[`JU^O^7ZP*`&N7`#,$_:;5RP`8P\M`',Y^U%R`"FNJH
+MHA2I0!D!?YFM/W41D=65H>4YKN`0X8PYAQ<V1^6RSPZ0UBU(!&YWU(#&YW6/
+MFC<[F#?'148$WU6"5Y6`)E@"<6&LHCJP3"P.XBUB3?+5`H>V[N^#*EWK>H@7
+M-33[1"K7#>J>Q7.%M"-(=#N8^BR#0O1(5C1'08Z$W^.B`V-J.3FCDZ<ZL#JQ
+M/W`5P:R$`A&IH(W/>I&^-NY%<U6<2PI@3-,`=S?V@A;?Z(`H?Y5!/#MTB727
+M=*4KI9ND"Z1[I.NEPI%DJUXA';6\#E&Q^M"9U!\]^$@+@],XP]((/`.DU*:,
+MG_T`JI@(7_J18_,C%*H,G^\`W)_:/S\H_/N/Q@EG+X'GXI[\?C/GY%>EK'95
+ML&_(H@Z]``\XTVIFSJ?.5E"'E6A0?DC$@%9/&?A29\%+4@I88R:GAC1`/<-<
+M84TG#EL>;6P`6`+34TX+7*`7I"'SI`'WJJ&+I@`Y7K3'TT1].,?3@!]'`#Z'
+M'0F&`20"(>$X$LT4E'#V!,*`_>->&,P,#Q4O\:&Z5G*U/T@4B<I`8"!!["%2
+M(QHJ!U/]&BB?9ES&B6@XE^R#0UH*V$PL$`#H,Z!*YV$[U5'(U0`]$M\(,0JD
+M%&C7CK[P)X\<".-#2605(]"_3#*`!4ZI`!4^@M/H&Z-#QJ0=/C963]VD]?>I
+M4@,G/4@$+\*8%_L"155.6@&PCY](A&D*`IY)YJ2$_<!AL[D@@1[@L1K4^LF`
+MT.AMC`!6!.A&<@@EX3HPL<*V"=*&B=,'"=.-``'V1.I&A.J&Q.K'!.L'0'7X
+M!AEKIW"X?A$9].D^?I5GZ6Y^F,]L\_36>W.?I[/;_/U"S]1<_4C)IS-;22E]
+M.40!%)#*>$_?A,#V@J3='2B@&HU+'*$/>B[VF?83V^G\PO-^B"U)PSH4`%$,
+MQAD&@S&!1!#,Z^\.,1HQH)]#<:+:&05)_E('&@&ZCLS/IO,E2K_*\`V>=8!?
+M*0U4Q<$B3JDL@2:]!6IZ_8RZIC-CDQV,JB!\A28(B3C!<D\&LPS=5$'5,NBU
+M=#;%-4Y#&ZG*Y1Y!0!W<)B5:G[&C(EDK>PD[A02B[AI/!/W)'-`,J1:-3QJ1
+MF-8\NIF9-K;:JFM)00&X'+&I(?0=T:F%:$:FAK:):HJ!:CB4V4](`!N:"B$&
+M8JR9T`T&2PJ?].JI&$73]2EC1UA;5F"`GV^IU$5SN$LZ6O_&\P;+T@S2T\`]
+MKZIP`YTG8S/%B))6T")M!29`]3'Z`,I"I4'(W4'&4`K5*";:>CG'TR*R?#AP
+M)J#.F@XREEIC`#_T1JO&KTB4_S(QGQB0EZ2"LR)9]ZI6'&OADV36XXMG0`KF
+M/7G4YQ(MQ#O'#QW,2B?K*W7!SC4]R#@"2"`#RP`3W`!VI!FF``88*D.0N<$S
+MZ3Z(J3=@\..D!:"TY!=H1K4KU)A@)K\8!6[*;;!MX%6X+>);X(6^"0I<](8M
+MXUN:W1;B(LY`0FF*)_BHJ+,Y!(P,I_C0T-C8XG^.CH\/#Z?ZT(4_R)/\C(R1
+MGRJ?Y,G^N_$Z[2\4%*/.FO]>?BM>OBQ?P?BUTUW#+<,!)]^*6K!JH8H4G%9N
+M"2#$6Q"K1A[1V1R8;4:$9$8$6Q=/\6101,1'-O-$!#4P/7.:&`8A4MZ$(@A!
+MM<X`J""!B!`"L7SVF`F$EFEJ_BWPL`D3_@(*3^3]!TA%W+'RUY)_,<ZZ!@!;
+M1@')_)^@8<!M<#'RUY)_,<ZQ][`7]\OC[5]?#PI_<37\\G_R.W]QI/V9\2C^
+MZ$_N1\[^ZG;^XWS'+W?SYFQR/A?W0^I_<SM_/(J,'C\]H/-_,GH]W^V94#SE
+MROM[5^#70Y#07;^YB7B_NAR_GQF/BYWRO[@?&1F/F?W2U^ATT'M'V=I9/D.C
+MYB0^0..@`\IW/";3J93XO\OR4?F4PBXL,=9$W&01F0-R\&ND\/"0\*"`L(#`
+MD-"0X*8]0H)"VZXTEWD&A@<&AX:$-<['\%AP8GFQ_,9:935NFJ$.&,Q_,;82
+M@',?S7.U_A+7$'M?P6U_)C-?R6*[XPQP_@MC^"F)OX)</IKR=.%C_2+8EG+^
+M28)\TUI_-?[5.PGHWL?S&($AC-LMCD"G+D8@@YP=&,9I0@@EAK9KF?GM)!'Y
+MR6PP*8T0AH_86G+&8RTRF3\8WG>I-KP:S@R+BXI;U++>"/CT@DQL9&,>,4UW
+MQ,?$QT1&Q$9'Q<?%1S'K'1\9>X<";\9%1<2Q]#1$4QSQ4;$QC7G$Q29Z[X&K
+ME-=C^:R&$,9+MC^B&!%?\<[<CIQ,7KA_+UZG`;NQ_1@$8$"A&L?S2;&NG@4G
+M0Z9[IH.F$X6SA6-4C4XU]L8S@J13IQ).7)'N!8QC.`X@M&L!R;L0X`$4R>R?
+M\:R=2:L0R;S6,T]M6["Z4R747%N'`U<M=[\^F#PW/N-UB?[./ZMXF/T'`6OU
+M=RY>S76UR,;K6L\]P+G9^XVF)_@;'ZGXN;%@6OU=RY>SAUN'"XV*/'9"_GD`
+MOWE)D`.MY-[30!'"&A1!S.>P<=(Q8!2?80J(IVZ([6C*RAAN&AV1AS$>0?*8
+MK!J47^<VL+'S79@/+9.&Y!'9#73QY(/5A4LD:`ZT(T#S]!<];0?/T)S]"^/O
+M'L/T(,`^RE^$D2#"_'E@,<WSH!G-]4OUDD&@"-[8]``6/RK,JLM.OB@`1AQU
+M\M0P52`&F&1&/8,.[_%MWZ"@X6'BXZ2EYN@I*>I0JJFGHYZ7CH1Z;F-734,Z
+M-"\L*RTQ-CU%3EA1A:G)Y?\U8:%@D#GHW-#$O+2NK*ZRN,+.VNCY!Q,?*3$U
+M.3DU,2LE'14-!P#\^8I_OS_`P<-#]QV9WO[V[N9/?0V5E9FIO=7Z(DY^KM]0
+M>=L<2'-T]73S<&QGXDDOBLGY.(?G9OZW6=J;7)V>\E8;N'R'AH2!?WY]XLH"
+M"AHEJ-D97J9FYB4CXJ#?'5N29V%=6>'I:76+'IS>7Z"X8D,E7IS:V)83T8]-
+MC$M*RPN,SI"3%=B;GF%$CI:>H_`@I*"<EY..BXB%A.@.!S_X4GEU<6UH9&!<
+M6EE:7%]E;'1^B92@J[;4!D49^IK[%&MIYN/+^M:RCF9&*A,U7U^@83!*5FZ"
+MEJ:NMK*JGHIV6CXB!>G1O:VEG9VAK;G%U>7V`Y-F'X1-H+BF>LT_WOCOB(S$
+M69G)Z>G9J5CH9\<65934$V+"0>YS3QTC*C20^2E=C;WJ$C93K=G)J6DHV),R
+M5?7OPTQ'_)#L3*4"_/1G7N])G!)_AX['JAIJJLQZJ::AG):0BH7-Y]XE8F0F
+M*&JM+W4+6F:6ML:^KH96#+XK:*6BX!V976;T(&-G;'!T>'OD8,_X+?X`!``.
+M';X`$?']P@[!U41$0?9IQ\<&"#,*$#,%/M/C3\&$GW!!6$%8(?A!^"U\!"<@
+M0QIX4Y_#'C!`^$#N'0%X07_@RQ3@N5<8\$N_&&*FFZY>[MZI^DG^`FBMC>&[
+MD`D5,!G4?/I#`<RQ+8@;Y-5OMJ[`LW:=4[_<MX@C9_/V@(.=@4"G48@G4G/U
+M,X^B:GG81YX4I@[X43::L(]JV"5;'QD@7@#Y(#H`B91X=T8_=&/VGV,G.RB`
+M0!I=J%EU%GQ#)%ZAY\/I/45N>FDU.[0%+W3)8U5:0RO=:/IN(=Z*<P!DS((D
+MSV(I/B2USF)8?^@')$-2#A7#JS&DAC;_.N`D`LI'UIUY)]?R-UW5O'<@<=Q_
+M.Z5R!X.X*J+G9)9$50-<[04?:.%_>$)@.5BC0J&)I\,3^@$D%V11HOJ,?3%)
+MND%*G5P9.!H!3'PE4J,)J*FFCVP(HFM`FH^#^G?HENI^TSK\.T.Z[1A]SO\N
+MS/),U@8BZ)_9W<6M#Q:YR8T<G'D".$L[KF((5*G0@FN7Z[.O773FZS$TZTDU
+M9@-,@.E2E4E3(!J7J0V@K*O<D-ST54<^@$][=$',7.N[SZ!L^KP0FA4N]N,^
+MTR]N2:F2)U5S"UM<3\R!:U)T9):5,G9H5@*0^9`\\UO.O&/2.D4I+G5L1`5!
+MQ@90<9,T28=$];W7(/`%WD^Q`#;.K7)&-_(:Y,?#AA58;!D1G&0^4'BKDD5)
+M6?\;IF(F#8X_4YF^36O4C--!K[*?YB%O6(\0\`!VT+U&=6!<WV?:#>X+5SJP
+M5/;\]4+C@`TQ,(Z7/<$UX$VI-ZAR*!:D#Y._$.\S[N70F&'WPU?4Z?,[J2]!
+M*=.$;.I1H?ZZ`U>H4'':GHP0WJ%?2`,$V2#6!>A%H%#'./+:[=<RFEPY@J0%
+M4'O`,8[:D&=(3\T`)@.QV)-DH6-!DV&-1DWV')!D.3IO54Q,<=(%6/U$5'8`
+M^"Q'9*.UHAL2"7D#"I=5Z\$T;KA([UYUSZURD==2^0J)4X8HM2/QHQK4XQYX
+M@H`G36BB2`/0`EBB)A`21&>E@':@`;0]Z;F*O!,)H[:M7^7[K!(!AQZ[YW/5
+M^-%NH@0UH`:9,DE*@%(>%=`3VNLEC64DC54DO=5D>[-'NK1[]L>[9'UW4H(2
+M2;1M(93`52.$GDA]@9U%RG@R1D1<F"=.`0,%S%V((),<"MC)M7MUSN3-.K&I
+M6I\#"FD-8,6`?F;")B-@/`X<KL"L!2I4ONW4(G;44P*6+6ZUR@B<MZ$$S'3>
+MN)YNX/N9C)\2_-DDY&UC3D95C*,$.U8(K"JP5&</SB1Q:\S7!*43P^_+_U?X
+M$P<+3Y19GN2C4X>[]#!PH:?:/,]A$:G#74_`+S\UR:=J/3BG?FRL+1[RCS#:
+MD^7S7R3J/3ACOTL/#46^?8*JL\.6.5I\EAX9D2@C0P+#2B<CW)"@L,#0WL/6
+M.S_GRS/7:R>-SOP).>>;@#W#Q-H97D6"%/$Y"HLB+C)I09G@/V2DY46%YD:G
+M!V?H:/*10P!-HPHH^D<"_\PYQR`G)2PF7B``)VPG!R5/)I2HY-*D(&4E]R4+
+M)ALF,"9`)FPF#Y0*#/K\@4UO(']]_'O[_%P?E6Y-UN3]P)4`!_B>@#'Z:`%Q
+M?H.@H,TG>,X`IF-)"7?(P0I[BT(,-?F*FMJ+?[-0;K4`NOX`!0=7X35Q-$;G
+MF.!_`T$!M\MWS7`Q"#7%P)^%Q4<N9S1]&<#((!IK`0!3A`M>O"Q/AG5%V%PC
+MBGK@Q\>$`QX;(`F^(`8BQ\"2?`*0IP$"P$"!5PV*##@@C$(B0K<ODI?GZ[7R
+MGF5O51R%SE\F#7RSVB'^R1R/(*#`KA^,YL;A^(4,>@,GPG(G4G@Q^)X*0@9P
+M^08$!/_`B`#%D'%HH@(_Q%/]LS/".*(Y2PSKSW$=T1PQ'*2([!"U1Q_Y"*JV
+M5RK%$$%QZ[J]_>XS;!KGW*"5@`A*%``6,#(`'!]L#)H!A1=;`K':(@!L!%"=
+MHHBZ/L=>A:,(D$I,-VP2`#ANN#@T3>'EP+!K&6V;0BO`!!U%NB?8>J_G:'SU
+M?Z>K#_9>TXP/#\4_OZ8'F1N<;J0N3/R;895\3`.,#_C\//%S`H1)Z<$^(!!A
+MAAF#S-*&&&"4!/6(>>)X1@F(T#/#CYC'M.^PZZ.^9.>N<0RS`\30#,B&_)O=
+MX<%I5&'@?QL6$1X1'V$0>></\LZ\*+6#`>&)R@/\$H"/:<<\24>TXH\H]H=!
+ME^J\I0T=EI(J3X(=[1Z@7B@G56@/@]EA?U'Y0/&$X/\"H0F9]@(6B/9?1_^4
+M5I_5TX<<=S4O_,JLS@,(RK#+ML9_F[C]0Y.VH<P)NMD^'\\JWRTURQ@'M3&Y
+M[3PX+3PM.C#JT*9N\&TH^%^GEPC]#`/GD>9\\K<CORT9:/#_LN6\,M$D_M!$
+M\J,JB7\GEM"D&3<PR=(`XI*4R;,MDAF87=`V5HDN*)E*0G"0D5B9.IK^.$XH
+M(`>KNK#RTAIYS,B'J_Z/A\J_YG@_@]AU,6+_6BE<+5?%C7("`8!GD57_CU$$
+M\;0HW>"<Z\%'Q\'T?_],CY#[#H!Q^2X0G_/L=#52#L?&.N^9MU5/@XG^P8'W
+MPMX6LA>.C&1?P;%NV8J=7P<.[OH+JN,T,$F#!LDE<PU8-#1F$P!CYD>V'C'$
+M;,>4:6`'S=W^N+^/&CPE8L)[S/?KH)7^=^<(?G_PG5&=G3SSGO<]^$X.#FK_
+M.+,Z*@E&V_/"C"\P.S/?_D@=S&(?''^X`:'W^!P["9#L"W^N>)H6\,`%/P&A
+M@<5>6`R.0:GT+M`O0.__V?Q]SWUGGXXCG_9ABYF9B8=LSC5-C;S-N^9CMC9C
+M1.)DD!JI`7GY_0;<%M*GX7*UXO#.BQU)MP,"!N0DE0V'JLZ<CQ8B(YM.'![K
+MPAMRZ$R7+X9`?5\RF!L15TEM_O*_HD)9:^.<SG)P##*$XY<'ACN7P@BO)!`_
+M:ZE<')R=01A!X!^71`>J`-+.^DIFU!SR8<>>4YW'X_YNGQ^6<@%3E6"_Q/0#
+M!P\?I/\?M`'^LW-G0>&3>0_I+E`8<'#EA]J;&>7,/F)^P'W6-#`^;G3)CHPZ
+M0&D[&_(#X<"08,3%@O@C%,'`@"F#P9+(`::V.(8342CC:S$^T`.-=^,^=BQ:
+M_Y\`/)BB,C(>'1A#P;,S%O<8*H&-FJ__,P\7.K8^0#QB[&^0,;\8D>WJBX&?
+MRIQ^`R49Z\!X/:U4?EG50P,$^2;E4KGL[Z1*S-L0`L968S[.5FF#N2`O9+[*
+M6ZW#\QD>7`Z89Q]<0*`D,#-#0PO(8,$93A<-5AZN&JQL#E[J^K[[`'_X`=52
+M!TJ5E2=(`4\@&T\A\\\G4\\\EYVK_:I.X6I.0!D.0`5`=,PZ_Z_2-.7(!JJI
+MN`+XL/.W1555/2JM(;W29?O7_J])'<5^G5T&7[JB0-5TP%K@)O2'4]6`X8=(
+MY=#U==2Y=&J*J50+U(H#5H6+D_?VYL"C6HT`&,"]:`N[]O4@)K^W(,*ME&1J
+M^BUAL2I+!`5+;KN+6C`5;Z#`N;]U9E\(`'\`QU*IJ_35IO0`/V/Z](5LFFNL
+M]+T1`#__0HL++]#6E0JB+S/T/Z[DUE50M,"8)?!TFET_0VE0U@I'POW\3]Y*
+M4_4:W4[CIK`>!``_1CDH9R#7Z=*0&OU*)7J8_:1P_:EI1=.ARW'"T1"=U`1!
+M3;D'3]VD:TV7SOT]ZBNSK?!$#\`#UB?ZJE`8=4\;F?5#QY1&AZ+;G28!F0'4
+M@&'[(&'[2&@":D>/*!L:AZ*.L/T#L/VK`%9P)5?JQ`7=^WS*`7.BL=:!>VP"
+M!>M`>ED=--![UT_TNG^H@*GQ.0TZY$.D\@//TYR`!?N@-#KB5ZJ_"5J<E6GC
+MLU^?)5]N%!^3"!K5:57]50/O/_]6,#^5(#]K`Y:@Z0\3Y[,R>[C)`ZI`X_>H
+M`4_C(`TWCRU7NHW*N(?9>Z5``U>%0&L!J;4AN$W//%I-I^N7HY=#EN.50F`*
+MO2@#ZEZ^Z8'^\LBNJK5I``.XA%S)B1U>AU3Y>N?J!]]=>I6LX;@#]#]+*&-7
+M]4NU=[R=;?:X@+WL"^+$"D?[4U8[;Z:;-)HESDK%[975'Z(C)+WS`GS`EA:<
+ME=9H=+OU^G7A6$C6?H-2JH#2;&3Q<P$WT`6_P;7Y;_$QJYL_F(`>AZT`Y%#D
+M.OV3I./T-4U3?R7A@:7]!S8QC]_IZ]9'KK2TJUW7SR=T]*O!K!JIVX;@/2Q,
+MR5?H6)$Q^\J@<K1XG7+?'[8CH`LTI``YZOT_YK#BCK-669R3-2XS`_?]`KA8
+MYD.1)S)U)U)TAJO)4@7><8J6+#J?$<:?7[A+4G4DG,G2',]Q``:K\=8`8`&!
+MSE?W>!\-WLM[R[Z4K7I>5X/UJ(`NA)\4I^1P_=!?Q#GA`.]4CO4U^M)[28PE
+M:EPI6M]VTS5X<H#P>+;3]>K;=OUW/1Y*56I^2P&&GK'[+2)'J]=]ZR$@"K^<
+M@?CKH'ZJI2%XN+7JWO8%\2CYS8?H;>Q@!__=^(K/:^WB`_;RX`*NFW9*>BC]
+MI8.W8"]#IN7!`IX*:XYB56NO>]#]_ZY!K4<;B@#VXFJU!ZSR-6B6]_(`NW!`
+MN"#-8_8ZKQ);]SJ2_Z.==#'Z:O+K`)K)O.,,D@G4ZJ<G2FG[5,?L@U].W6?I
+MUN6RQH(IUO.@`<]<?M,U^AT\U^B?2KJC]M(DU6HC#J\%`(JKB:H/52KB_ZYP
+M[F<3^ZR]NLP#_0J&[,YY<JAYBV=/^?NNKP*Y;ZQ]OO;`Q`_$@#56S"#NOWXE
+MCW(/[X+E3]K,`/98_"Z_N@/'%Y"K2`=3DHK^7H8A^GIS`-L,0UUF@'T4D?>Y
+M7ZK`4LN,(9J^+$!Z.[&2#G7TY=>H-6``#5K%->/JL$\T[6\+92HOI``GVX"7
+MWY^GE#_K#I)PZL'NCWL<6V'>[`_7O(!WC`:T7<[GU3<`1X@39`08X25.'U5&
+MYGRR-_P/\`,?]_Q$CM>,!#@X?`.'OX3#'X.!P.`KR@.!5A/[0#!!@@0YCY*#
+MF8!Q54I,`P&!5HW53X.,<%GR:2X='E*`LI#`,+#B`HQI@_:]-G&((X`;9O0)
+M_KR!@SO5*@(-R5CSG@^!S`_X`WZOE#?T!_P'^K8CH>G>E/FKP__L_]L__O-%
+M!YRP2X'#;B\'GARA^ZSNV\V_DO+OMUD2)$A04)2G63K(##1$CA(B1@?>/.N$
+MA0%!:PG%X7%Q6$V28H68H&0K3`0>`@8*4)`0*"PJ;Z4ZQRB$DB4"(N+"P<%^
+M/\`*?($=(T:RK(.`0\.1WXO&Y>6E<\*.<3_3A""$`V;:CG_+&?R$?PA`]"'H
+M"_\'S<>,B8$O88"!-W/8_9RROJ/[N_GS\W?Z;C.3(W!QL>-FWXKD=\S=^?@<
+MMPX,'`,P!>"1.45/VY)`0K=/94T&=R9ZF];^$;K7`,!GNHVL"R8BP8"P;(1,
+M6K)R<.#G7C1!0D2%&LDR0DQ6EI:JF)BQ,C`8#S,R4#P/3"X,*,2<L9H&98."
+MJ\"ST>2T^(335['^\,W-:A5!0D+`K*DI24E4B4I<=M4$[]8&$_VYO';7HR(_
+MG"=^$0><37@__)G/>H%7_#OO*\X0]XPQ[9&W;&'7'''I$%`@8(?^`@2E%35"
+MZTI,U\`,%G_N=@&`4I.A7P?:=%8A&)6S<,`PD*Y/BGV;'63$PR#S"#_"2'^M
+MP]C-%A@"E\29)DR98]<II`3)$2)4;,NF#`QU33>";B`UD!*P:!P0^K$0^$@Z
+M-07%:O\E*0*P;0<%B/\#_/]5@/QH<8"'"0I8A3`,D"5)B9(B)$B),-WY@[Z)
+M??[YO'$(!`/7P_@8,'C9]HN)ZFG'_\+!\[/"'%?>$Z^,">2`@32EV0&L\>8X
+MP0ZF,5#/$^48)FG96GRD^,$COM:V#`6G#)Y,0X($!"/`X%.OZF'R91B_C_P)
+MDMHC1JDB1$B3B+AJM4,K`P8`X'HR?]_.`R)("'.?X`<F3D1*3(@3(L12^R<6
+M&>'_)!I(JJ9ICA!C@CQ`@Q4A1@@IC$R/&##A.6`7_CSY4AGP@'L]BA/W:F?O
+MG#Q#.`?,7J_!2&>)L+A3'##G@UNAR=Z9\NC!"Q?0`KTX(=5Q3.G,&C_@'_JN
+MS$"?$"C_IQ;]V`/VMK<=G_+`?]P$.$&/=B/I#=+_!=)])%2KA@+__?V0@$)_
+M8>%FG!@LH%TN3^!"#]#"#\3U^KVO(3#`\7(`D$!0$3(!`!$A1/4)6@$,K3Y_
+ML4M;@0!Z?H@*EID9G)<0"NVJ;2:H:'D#'IP:8FWG2D`0`AR60:`IV)W!R<("
+M&B`*))Z@R>@8UT%`*,`8*("C&6<PT5$PD#(!=+\#7"&L-K-:F0I-`/!6,8%#
+M"EJB='UP71S:NT\4QP1JG,C%'5IT83,2P#0),U(0Y.#E$9'D-3T$DNTK1)-4
+M4T#6,MJ*1M6XIV!P*A4,H5IJBB;0R'BK6'R'KF!<%6Y%CE+B8&D,MR:=/T&1
+M,`5N0RR=0B*A*.J9<B+8*@F$@SACI8BR+Y.D")@K2PP;(2L.;2RTW)OB3'5>
+MF8QVHZ`X"XG(4(4?J:J5S"N*)5*E+%`$H*@HHY1+'ZU=IVB3&:!!XC'@R`F(
+M`+09!_Q.C["!-/<2-$RUR8F1?_`!A[)T?2F`%3\$G![>`Y#$K"=DM@].=H;3
+M/!00)&D2?P#2[0I_0:M=C!4G2@<8I3K]A'A"YPY`<23V09$VNAR5F+!BU2"@
+M]$0';+E[M!R;T%Q@4>:U$K'3#<2'=L`$:E"&SC0X373*#4<\-S;\@_D5F.+\
+M3]^Q<I<@20.I$(P,DZE*",G,`LFR"A`&6I]LNP4&)@:$`@$C""26XNRKI2VY
+MP:,B:>#$.!"KBTX<@^`AH*XE8@P)L@*'I9"@:Q8KF$LV3<;%)!073@`6WE94
+M$/7#F539(0%!@]9(`N>]\T-QD7`7D;6B=S%-U4W[6GY:!_P&Z!"#2$;K'"A&
+MP5(0M-`2PZ,QTI=@K"0D^5V(B+'Z"LOVE:),G\DFFVUQM:CE;D[P/C:"(V@Z
+M*`DN87B?8@0?TJE>'%:MVW\9/I@W%0)E_+4_F4#/VBX&0=%!5"UEGQ</('++
+MH+0=DXQ-N+UH*8GF,0L<"Q-!J6;(%H#93Q*QHA;;@4;&YR2D[@,(*:6F6^<3
+M88OP@/$&5Y$PE+)PQCTV-"UREVS%#@]A\%6L$-`R%K022+:Z1`,9+89<I8-$
+M`E/)(,"`6Y\RFGD7;#X)2G`1=S`,):<0P$#B=L7PP(M9,=H5!1E#M'X!=*(0
+M20&F+MHE<DII2=*Q@6;GD;^QQG.-M[37+:/6*!BC2Y8ZGAF>"D<Q4!$AED>0
+M\29+<"$G3R*$LCDP*60U"Q(/`DF10)<#L59#E(9'\:9.%AH<AV93C9."!DV@
+M\N^PU=@NX2K8F)+A;G"V"OAPH`@8HXQS,2XF+JDJ>S#$XRI-SB,0(<XE2D.V
+M<2#P2DV4N&8V:EP=F?L81>@0Z!0X#!77!7"DN!1KE9;H"9:W'@W'6RTPRR:"
+M5[%6XDPRJ`8PSLF<R/#8:IHX3]<>6DLO7>&DMKB0#0O=<0(;#\Y,S%.IQ@W2
+M:3Z9BY*S,U."!.LI=<*8"8#V6*<Q!:%$;'%6$8@\72B#F1;=U$X<41O_@-F?
+M%X9PX//%^QY":"`<V>`>MP=T";DS"R#\!8(N$,7P>6,D6PE4G<*.LR"<N`0P
+MD#"RE,K#]BW@I@MV@H!#<TX>V^[`8X)H")A;'8V/[9W.L0Y:W)(MABC$&!4U
+M`E-[]"DFQPH%W$H67^8J$R2$I<'-BP%Q)N]5J_0&:366K%/<-<X+!-QI1&67
+MXBT-7CD`3$8I`(!`F4["PX/%Q-T-?L`#`$:&4%V@UU&/W&&)(G]"2IEP_<1;
+M(`)7N16V$[<CJ>X.S8I6ZQ<`+<WCB$S-N9T["(&YUM,K^X.H,XRZBB;,?@.$
+M/5*HJL-8HD\T:L$0.S:Z@BN!A;7HU<-@"VRSF;KP"(!Z0P8S#T6/!=L=!P4N
+MGZ4.>&*3LNO4+I4,V/-5:+PRYSV(LN9$@K,C8HYX)KAX%):Q0"3,4S)TMMPZ
+MPK`8DT\N+7H>IZK.J+')@S&@#I9H*,Q"`SG<`U&MHTG6Y5X,*KK<QEA?9C36
+MP0LO"G`MZ4BGG)MQ>K8H/T36[O7<+&I5JJE%&$D3+4&4U6\<)SN.0Z3:G[FN
+MSF'CG8FQF@S<,X()KF'&O9*XA;L=;>-$L)4[&-.)Q(EDXPEVEFR+$Q!"D)2U
+MK+7@C5L'ZT<:PA=/UL'F\"T&7J?S+90XB3('^I\%"4J328[+-LMP)N9NB8J(
+M"0"GTEQ0W9D'-UPLNH'@C%:4EC%&.(,PK%H$,I"M8(R$B!"0N9*N=&#GA=B3
+M-*;%K&)S!SCC&H'H;%C-+Y8L$M4#>%\+@+CAS`@JS+<'M3B2+&PCNDNR:QDQ
+M(,L^2ZX`S%&0MT6))G-L&QM-20"Q0@Y:S=$YN8TSPC,TXN!,V@)Z-,2'I,@X
+MOR)A@",I[EC3>#DDVE^UQ!C!D@V!A;7`RLS>-$#%"<C.?RYT*!.6[7FDS#;R
+M:68\TA&Y;TPVN)N)E*"H3$W"-1/"4('4N0<5&4L7,6@$"2R#&:*GV1F9&Y+Q
+MXY)V17`1)Q9D2<Z4V$#]8%$(TC2%PJO1@M9AI:K77),`FZ\%5N?<@9SIV8KF
+MFU*A;":::C>=ZF,'A("R&JV.,UXKK:A!HFXQ%)@I\PE&-)ULV:>F$8'.?[\6
+M1\829`3\241299%BS9S2TU@A;QK8=\G<@;0C7_6:UF\ZE=C=7J=86WD+)].;
+M/,I\6/#FS_N,[YI0:VK[:YK88S@0ALY6M]WC(;F6%1(.=YUWAT2&].4L:;A`
+MER66\4QK;;)9SP6D_&JYUPUWA*9O+2$!0YI-9/EPMDMW0*`A5K.=\;-H5<<F
+M%MK61*35;71RMFW9;@62%':[:T'?$O%0YHO35#G?&.<X?W`&<96!VI=NCQ"@
+M.<6=S',`L5Z'#CA["3H3IU-3#N1J3>`&Q/N-=36T[Y$KCA2H\-Y_@(Y+XZ[D
+MXM0;..HZT;B@RW@!-=3A5MG*XVP`C7$US-R<-7C-TB;;G/(NNQW>\%<=+YK\
+MUZ$PR]1.)A#M%(M/470D#A[K0P%MSKL13C<M;8\MY0[E?([C+A&K8;<UWST&
+M9I`D-9R+F;U46FYB&L6A[</5.H(;!);D^3AN.HQK878VIY=8X@"$MNF\VHPQ
+MB8",N&RSD!,Q10R;H$W5^W0FYU9<Q`:UN;</@:ZX+GZ%(')S@J>#KE5*@S,"
+MKJ\K":B$S)$'M4Y`A"Y/1QDI!0,5#@05;*EUT!F\&KURVG6D9YC9C==<EEM,
+MW)[=<#KQW*YJ(`P4.9.%QH@9J,*FQ<*D:V\:AG:*0.O1J>'>C`6TJ93Y/EFK
+MIP$;RXV\Q)FN1(QE^@1CI<X9X<G$`\X-S4U,P`49CP!@DG2Q=8,&HQD;Q;4L
+M/L1<]+YQMN)K&24#`MU=%O@#\E:/#JFMN,ZR1NN+.\N40UQHM';-?0*Q2A((
+?HT;F)R7&)D7./!%,L0$LN>9&1T:G*_!/]O_^"7-__\N4
+`
+end
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+The Herd Mentality as gleaned from the musings of Bil Herd
+------------------
+This is sort of a story that was really a bunch of posts I did on compuserve
+back in 1993. The language is coarse, the English malformed and if you're
+a certain ex-Commodore employee you might be offended by the story.  In
+that case, the story is probably about someone else and not you; my
+fallback position would be that the whole story is a fabrication and meant
+for entertainment, not to mention that worrying about things like 5 year old
+stories written by burnt-out wackos is bad for your cholesterol.  As it
+turns out, this is about a sixth of the silliness that went on; this is
+more about some of the events that centered on the 8563 chip which I was
+being asked about at the time.  Oh yeah, the content is also dated with
+a slight slant towards the melodramatic.
+-Jan-93  19:28:05
+Coming soon to a terminal near you... the gruesome story of the chip that
+almost ruined CES (and the C128 along with it)
+EXPERIENCE the shame and horror of being a Chip Designer at Commodore during
+the Witch hunts!
+SEE the expressions on Managers' faces when they realize that their Bonuses are
+at stake!
+HEAR the woeful lamenting of the programmers as they are beaten for no
+apparent reason!
+SHARE the experience of being a Hardware Engineer... stalking the halls in
+search of programmers to beat (for no apparent reason).
+LEARN how to say "THIS CHIP ONE SICK PUP" in Japanese.
+Find out just how badly busted up the 80 column chip was and how many DIRTY
+fixes were needed to make that all crucial show in Vegas on January 6.
+(Christmas, what Christmas?)
+<Said rather coyly in an attempt to elicit any positive responses> Unless of
+course no one is interested..... :) Bil
+-Jan-93  15:37:59
+This is the first of many parts as this thing went round and round during our
+mad dash to make the CES show. I don't even remember what year it was. The
+8563 was a holdover from the Z8000 based C900 (the "Z" machine as we called
+it). The people who worked on it were called the "Z" people, the place they
+hung out was called the "Z" lounge and well.... you get the idea.
+The most interesting thing that came out of that group besides a disk
+controller that prompted you for what sector and cylinder you'd like to write
+to on every access, was one day they stole the furniture out of the lobby and
+made their own lounge disguising it as a VAX repair depot. We were so
+amused by this that we stopped teasing them for a week.  (But I become
+distracted....)
+Now the very very very early concept of the C128 was based on the D128, a
+6509 based creature (boo... hiss).  The engineers on the project had tacked a
+VIC chip onto the otherwise monochrome (6845 based) in an effort to add some
+color to an otherwise drab machine. No one dreamed that C64 compatibility was
+possible so no one thought along those lines. I was just coming off of
+finishing the PLUS 4 (before they added that AWFUL built in software to it)
+and even though I had done exactly what I was told to do I was not happy with
+the end result and had decided to make the next machine compatible with
+_something_ instead of yet another incompatible CBM machine. (I won't go into
+the "yes Virginia there is Compatibility" memo that I wrote that had the
+lawyers many years later still chuckling, suffice it to say I made some
+fairly brash statements regarding my opinion of product strategy)
+Consequently, I was allowed/forced to put my money where my mouth was and I
+took over the C128 project.
+I looked at the existing schematics once and then started with a new design
+based on C64ness. The manager of the chip group approached me and said they
+had a color version of the 6845 if I was interested in using it; it would
+definitely be done in time, having been worked on already for a year and a
+half...... And so the story begins..... (to be continued)
+-Jan-93  19:06:28
+Looking back I realize that the source of a lot of the problems with the 8563
+is that it wasn't designed FOR the C128 and that the IC designers did not
+take part in the application of their chip the way the other designers did.
+The VIC and MMU designers took an active interest in how their chip was used
+and how the system design worked in relation to their chip. I overlooked
+ramifications of how the 8563 was spec'ed to work that came back to haunt me
+later. For example, it was explained to me how there was this block transfer
+feature for transferring characters for things like scrolling.  Cool.... we
+need that. Later it would turn out, when this feature finally did work
+correctly, that it was only good for 256 characters at a time. 256 characters
+at a time. 256 characters at a time?? I never stopped to think to ask if the
+feature was semi-useless because it could only block move 3 and 1/5 lines at
+a time. Did I mention the block move was only good for 256 characters?  Later
+a bug in this feature would almost prove a show stopper with a serious
+problem showing up in Vegas the night of setup before the CES show. But I get
+ahead of myself. It was also my understanding that this part had the same
+operating parameters as the 6845, a VERY common graphics adapter.  Not
+scrutinizing the chip for timing differences the way I normally did any new
+chip was another mistake I made.  The major timings indicated what speed
+class it was in and I didn't check them all. I blame myself as this really is
+the type of mistake an amateur makes.  I wonder if I was in a hurry that day.
+:)
+-Jan-93  19:06:39
+It turns out that a major change had been made to the way the Read/Write line
+was handled. When I asked about this, VERY late in the design cycle, like in
+Production when this problem turned up, I was told "remember, this was
+designed to work in the Z8000 machine." ???!!!! ????!!!! Shoulda seen the
+look on my face! Even though the Z8000 machine was long dead and we had been
+TRYING for 6 months to use this damn thing in the C128 I'm being told NOW
+that you didn't design it to work the way we've been using it for 6 months?
+Shoulda asked.... it was my fault, shoulda asked "is this meant to work".....
+:/
+Don't get me wrong, the designer was VERY bright, he held patents for some of
+the "cells" in the Motorola 68000. It's just that the chip had to work in
+conjunction with other chips and that's where some of the problems lay.  Our
+story opens as Rev 0 of the chip.... (what's that..... doesn't work.... OK,)
+Our story opens as Rev 1 of the chip makes its debut and ......(pardon me a
+moment.....) Our story opens as Rev 2 of the chip makes its debut..... <to be
+continued>
+-Jan-93  20:50:41
+Forgive the sporadic nature of these additions.  Now where was I .... oh
+yeah.... It was sometime in September when we got 8563 Silicon (or so memory
+serves) good enough to stick in a system. I can't remember what all was wrong
+with the Chip but one concern we had was it occasionally (no spell checker
+tonight, bear with me) blew up.... big time.... turn over die and then smell
+bad..... But then all of the C128 prototypes did that on a semi regular basis
+as there wasn't really any custom silicon yet, just big circuit boards
+plugged in where custom chips would later go... but you can't wait for a
+system to be completed before starting software development.   I don't think
+any of the Animals really gave it a thought until when the next rev of the
+chip came out and now with fewer other problems the blowing up 'seemed' more
+pronounced.  Also the prototypes got more solid _almost_ every day.  (I knew
+to go check on the programmer's prototype whenever I heard the sound of cold
+spray coming out of their office.... later it turned out they usually weren't
+spraying the boards, just using their "Hardware Engineer" call.  Sometimes all
+I had to do was touch the board in a mystical way and then back out slowly,
+sometimes accompanied by ritual-like chanting and humming.  This became known
+as the "laying of hands".  This worked every time except one, and that time
+it turned out I had stolen the power supply myself without telling them....
+If anybody else got caught "messing with my guys" they'd get duct taped to a
+locker and then the box kicked out from under them leaving them stuck until
+they could peel themselves down, but that's another story.)  ANYWAY, when this
+problem still existed on Rev 4 (I think it was) we got concerned.  It was at
+this time that the single scariest statement came out of the IC Design
+section in charge of the '63.  This statement amounted to "you'll always have
+some chance statistically that any read or write cycle will fail due to
+(synchronicity)".
+-Jan-93  21:12:05
+Synchronicity problems occur when two devices run off of two separate clocks:
+the VIC chip, and hence the rest of the system, runs off of a 14.318 MHz crystal
+and the 8563 runs off of a 16 MHz oscillator. Now picture walking towards a
+revolving door with your arms full of packages and not looking up before
+launching yourself into the doorway. You may get through unscathed if your
+timing was accidentally just right, or you may fumble through losing some
+packages (synonymous to losing Data) in the process or if things REALLY foul
+up some of the packages may make it through and you're left stranded on the
+other side of the door (synonymous to a completely blown write cycle). What I
+didn't realize that he meant was that since there's always a chance for a bad
+cycle to slip through, he didn't take even the most rudimentary protection
+against bad synchronizing. It's MY FAULT I didn't ask, "what do you mean
+fully by that statement" because I'd have found out early that there was NO
+protection. As it turns out, instead of failing every 3 years or so (VERY
+livable by Commodore standards), the 8563 failed about 3 times a second.  In
+other words if you tried to load the font all in one shot it would blow up
+every time!  The IC designers refused to believe this up until mid December
+(CES in 2-3 weeks!) because "their unit in the lab didn't do it."  Finally I
+said "show us" and they led the whole rabble (pitch forks, torches, ugly
+scene) down to the lab.  It turns out they weren't EVEN TESTING THE CURRENT
+REV of the chip (theirs was TWO revs old); they were testing it from Basic
+because it "blew up" every time they ran it at system speeds (No %^$#%$#
+Sherlock. That's what we're trying to tell you) and even then it screwed up once
+and the designer reached for the reset switch saying that something does
+occasionally go wrong.  Being one of the Animals with my reflexes highly
+tuned by Programmer Abusing I was able to snatch his arm in mid-air before he
+got to the reset switch, with blatant evidence there on the test screen.
+-Jan-93  21:12:15
+One of the rabble was their boss and (I have been speaking about two
+designers interchangeably, but then they were interchangeable,) the word
+Finally came down "FIX IT".  Hollow Victory as there were only two weeks till
+we packed for the show, and there were 4 or 5 other major problems (I'll say
+more later) with the chip and NO time to do another pass. It was obvious that
+if we were going to make CES something had to give. As Josey Wales said,
+"That's when ya gotta get Mean.... I mean downright plumb crazy Loco Mean".
+And we knew we had to.  The programmer thrashings hit an all time high shortly after.
+-Jan-93  14:17:32
+Memory flash, I just remembered when we found out there was no interrupt
+facility built in to the 8563.  I remember how patient the designer was when
+he sat me down to explain to me that you don't need an interrupt from the
+8563 indicating that an operation is complete because you can check the
+status ANY TIME merely by stopping what you're doing (over and over) and
+looking at the appropriate register, (even if this means banking in I/O) or
+better yet sit in a loop watching the register that indicates when
+an operation is done (what else could be going on in the system besides
+talking to the 8563 ???)  Our running gag became not needing a ringer on the
+phone because you can pick it up ANY TIME and check to see if someone's on
+it, or better yet, sit at your desk all day picking the phone up.   Even in
+the hottest discussions someone would suddenly stop, excuse himself, and pick
+up the nearest phone just to see if there was someone on it.  This utterly
+failed to get the point across but provided hours of amusement.  The owners
+at the local bar wondered what fixation the guys from Commodore had with the
+pay phone.
+Anyways.... To back up to the other problems that plagued the 8563.  Going
+into December a couple of things happened.  The design had been changed to
+support a "back-bias generator".  This thing is generally used to reduce
+power consumption and speed the chip up.  Well, something was not quite right
+somewhere in the design because the chip got worse.  The second thing that
+happened was that both designers took vacation.  Nothing against that from my
+point of view here 8-9 years in the future, but right then we couldn't
+understand what these people were doing working on a critical project.
+-Jan-93  14:17:37
+Or maybe I was just getting too used to eating Thanksgiving Dinner out of
+aluminum foil off of a Lab Bench.  Christmas consisted of stopping at
+someone's house who lived in the area for a couple of hours on the way home
+from work.  Anyways, the chips could no longer display a solid screen.  The
+first couple of characters on each line were either missing or tearing, until
+the thing heated up, then they were just missing.  Also, the yield of chips
+that even worked this good fell to where they only got 3 or 4 working chips
+the last run.  A run is a Half-Lot at MOS and costs between $40,000 and
+$120,000 to run.  Pretty expensive couple of chips.
+The other problem takes a second to explain, but first a story.....  Back
+when TED (the Plus four) had been mutilated, decimated, and defecated upon,
+management decided to kick the body one last time.  "TED shall Talk" came the
+decree and the best minds in the industry were sought... We actually did have
+two of the most noted consumer speech people at the time; the guys who
+designed the "TI Speak & Spell" worked out of the Commodore Dallas office.
+They did a custom chip to interface a speech chip set to the processor.
+Operating open loop, in other words without feedback from any of the system
+design people (US) they defined the command registers.  There was a register
+that you wrote to request a transfer.  To REALLY request the transfer you
+wrote the same value a second time. We referred to this as the "do it, do it
+now" register or the "come on pretty please" request, or my favorite, "those
+#$%&@ Texans" register. ANYWAYS, the 8563 also had a problem where the 256
+'bite' transfer didn't always take place properly, leaving a character
+behind. This ended up having the effect of characters scrolling upwards
+randomly.
+-Jan-93  14:17:45
+So to recap, going into December we had a chip with .001% yield, the left
+columns didn't work, anytime there was one pixel by itself you couldn't see
+it, the semi useless block transfer didn't work right, the power supply had
+to be adjusted for each chip, and it blew up before you loaded all of the
+fonts unless you took 10 seconds to load the fonts in which case it blew up
+only sometimes. Finger pointing was in High swing, (the systems guys should
+have said they wanted WORKING silicon) with one department pitted against the
+other, which was sad because the other hardworking chip designers had
+performed small miracles in getting their stuff done on time. Managers
+started getting that look rabbits get in the headlights of onrushing Mack
+trucks,  some started drinking, some reading poetry aloud and the worst were
+commonly seen doing both. Our favorite behavior was where they hid in their
+offices.  It was rumored that the potted plant in the lobby was in line for
+one of the key middle management positions. Programmer beatings had hit a new
+high only to fall off to almost nothing overnight as even this no longer
+quelled the growing tension. A sprinkler head busted and rained all over
+computer equipment stored in the hallway. Engineering gathered as a whole and
+watched on as $100,000 worth of equipment became waterlogged, their
+expressions much like the bystanders at a grisly accident who can't tear
+their attention away from the ensuing carnage.  I can honestly say that it
+didn't seriously occur to me that we wouldn't be ready for CES, for if it
+had, I might have succumbed to the temptation to go hide in my office
+(checking the telephone).  There were just too many problems to stop and
+think what if.  Next time (hopefully) I'll try and bring all the problems and
+answers together and explain why I stopped to tell that rather out of place
+TED story.
+-Jan-93  19:27:11
+No single custom chip was working completely as we went into December with
+the possible exception of the 8510 CPU.  The MMU had a problem where data was
+"bleeding through" from the upper 64K bank into the lower.  This was in part
+due to a mixup in the different revisions of "layers" that are used to make
+chips. This chip essentially had one of the older layers magically appear,
+bringing old problems with it.  Unfortunately, this older layer had been used to
+fix newer problems so we didn't have a way to combine existing layers to fix
+ALL problems. Dave D'Orio (start telling ya some of the names of a few of the
+unsung types here) did a GREAT job of bringing most of the IC design efforts
+together.  I was sitting with Dave in a bar, we were of course discussing
+work, when he suddenly figured out what the problem was.  He had looked at
+the bad MMU chip under a microscope that day.  Later that night, under the
+influence of a few Michelobs, his brain "developed" the picture his eyes had
+taken earlier and he realized that an earlier layer had gotten into the
+design.
+-Jan-93  19:49:06
+This would not be the first time a problem would be addressed at this
+particular bar.  (The Courtyard.... If you ever saw the David Letterman where
+the guy stops the fan with his tongue, he was a bartender there). The PLA had
+a problem where my group had made a typo in specifying the hundred some terms
+that comprised the different operating parameters. Well the designer in
+charge of the PLA took this rev as an opportunity to sneak a change into the
+chip without really going public with the fact he was making a change.  When
+the change went through it caused one of the layers to shift towards one side
+and effectively shorted the input pins together. Ya should've seen the scene
+where the designer's boss was loudly proclaiming that Hardware must have
+screwed up because his engineer DIDN'T make any changes (that would've been
+like admitting that something had been "broken"). You could tell by the way
+the designer's face was slowly turning red that he hadn't yet found a way of
+telling his boss that he had made a change. Talk about giving someone enough
+rope to hang themselves, we just kept paying it out yard by yard.
+-Jan-93  19:53:45
+Anyways back to the 8563.  The first problem was relatively easy to fix,
+providing you didn't give a hang about your own self respect.  The 8563
+designer mentioned that the block copy seemed to work better when you wrote
+the same command twice in a row.  I made him explain this to me in public,
+mostly due to the mean streak I was starting to develop when it came to this
+particular subject.  He calmly explained that you merely wrote to this
+register and then wrote to it again.  I asked "you mean do it and do it now?"
+"Exactly", the designer exclaimed, figuring he was on the home stretch to
+understanding (Intel, at last his eyes unfurled). "Kinda like a 'come on
+pretty please' register?" I asked with my best innocent expression. "Well, sort
+of," he replied, doubt creeping into his voice. "You wouldn't be from Texas,
+would you?" I asked, my face the definition of sincerity. (Said in the voice
+of the wanna-be HBO director on the HBO made for TV commercial) "Why yes....
+yes I am," he replied. Mind you, a crowd had formed by this time; that poor guy
+never understood what was so funny about being from Texas or what a 'Damn
+Texan' register was.
+-Jan-93  19:53:50
+This 'fix' actually did work somewhat; the only problem was that no one told
+the guy (Von Ertwine) who was developing CP/M at home (consultant). Von had
+wisely chosen not to try to follow all of the current Revs of the 8563;
+instead he latched onto a somewhat working Rev4 and kept it for software
+development.  Later we would find out that Von, to make the 8563 work
+properly, was taking the little metal cup that came with his hot air popcorn
+popper (it was a buttercup to be exact) and would put an Ice cube in it and
+set it on the 8563.  He got about 1/2 hour of operation per cube. On our side
+there was talk of rigging cans of cold spray with foot switches for the CES
+show, "sparkle??? I don't <pissshhh> see any sparkle <pissshhh>". Anyways,
+no-one told Von.... but don't worry, he would find out the day before CES
+during setup in 'Vegas.
+-Oct-93  16:57:43
+Sb: C128, The Final Chapter
+Thought I'd finish what I'd started back in January of this year. I had been
+talkin 'bout how busted up the 8563 was; now we get to the part about how it got
+fixed... well fixed good enough... well patched good enough to give every
+possible attempt at the appearance of maybe passably working...
+One of the things that got worse instead of better was something called the
+back bias generator.  Now as much as I admired the blind ambition (as opposed
+to unmitigated gall... no one ever said it was unmitigated gall and I am not
+saying that here and now) of slipping in a major change like that right
+before a CES show, it became obvious that it needed fixed.  Now the back-bias
+generator connects to the substrate of the chip and if you've ever seen the
+ceramic versions of the 40 and 48 pin chips you would notice that the pin 1
+indicator notch is gold colored.  That is actually a contact to the
+substrate.  I have never heard of anyone ever soldering to the pin 1
+indicator notch but I had little to lose.  At this point all I did have to
+lose was a HUGE jar of bad 8563's. (One night a sign in my handwriting
+"appeared" on this jar asking "Guess how many working 8563's there are in the
+jar and win a prize."  Of course if the number you guessed was a positive
+real number you were wrong.) I soldered a wire between this tab and the
+closest ground pin.  The left column reappeared though still a little broken
+up!  The "EADY" prompt now proudly stated that the machine was "READY" and not
+really proclaiming its desire to be known as the shortened version of
+Edward.  To  fix the remaining tearing we put 330 ohm pullups on the outputs
+and adjusted the power supply to 5.3 volts.  This is the equivalent of
+letting Tim-the-Tool-Man-Taylor soup up your blender with a chainsaw motor
+but it worked.  The side effect was that it would limit the useful life of
+the part to days instead of weeks as was the normal Commodore Quality
+Standard.  I was afraid that this fix might be deemed worthy for production.
+(said with the kind of sardonic cynical smile that makes parole officers
+really hate their jobs)
+Remember the synchronicity problem?  Remember the revolving door analogy?  We
+built a tower for the VIC chip that had something called a Phase Lock Loop on
+it which basically acted as a frequency doubler.  This took the 8.18 MHz Dot
+Clock (I think it was 8.18 MHz.... been too long and too many other dot clock
+frequencies since then) and doubled it.  We then ran a wire over to the 8563
+and used this new frequency in place of its own 16 MHz clock.  Now this is
+equivalent to putting a revolving door at the other end of the room from the
+first door and synchronizing them so that they turn at the same rate.  Now if
+you get through the first door and walk at the right speed every time towards
+the second door you will probably get through.  This tower working amounted
+to a True Miracle and was accompanied by the sound of Hell Freezing over, the
+Rabbit getting the Trix, and several instances of Cats and Dogs sleeping
+together. This was the first time that making CES became a near possibility.
+We laughed, we cried, we got drunk.  So much in a hurry were we that the little
+" X 3" PCB was produced in 12 hours (a new record) and cost us about $1000
+each.
+A new problem cropped up with sparkle in multi-colored character mode when
+used for one of the C64 game modes.  Getting all too used to this type of
+crisis, I try a few things including adjusting the power supply to 4.75
+volts.  Total time-to-fix, 2 minutes 18 seconds, course now the 80 column
+display was tearing again. Machines are marked as to whether they can do 40
+column mode, 80 column mode or both. We averaged 1-3 of these crises a day
+the last two weeks before CES.  Several of us suffered withdrawal symptoms if
+the pressure laxed for even a few minutes.  The contracted security guards
+accidentally started locking the door to one of the development labs during
+this time.  A hole accidentally appeared in the wall allowing you to reach
+through and unlock it.  They continued to lock it anyways even though the
+gaping hole stood silent witness to the ineffectiveness of trying to lock us
+out of our own lab during a critical design phase.  We admired this
+singleness of purpose and considered changing professions.
+We finished getting ready for CES about 2:00 in the morning of the day we
+were to leave at 6:00.  On the way to catch the couple of hours sleep I hear
+the Live version of Solsbury Hill by Peter Gabriel, the theme song of the
+C128 Animals and take this as a good omen. Several hapless Programmers are
+spared the ritual sacrifice this night... little do they know they owe their
+lives to some unknown disc jockey.
+Advertisements in the Las Vegas airport and again on a billboard enroute from
+the airport inform us that the C128 has craftily been designed to be
+expandable to 512K.  Now it had been designed to be expandable originally and
+had been respecified by management so as to not be expandable in case next
+year's computer needed the expandability as the "New" reason to buy a
+Commodore computer. That's like not putting brakes on this year's model of car
+so that next year you can tout the New model as reducing those annoying
+head-on crashes.
+Upon arriving at the hotel we find that our hotel reservations have been
+canceled by someone who fits the description of an Atari employee.  Three
+things occur in rapid succession.  First I find the nearest person owning a
+credit card and briskly escort her to the desk where I rented a room for all
+available days, second, a phone call is placed to another nearby hotel
+canceling the room reservations for Jack Tramiel and company, third, several
+of those C64's with built in monitors (C64DX's??? man it's been too long) are
+brought out and left laying around the hotel shift supervisor's path
+accompanied by statements such as "My my, who left this nifty computer laying
+here... I'd bet they wouldn't miss it too much".
+The next day we meet up with the guy who developed CPM (Von) for the C128.
+As I mentioned earlier, someone forgot to tell him about the silly little
+ramifications of an 8563 bug.  His 'puter didn't do it as he had stopped
+upgrading 8563s on his development machine somewhere around Rev 4 and the
+problem appeared somewhere around Rev 6.  As Von didn't carry all the
+machinery to do a CPM rebuild to fix the bug in software, it looked like CPM
+might not be showable.  One third of the booth's design and advertising was
+based on showing CPM.  In TRUE Animal fashion Von sat down with a disk editor
+and found every occurrence of bad writes to the 8563 and hand patched them.
+Bear in mind that CPM is stored with the bytes backwards in sectors that are
+stored themselves in reverse order. Also bear in mind that he could neither
+increase nor decrease the number of instructions, he could only exchange them
+for different ones.  Did I mention hand calculating the new checksums for the
+sectors?  All this with a Disk Editor. I was impressed.
+Everything else went pretty smooth, every supply was adjusted at the last
+moment for best performance for that particular demo.  One application has
+reverse green (black on green) and the 330 ohm pullups won't allow the
+monitor to turn off fast enough for the black characters.  I had had
+alternate pullup packs made up back in West Chester and put them in to
+service.  On the average, 2 almost working 8563's would appear each day, hand
+carried by people coming to Vegas. Another crisis, no problem, this was
+getting too easy.  If a machine started to sparkle during the demo, I would
+pull out my ever present tweak tool and give a little demonstration as to the
+adjustability of the New Commodore power supplies.  People were amazed by
+Commodore supplies that worked, much less had a voltage adjustment and an
+externally accessible fuse.  I explained (and meant it) that real bad power
+supplies with inaccessible fuses were a thing of Commodore's past and that
+the New design philosophy meant increased quality and common sense.
+I'm told they removed the fuse access from production units the month after I
+left Commodore.
+The names of the people who worked on the PCB layout can be found on
+the bottom of the PCB.
+"RIP: HERD, FISH, RUBINO"
+The syntax refers to an inside joke where we supposedly gave our lives in an
+effort to get the FCC production board done in time, after being informed
+just the week before by a middle manager that all the work on the C128 must
+stop as this project has gone on far too long.  After the head of Engineering
+got back from his business trip and inquired as to why the C128 had been put
+on hold, the middle manager nimbly spoke expounding the virtues of getting
+right on the job immediately and someone else, _his_ boss perhaps, had made
+such an ill suited decision.  The bottom line was we lived in the PCB layout
+area for the next several days.  I slept there on an air mattress or was
+otherwise available 24 hours a day to answer any layout questions.  The
+computer room was so cold that the Egg McMuffins we bought the first day were
+still good 3 days later.
+About the Z80:
+What court ordered Commodore to install the Z80?
+It wasn't mandated by court order, it was mandated by a 23 year old
+engineer that realized that marketing had gone and said that we were 100%
+compatible.  This turned out to be a hard nut to crack as no-one knew
+what C64 compatibility meant.  Companies who designed cartridges for the
+C64 used glitches to clock their circuitry not realizing that the glitches
+were not to be depended on, etc.
+The Z80/CPM cartridge didn't work on all C64's, and no-one had really taken
+the time to figure out why.  Someone noticed that a certain brand of the
+address buffer used in the CPM cart worked better than others so someone
+concluded that it must be the timing parameters that made a difference.
+This wasn't true, it was a very subtle problem that dealt with the way the
+, the Z80 and the DRAM had been interlaced together.  So here we had
+a CPM cart that didn't work with all C64's and it worked even
+less reliably with the C128 even though the timing parameters in the C128
+were far better.  In my opinion you couldn't call the C128 compatible
+with the CPM cart as it only ran 20% of the time when tested overnight.
+ALSO, I worked hard to make sure the C128 had a reliable power supply.  I
+was told "no fuse"..... oops one got in there by accident... in fact it
+was easily accessible... darn it anyway.  However, with the wide
+variations in minimum and maximum power supply requirements we couldn't
+handle the CPM cart, it needed an additional .5 amp because of some
+wasteful power techniques that were used in it.  I couldn't foot the
+bill for an additional .5 amp that might only occasionally be used.
+SO, with that said, I accidentally designed the Z80 into the next rev of
+the board.  We designed the C128 in 6 months from start to finish
+INCLUDING custom silicon, these were records back then, the Z80 was added
+around the second month.
+The Z80 normally calls for a DRAM cycle whenever it needs one... it might
+go 3 clocks and then 4 and then 6 and then 5 and then 7  between dram
+accesses. Since the processor shares the bus with the VIC chip there are
+only certain times when the bus is available for a DRAM cycle.  Since the
+shortest cycle for a DRAM access for the Z80 is 3 clock cycles, you are
+sure to catch a DRAM access if you do 2 cycles (wait for vic) then 2
+cycles (wait for vic).  Whether you catch the Z80 between clock 1 and 2 or
+between 2 and 3 doesn't matter due to the special circuitry in the design.
+Otherwise if you just let the Z80 rip it crashes when it tries to grab
+DRAM while there is a video (VIC) cycle going on.  And that's why it runs
+at a clock-stretched 2MHz.  The REAL bitch was the Ready circuitry when
+flipping between DMA/6502/Z80.
+The C128 design team: SYS32800,123,45,6
+Bil Herd       Original design and Hardware team leader.
+Dave Haynie    Integration, timing analysis, and all those dirty
+              jobs involving computer analysis which was something
+              totally new for CBM.
+Frank Palaia   One of three people in the world who honestly knows
+              how to make a Z80 and a 6502 live peacefully with
+              each other in a synchronous, dual video controller,
+              time sliced, DRAM based system.
+Fred Bowen     Kernal and all system like things.  Dangerous when
+              cornered.  Has been known to  brandish common sense
+              when trapped.
+Terry Ryan     Brought structure to Basic and got in trouble for
+              it. Threatened with the loss of his job if he ever
+              did anything that made as much sense again.  Has
+              been known to use cynicism in ways that violate most
+              Nuclear Ban Treaties.
+Von Ertwine    CPM.  Sacrificed his family's popcorn maker in the
+              search of a better machine.
+Dave DiOrio    VIC chip mods and IC team leader.  Ruined the theory
+              that most chip designers were from Pluto.
+Victor         MMU integration. Caused much dissension by being one
+              of the nicest guys you'd ever meet.
+Greg Berlin    1571 Disk Drive design.  Originator of Berlin-Speak.
+              I think of Greg every night.  He separated my
+              shoulder in a friendly brawl in a bar parking lot
+              and I still can't sleep on that side.
+Dave Siracusa  1571 Software.  Aka "The Butcher"
+Not to mention the 8563 designers who made this story possible.
+.......
+....
+..
+.                                    C=H #17
+				      -fin-
+</code>