no way to compare when less than two revisions

Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+<code>
+                   ########
+             ##################
+         ######            ######
+      #####
+    #####  ####  ####      ##      #####   ####  ####  ####  ####  ####   #####
+  #####    ##    ##      ####    ##   ##   ##  ###     ##    ####  ##   ##   ##
+ #####    ########     ##  ##   ##        #####       ##    ## ## ##   ##
+#####    ##    ##    ########  ##   ##   ##  ###     ##    ##  ####   ##   ##
+#####  ####  ####  ####  ####  #####   ####  ####  ####  ####  ####   ######
+#####                                                                    ##
+ ######            ######           Issue #16
+   ##################            April 26, 1998
+       ########
+...............................................................................
+            Knowledge is power. -- Nam et ipsa scientia potestas est.
+	                Francis Bacon, Meditationes Sacrae
+...............................................................................
+</code>
+====== BSOUT ======
+<code>
+	The number 16 is an important number for Commodore folks.
+Commodore's most famous computer is so named for having 2^16 bytes.
+I'm sure many of us were greatly influenced by Commodore as 16-year
+olds.  And it was just a little over sixteen years ago that the
+Commodore 64 first became available.
+	I want to convey to you what a remarkable fact this is.
+After sixteen years the Commodore 64 is not only still being used,
+but the entire community is still moving forwards.  People are still
+using the machine seriously, new and innovative hardware and software
+is still being developed, and new and innovative uses for these old
+machines are still being found.  I do not believe any other computer
+community can make this claim.  How cool is that?
+	Thus does issue #16 boldly stride forwards into the deep
+realms of the Commodore unknown, secure in the knowledge that the
+Commodore community will blaze brightly well into the future, and
+in eager anticipation of the undiscovered algorithms and schematics
+which lie just around the corner.
+	And now a few words on the future of C=Hacking.  As everyone
+knows, C=Hacking has been pining for the fjords for a while now.
+Now that it is once again displaying its beautiful plumage, I hope
+to keep new issues appearing reasonably regularly.  My current
+thinking is that C=Hacking will appear on a "critical mass" basis:
+once enough articles have arrived, a new issue will be released.
+I will of course be meanwhile digging around for material and authors,
+and we'll see if four issues per year is too optimistic  (one caveat:
+I will be trying to graduate soon, so you might have to wait a
+little bit longer than normal for the next issue or two).
+	I also expect to slim the issues down somewhat.  The focus
+will now be technical -- as Jim mentioned in issue #15, the nontechnical
+stuff will move to another mag.  Instead of a behemoth magazine with
+tons of articles, I am going to try using a critical mass of 2-3 main
+articles plus some smaller articles.  (This might also make it
+possible to release issues more often).
+	The magazine now has four sections: Jiffies, The C=Hallenge,
+Side Hacking, and the main articles.  Jiffies are just short, quickie
+and perhaps quirky programs, in the flavor of the old RUN "Magic" column,
+or "Bits & Pieces" from The Transactor.  The C=Hallenge presents a
+challenge problem for readers to submit solutions to, to be published
+in future issues.  Side Hacking is for articles which are too small to
+be main articles but nifty in their own right.  Thus there is now room
+for articles of all sizes, from monstrous to a mere screenful.  With the
+first two sections I am hoping to stimulate reader involvement, and
+we'll see if it works or not.  In this issue I have included at least one
+of each type of article, to give a flavor of the different sections.
+	Otherwise, things ought to be more or less the same.  I'd like
+to thank Jim Brain for keeping C=Hacking going over the last few years,
+and also for the use of the jbrain.com web site.  I'd also like to say
+that although this issue seems to be the "Steve and Pasi issue", there's
+no particular reason to expect that to be the case for future issues.
+	And now... onwards!
+.......
+....
+..
+.                                     C=H
+</code>
+====== Contents ======
+<code>
+BSOUT
+	o Voluminous ruminations from your unfettered editor.
+Jiffies
+	o Quick and nifty.
+The C=Hallenge
+	o All is now clear.
+Side Hacking
+	o "PAL VIC20 goes NTSC", by Timo Raita <vic@iki.fi> and
+   	   Pasi 'Albert' Ojala <albert@cs.tut.fi>.  How to turn your
+	   PAL VIC20 into an NTSC VIC20.
+	o "Starfields", by S. Judd <sjudd@nwu.edu>.  Making a simple
+	   starfield.
+	o "Milestones of C64 Data Compression", by Pontus Berg
+	  <bacchus@fairlight.org>.  A short history of data compression
+	  on the 64.
+Main Articles
+	o "Compression Basics", by Pasi 'Albert' Ojala <albert@cs.tut.fi>.
+	  Part one of a two-part article on data compression, giving an
+	  introduction to and overview of data compression and related
+	  issues.
+	o "3D for the Masses: Cool World and the 3D Library", by S. Judd
+	  <sjudd@nwu.edu>. The mother of all 3d articles.
+.................................. Stuff ....................................
+Legalities
+----------
+	Rather than write yet another flatulent paragraph that nobody will
+ever read, I will now make my own little contribution to dismantling
+the 30 years' worth of encrustation that has led to the current Byzantine
+US legal system:
+	C=Hacking is a freely available magazine, involving concepts which
+may blow up your computer if you're not careful.  The individual authors
+hold the copyrights on their articles.
+Rather than procedure and language, I therefore depend on common sense and
+common courtesy.  If you have neither, then you probably shouldn't use
+a 64, and you definitely shouldn't be reading C=Hacking.  Please email
+any questions or concerns to chacking-ed@mail.jbrain.com.
+General Info
+------------
+-For information on subscribing to the C=Hacking mailing list, send email to
+	chacking-info@mail.jbrain.com
+-For general information on anything else, send email to
+	chacking-info@mail.jbrain.com
+-For information on chacking-info@mail.jbrain.com, send email to
+	chacking-info@mail.jbrain.com
+To submit an article:
+	- Invest $5 in a copy of Strunk & White, "The Elements of Style".
+	- Read it.
+	- Send some email to chacking-ed@mail.jbrain.com
+Note that I have a list of possible article topics.  If you have a topic
+you'd like to see an article on please email chacking@mail.jbrain.com
+and I'll see what I can do!
+.......
+....
+..
+.                                     C=H
+</code>
+====== Jiffies ======
+<code>
+Well folks, this issue's Jiffy is pretty lonely.  I've given it a few
+friends, but I hope you will send in some nifty creations and tricks
+of your own.
+$01 From John Ianetta, 76703.4244@compuserve.com:
+This C-64 BASIC type-in program is presented here without further comment.
+printchr$(147)
+poke55,138:poke56,228:clr
+a$="ibm":print"a$ = ";a$
+b$="macintosh"
+print:print"b$ = ";b$
+print:print"a$ + b$ = ";a$+b$
+poke55,.:poke56,160:clr:list
+$02 I am Commodore, here me ROR!
+First trick: performing ROR on a possibly signed number.  Don't blink!
+	CMP #$80
+	ROR
+Second trick: performing a cyclical left shift on an 8-bit number.
+	CMP #$80
+	ROL
+Oooohh!  Aaaaahhh!  Another method:
+	ASL
+	ADC #00
+........
+....
+..
+.                                     C=H
+</code>
+====== The C=Hallenge ======
+<code>
+The chacking challenge this time around is very simple: write a program
+which clears the screen.
+This could be a text screen or a graphics screen.  Naturally ? CHR$(147)
+is awfully boring, and the point is to come up with an interesting
+algorithm -- either visually or code-wise -- for clearing the screen.
+The purpose here is to get some reader involvement going, so submit your
+solutions to
+	chacking-ed@mail.jbrain.com
+and chances are awfully good that you'll see them in the next issue!
+(Source code is a huge bonus).  And, if you have a good C=Hacking
+C=Hallenge problem, by all means please send it to the above address.
+........
+....
+..
+.                                     C=H
+</code>
+====== Side Hacking: PAL VIC20 Goes NTSC ======
+<code>
+by Timo Raita <vic@iki.fi> http://www.iki.fi/vic/
+   Pasi 'Albert' Ojala <albert@cs.tut.fi> http://www.cs.tut.fi/~albert/
+</code>
+===== Introduction =====
+<code>
+   Recently Marko Mäkelä organized an order from Jameco's C= chip
+closeout sale, one of the items being a heap of 6560R2-101 chips. When
+we had the chips, we of course wanted to get some of our PAL machines
+running with these NTSC VIC-I chips. A couple of weeks earlier Richard
+Atkinson wrote on the cbm-hackers mailing list that he had got a 6560
+chip running on a PAL board. He used an oscillator module, because he
+couldn't get the 14.31818 MHz crystal to oscillate in the PAL VIC's
+clock generator circuit.
+   We checked the PAL and NTSC VIC20 schematic diagrams but couldn't
+notice a significant difference in the clock generator circuits. There
+seemed to be no reason why a PAL machine could not be converted into
+NTSC machine fairly easily by changing the crystal and making a couple
+of small changes. Adding a clock oscillator felt a somewhat desperate
+quick'n'dirty solution.
+   Note that some old television sets require you to adjust their
+vertical hold knob for them to show 60 Hz (NTSC) frame rate. Some
+recent pseudo-intelligent televisions only display one frame rate (50
+Hz in PAL-land). Multistandard television sets and most video monitors
+do display 60 Hz picture correctly. There is a very small chance that
+your display does not like the 60 Hz frame rate. Still, be careful if
+you haven't tried 60 Hz before.
+   You should also note that PAL and NTSC machines use different KERNAL
+ROM versions, 901486-06 is for NTSC and 901486-07 is for PAL. However,
+the differences are small. In an NTSC machine with a PAL ROM the
+screen is not centered, the 60 Hz timer is not accurate and the RS-232
+timing values are wrong.
+</code>
+===== The Story =====
+<code>
+    Timo:
+   At first I took a VIC20CR board (FAB NO. 251040-01) and just replaced
+the videochip and the crystal, and as you might suspect, it didn't
+work. I noticed that the two resistors in the clock generator circuit
+(R5 and R6) had a value of 470 ohms, while the schematics (both NTSC
+and PAL!) stated that they should be 330 ohms. I replaced those
+resistors, and also noticed the single 56 pF capacitor on the bottom
+side of the board. This capacitor was connected to the ends of the R5
+resistor and was not shown in the schematics. As you might guess, the
+capacitor prevents fast voltage changes, and thus makes it impossible
+to increase the frequency.
+   Is this capacitor present also on NTSC-board? Someone with such a
+board could check this out. I removed the capacitor, and now it works.
+I didn't test the board between these two modifications, but Pasi
+confirmed that you only need to remove the capacitor.
+    Pasi:
+   I first tried to convert my VIC20CR machine, because the clock circuit
+in it seemed identical to the one a newer NTSC machine uses, except of
+course the crystal frequency. Some of the pull-up resistors were
+different, but I didn't think it made any difference. Pull-up
+resistors vary a lot without any reason anyway. I replaced the video
+chip and the crystal, but I could not get it to oscillate. I first
+thought that the 7402 chip in the clock circuit just couldn't keep up
+and replaced it with a 74LS02 chip. There was no difference. The PAL
+crystal with a PAL VIC-I (6561) still worked.
+   I turned my eyes to my third VIC20, which is an older model with all
+that heat-generating regulator stuff inside. It has almost the same
+clock circuit as the old NTSC schematic shows. There are three
+differences:
+. The crystal is 14.31818 MHz for NTSC, 8.867236 MHz for PAL.
+. Two NOR gates are used as NOT gates to drop one 74S04 from the
+       design.
+. In PAL the crystal frequency is divided by two by a 7474
+       D-flipflop.
+   I could either put in a 28.63636 MHz crystal or I could use the
+.31818 MHz crystal and bypass the 7474 clock divider. I didn't have
+a 28 MHz crystal, so I soldered in the 14.31818 MHz crystal and bent
+the pin 39 (phi1 in) up from the 6560 video chip so that it would not
+be connected to the divided clock. I then soldered a wire connecting
+this pin (6560 pin 39) and the 14.31818 MHz clock coming from the 7402
+(UB9 pin 10). The machine started working.
+   I just hadn't any colors. My monitor (Philips CM8833) does not show
+NTSC colors anyway, but a multistandard TV (all new Philips models at
+least) shows colors as long as the VIC-I clock is close enough to the
+NTSC color clock. The oscillator frequency can be fine-adjusted with
+the trimmer capacitor C35. Just remember that using a metallic
+unshielded screwdriver is a bad idea because it changes the
+capacitance of the clock circuit (unless the trimmer is an insulated
+model). Warming has also its effect on the circuit capacitances so let
+the machine be on for a while before being satisfied with the
+adjustment. With a small adjustment I had colors and was done with
+that machine.
+   Then I heard from Timo that the CR model has a hidden capacitor on the
+solder side of the motherboard, probably to filter out upper harmonic
+frequencies (multiples of 4.433618 MHz). I decided to give the VIC20CR
+modification another try. I removed the 56 pF capacitor, which was
+connected in parallel with R5, and the machine still worked fine.
+   I then replaced the crystal with the 14.31818 MHz crystal and inserted
+the 6560 video chip. The machine didn't work. I finally found out that
+it was because I had replaced the original 7402 with 74LS02. When I
+replaced it with a 74S02, the machine started working. I just could
+not get the frequency right and thus no colors until I added a 22 pF
+capacitor in parallel with the capacitor C50 and the trimmer capacitor
+C48 to increase the adjustment range from the original 5..25 pF to
+..47 pF. The trimmer orientation didn't have any visible effect
+anymore. I had colors and was satisfied.
+   To check all possibilities, I also replaced the 74S02 with 7402 that
+was originally used in the circuit (not the same physical chip because
+I had butchered it while soldering it out). I didn't need the parallel
+capacitor anymore, although the trimmer adjustment now needed to be
+correct or I lost colors.
+   As I really don't need two NTSC machines, I then converted this
+machine back to PAL. I made the modifications backwards. I replaced
+the crystal and video chip and then was stumped because the machine
+didn't work. I scratched my head for a while but then remembered the
+extra capacitor I had removed. And surely, the machine started working
+when I put it back. Obviously, the machine had only worked without the
+capacitor because it had 74LS02 at the time. 74S02 and 7402 won't work
+without it. So, if you are doing an NTSC to PAL conversion, you need
+to add this capacitor.
+</code>
+===== Summary =====
+<code>
+    PAL VIC20 To NTSC
+   This is the older, palm-heating VIC20 model with the two-prong power
+connector and the almost-cube power supply.
+. Replace the 8.867236 MHz crystal with a 14.31818 MHz crystal
+. If UB9 is 74LS02, replace it with a 7402 (or 74S02 if you only use
+       NTSC)
+. Bend pin 39 from the 6560 video chip so that it does not go to the
+       socket.
+. Add a jumper wire from UB9 pin 10 to the video chip pin 39.
+. Adjust the trimmer capacitor C35 so that your multistandard
+       television shows colors.
+    PAL VIC20CR To NTSC
+. Replace the 4.433618 MHz crystal with a 14.31818 MHz crystal
+. If your machine has a capacitor in parallel with R5, remove it
+       (the parallel capacitor). The values in our machines were 56 pF.
+. If UB9 is 74LS02, replace it with a 7402 (or 74S02 if you only use
+       NTSC)
+. If necessary, increase the clock adjustment range by adding a
+       capacitor of 11..22 pF in parallel to C50 (15 pF in the
+       schematics, 5 pF in my machine) and C48 (the trimmer capacitor
+..20 pF). With 7402 you can do with a smaller capacitor or with
+       no additional capacitor at all.
+. Adjust C48 so that your multistandard television shows colors.
+Trouble-shooting
+     * There is no picture
+          + You have not removed (or added for NTSC) the 56 pF capacitor
+            - Remove/Add the capacitor
+          + Your clock crystal is broken
+            - Find a working one
+          + Your machine has a 74LS02 instead of 7402 (or 74S02)
+            - Replace the 74LS02 with a 7402
+          + You forgot to replace the PAL video chip with the NTSC chip
+            - Remove 6561 and insert 6560
+          + For older model: You forgot the clock wire and/or to turn up
+pin 39
+            - Connect 6560 pin 39 to UB9 pin 10
+     * There is picture, but it is scrolling vertically
+          + Your monitor or television does not properly sync to 60 Hz
+            frame rate
+            - Adjust the monitor/TV's vertical hold setting
+     * There is picture but no colors
+          + Your monitor or TV does not understand the color encoding
+            - Give up the color or get a better television
+          + The color clock frequency is slightly off
+            - Adjust the trimmer capacitor
+          + The color clock frequency adjustment range is not enough
+            - Add a 11-22 pF capacitor in parallel with the trimmer
+:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+</code>
+====== Starfields ======
+<code>
+by S. Judd
+	If you've ever played Elite or Star Raiders you've experienced
+a starfield.  The idea is to use stars to give a sense of motion
+while navigating through the universe.  In this article we'll derive
+a simple algorithm for making a starfield -- the algorithm I used
+in Cool World.
+	As usual, a good way to start is with a little thinking --
+in this case, we first need to figure out what the starfield problem
+even is!  Consider the motion of the stars.  As we move forwards, the
+stars should somehow move past us -- over us, to the side of us, etc.
+(A star coming straight at us wouldn't appear to move at all).  And
+something that moves needs a starting place and an ending place.
+Certainly they end at the screen boundaries, and since anything really
+far away looks like it's right in front of us they must start at
+(or near) the center of the screen.
+	So far so good: stars start near the center of the screen
+and move outwards to the edges.  What kind of path do they follow?
+Well it can't be a curve, right?  If it were a curve, then all
+stars would have to curve in exactly the same way, since there is
+a rotation symmetry (just tilting your head won't alter the shape
+of the path of the stars).  So it has to be a straight line.  Since
+stars start at the center of the screen and move outwards, we know
+that, wherever a star might be on the screen right now, it started
+in the center.  So, to get the path of the star's motion, we simply
+draw a line from the center of the screen through the star's current
+location, and move the star along that line.
+	Drawing lines is easy enough.  If the center of the screen
+is located at (xc, yc) then the equation of a line from the center
+to some point (x0, y0) is just
+	(x,y) = (xc, yc) + t*(x0-xc, y0-yc)
+You should read a vector equation like the above as
+	for t=blah to wherever step 0.0001
+	  x = xc + t*(x0-xc)
+	  y = yc + t*(y0-yc)
+	  plot(x,y)
+	next
+and you can see that when t=0
+	(x,y) = (xc, yc) i.e. the center of the screen
+and when t=1
+	(x,y) = (x0, y0) i.e. the current location.
+Thus, when t goes from 0 to 1 we hit all points in-between, and
+when t is a very small number (like 0.0001) we basically move from
+(xc,yc) to the point on the line right next to (xc,yc).
+	Great, now we know the path the stars take.  But _how_ do
+they move along that path?  We know from experience that things
+which are far away appear to move slowly, but when we're up close
+they move really fast (think of a jet plane, or the moon).  And
+we know that in this problem that stars which are far away are
+near the center of the screen, and stars which are nearby are out
+near the edges.  So, stars near the center of the screen ought to
+move slowly, and as they move outwards they ought to increase in
+speed.
+	Well, that's easy enough to do.  We already have an easy
+measure of how far away we are from the center:
+	dx = x0-xc
+	dy = y0-yc
+But this is the quantity we need to compute to draw the line.  All
+we have to do is move some fraction of (dx,dy) from our current
+point (x0,y0):
+	x = x0 + dx/8	;Note that we started at x0, not xc
+	y = y0 + dy/8
+where I just divided by 8 for the heck of it.  Dividing by a larger
+number will result in slower motion, and dividing by a smaller number
+will result in faster motion along the line.  But the important thing
+to notive in the above equations is that when a star is far away from
+the center of the screen, dx and dy are large and the stars move faster.
+	Well alrighty then.  Now we have an algorithm:
+	draw some stars on the screen
+	for each star, located at (x,y):
+	  erase old star
+	  compute dx = x-xc and dy = y-yc
+	  x = x + dx*velocity
+	  y = y + dy*velocity
+	  plot (x,y)
+	keep on going!
+where velocity is some number like 1/8 or 1/2 or 0.14159 or whatever.
+This is enough to move us forwards (and backwards, if a negative value
+of velocity is used).  What about moving up and down, or sideways?
+What about rotating?
+	First and foremost, remember that, as we move forwards, stars
+always move in the same way: draw a straight line from the origin
+through the star, and move along that path.  And that's the case
+no matter where the star is located.  This means that to rotate or
+move sideways, we simply move the position of the stars.  Then
+using the above algorithm they will just move outwards from the
+center through their new position.  In other words, rotations and
+translations are pretty easy.
+	Sideways translations are easy: just change the x-coordinates
+(for side to side translations) or y-coordinates (for up and down
+translations).  And rotations are done in the usual way:
+	x = x*cos(t) - y*sin(t)
+	y = x*sin(t) + y*cos(t)
+where t is some rotation angle.  Note that you can think of sideways
+motion as moving the center of the screen (like from 160,100 to 180,94).
+	Finally, what happens when stars move off the screen?  Why,
+just plop a new star down at a random location, and propagate it along
+just like all the others.  If we're moving forwards, stars move off
+the screen when they hit the edges.  If we're moving backwards, they
+are gone once they get near enough to the center of the screen.
+	Now it's time for a simple program which implements these
+ideas.  It is in BASIC, and uses BLARG (available in the fridge)
+to do the graphics.  Yep, a SuperCPU comes in really handy for
+this one.  Rotations are also left out, and I put little effort
+into fixing up some of the limitations of the simple algorithm
+(it helps to see them!).  For a complete ML implementation see the
+Cool World source code.
+rem starfield
+mode16:gron16
+dim x(100),y(100):a=rnd(ti)
+xc=160:yc=100:n=12:v=1/8
+for i=1 to n:gosub200:next
+:
+for i=1 to n
+color0:plot x(i),y(i):color 1
+dx=x(i)-xc:dy=y(i)-yc
+x1=x(i)+v*dx+x0:y1=y(i)+v*dy+y0
+x2=abs(x1-x(i)):y2=abs(y1-y(i)):if (x2+y2<3*abs(v)) then gosub 200:goto 60
+if (x1<0) or (y1<0) or (x1>319) or (y1>199) then gosub 200:dx=0:dy=0:goto65
+plot x1,y1:x(i)=x1:y(i)=y1
+next
+get a$
+if a$=";" then x0=x0-3:goto 40
+if a$=":" then x0=x0+3:goto 40
+if a$="@" then y0=y0-3:goto 40
+if a$="/" then y0=y0+3:goto 40
+if a$="a" then v=v+1/32
+if a$="z" then v=v-1/32
+if a$<>"q" then 40
+groff:stop
+x(i)=280*rnd(1)+20:y(i)=170*rnd(1)+15:return
+Line 200 just plops a new star down at a random position between
+x=20..300 and y=15..185.  Lines 100-140 just let you "navigate"
+and change the velocity.  The main loop begins at line 50:
+erase old star
+compute dx and dy
+advance the star outward from the center of the screen
+check if the star is too close to the center of the screen (in case
+   moving backwards)
+if star has moved off the screen, then make a new star
+And that's the basic idea.  Easy!  It's also easy to make further
+modifications -- perhaps some stars could move faster than others, some
+stars could be larger that others, or different colors, and so on.
+Starfields are fast, easy to implement, and make a nifty addition to
+many types of programs.
+</code>
+====== Milestones of C64 Data Compression ======
+<code>
+Pontus Berg, bacchus@fairlight.org
+	One of the featured articles in this issue is on data compression.
+A very natural question to ask is: what about data compression on the C-64?
+The purpose of this article is therefore to gain some insight into the
+history of data compression on the 64.  This article doesn't cover
+programs like ARC, which are used for storage/archiving, but instead
+focuses on programs which are compressed in executable format, i.e.
+decompress at runtime.
+	The earliest instance of compression comes with all computers:
+the BASIC ROMs.  As everyone knows, BASIC replaces keywords with
+one-byte tokens.  This not only takes less memory, but makes the
+program run faster as well.  Clearly, though, the BASIC interpreter
+is a very special situation.
+	The general problem of 64 data compression is to *reversibly*
+replace one chunk of data with a different chunk of data which is
+shorter *on the average*.  It won't always be shorter, simply because
+there are many more big chunks of data than there are small ones.
+The idea is to take advantage of any special structure in the data.
+The data in this case is 64 machine language programs, including
+graphics data, tables, etc.
+	The early years didn't feature any compression, but then
+"packers" (RLE compression) were introduced.  The first packer I used
+myself was flash packer, but I can't tell if it was an early one.  The
+idea of RLE -- run length encoding -- is simply to replace repeated
+bytes with a single byte and the number of times to repeat it.  For
+example, AAAAAA could be replaced by xA6, where x is a control character
+to tell the decompressor that the next two bytes represent a run.
+	Then came the TimeCruncher, in 1986 or perhaps 1987.  Basically,
+one can divide the world into compression before and after the TimeCruncher
+by Macham/Network.  With TimeCruncher, Matcham introduced sequence
+crunching -- this is THE milestone in the evolution.  The idea of
+sequencing is the same as LZ77: replace repeated byte sequences with a
+*reference* to earlier sequences.  As you would imagine, short sequences,
+especially of two or three bytes, are very common, so methods of handling
+those cases tend to pay large dividends.  See Pasi's article for a
+detailed example of LZ77 compression.  It is worth noting that several
+compression authors were not aware of LZ77 when designing their programs!
+	Naturally the 64 places certain limitations on a compression
+algorithm.  With typical 64 crunchers you define a scanning range,
+which is the range in which the sequence scanner looks for references.
+The sequence cruncher replaces parts of the codes with references to
+equal sequences in the program.  References are relative: number of
+bytes away and length.  Building this reference in a smart and efficient
+way is the key to success.  References that are far away require more
+bits, so a "higher speed" (bigger search area) finds more sequences,
+but the references are longer.
+	The next step of value was CruelCrunch, where Galleon (and
+to some extent Syncro) took the concept to where it could be taken.
+Following that, the next step was introducing crunchers which use
+the REU, where Antitrack took DarkSqeezer2 and modified it into a
+REU version.  Actually this was already possible in the original
+Darksqeezer 3, but that was not available to ATT.  Alex was pissing
+mad when it leaked out (rather soon) ;-).
+	The AB Cruncher, followed by ByteBoiler, was really the first
+cruncher to always beat the old CruelCrunch.  With the REU feature and
+public availablility, this took the real power of crunchers to all users.
+It should be noted that the OneWay line of crunchers (ByteBoiler and
+AB Crunch, which doesn't even require an REU) actually first scan the
+files and then optimize their algorithms to perform their best, rather
+than letting the user select a speed and see the results after each.
+	Regarding compression times, a char/RLE packer typically takes
+the time it takes to read a file twice, as they usually feature two
+passes -- one for determining the bytes to use as controlbytes and
+one to create the packed file.  Sequence crunchers like TimeCruncher
+typically took a few hours, and CruelCrunch as much as ten hours
+(I always let it work over night so I can't tell for sure - it's not
+something you clock while watching ;-).  After the introduction of
+REU-based sequence crunchers (which construct tables of the memory
+contents and do a table lookup, rather than repeatedly scanning the data),
+and their subsequent optimization, the crunching times went down first to
+some 30 minutes and then to a few minutes.  ByteBoiler only takes some two
+minutes for a full 200 block program, as I recall.
+	The RLE packers and sequence crunchers are often combined. One
+reason was the historical time saving argument - a file packed first
+would be smaller upon entering the crunching phase which could hence
+be completed much faster.  A very sophistocated charpacker is however
+a big waste as the result they produce - a block or so shorther at
+best - is almost always eaten up by worse crunching.  Some argue that
+you could as well crucnh right away without first using the packer, but
+then again you have the other argument - a charpacker can handle a file
+of virtually any length (0029 to ffff is available) whereas normally
+a cruncher is slightly more limited.
+	Almost any game or demofile (mixed contents of graphics, code,
+musicdata and player, etc.) normally packs into some 50-60% of its original
+size when using an RLE+sequence combination.  The packer might compress
+the file by some 30%, depending on the number of controlbytes and such,
+and the cruncher can compress it an additional 10-20%.
+	Minimising the size was the key motivation for the development of
+these programs -- to make more fit on the expensive disks and to make them
+load faster from any CBM device (we all know the speed of them ;-).
+For the crackers one could mention the levelcrunchers as well.  This is a
+way to pack data and have it depack transparently while loading the data,
+as opposed to adding a depacker to be run upon execution.  The very
+same crunching algorithms are used, and the same programs often come in
+a level- and filecrunching edition.
+.......
+....
+..
+.                                     C=H
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+                               Featured Articles
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+</code>
+====== Compression Basics ======
+<code>
+Pasi 'Albert' Ojala	albert@cs.tut.fi
+			http://www.cs.tut.fi/%7Ealbert/
+</code>
+===== Introduction =====
+<code>
+     Because real-world files usually are quite redundant,
+compression can often reduce the file sizes considerably.  This in
+turn reduces the needed storage size and transfer channel capacity.
+Especially in systems where memory is at premium compression can make
+the difference between impossible and implementable.  Commodore 64
+and its relatives are good examples of this kind of a system.
+     The most used 5.25-inch disk drive for Commodore 64 only holds
+kB of data, which is only about 2.5 times the total random access
+memory of the machine.  With compression, many more programs can fit
+on a disk.  This is especially true for programs containing flashy
+graphics or sampled sound.  Compression also reduces the loading
+times from the notoriously slow 1541 drive, whether you use the
+original slow serial bus routines or some kind of a disk turbo loader
+routine.
+     Dozens of compression programs are available for Commodore 64.
+I leave the work to chronicle the history of the C64 compression
+programs to others and concentrate on a general overview of the
+different compression algorithms.  Later we'll take a closer look on
+how the compression algorithms actually work and in the next article
+I will introduce my own creation:  pucrunch.
+     Pucrunch is a compression program written in ANSI-C which
+generates files that automatically decompress and execute themselves
+when run on a C64 (or C128 in C64-mode, VIC20, or C16/+4).  It is a
+cross-compressor, if you will, allowing you to do the real work on
+any machine, like a cross-assembler.
+     Our target environment (Commodore 64 and VIC20) restricts us
+somewhat when designing the 'ideal' compression system.  We would
+like it to be able to decompress as big a program as possible.
+Therefore the decompression code must be located in low memory, be as
+short as possible, and must use very small amounts of extra memory.
+     Another requirement is that the decompression should be
+relatively fast, which means that the arithmetic used should be
+mostly 8- or 9-bit which is much faster than e.g.  16-bit arithmetic.
+Processor- and memory-intensive algorithms are pretty much out of the
+question.  A part of the decompressor efficiency depends on the
+format of the compressed data.  Byte-aligned codes can be accessed
+very quickly; non-byte-aligned codes are much slower to handle, but
+provide better compression.
+     This is not meant to be the end-all document for data
+compression.  My intention is to only scratch the surface and give
+you a crude overview.  Also, I'm mainly talking about lossless
+compression here, although some lossy compression ideas are briefly
+mentioned.  A lot of compression talk is available in the world wide
+web, although it may not be possible to understand everything on the
+first reading.  To build the knowledge, you have to read many
+documents and understand _something_ from each one so that when you
+return to a document, you can understand more than the previous time.
+It's a lot like watching Babylon 5.  :-)
+     Some words of warning:  I try to give something interesting to
+read to both advanced and not so advanced readers.  It is perfectly
+all right for you to skip all uninteresting details.  I start with a
+Huffman and LZ77 example so you can get the basic idea before
+flooding you with equations, complications, and trivia.
+</code>
+===== Huffman and LZ77 Example =====
+<code>
+     Let's say I had some simple language like "Chippish" containing
+only the letters _CDISV_.  How would a string like
+    _SIDVICIIISIDIDVI_
+compress using a) Huffman encoding, and b) LZ77?  How do compression
+concepts such as information entropy enter into this?
+     A direct binary code would map the different symbols to
+consequtive bit patterns, such as:
+    Symbol  Code
+    'C'     000
+    'D'     001
+    'I'     010
+    'S'     011
+    'V'     100
+     Because there are five symbols, we need 3 bits to represent all
+of the possibilities, but we also don't use all the possibilities.
+Only 5 values are used out of the maximum 8 that can be represented
+in 3 bits.  With this code the original message takes 48 bits:
+    SIDVICIIISIDIDVI ==
+010 001 100 010 000 010 010 010 011 010 001 010 001 100 010
+     For Huffman and for entropy calculation (entropy is explained in
+the next chapter) we first need to calculate the symbol frequencies
+from the message.  The probability for each symbol is the frequency
+of appearance divided by the message length.  When we reduce the
+number of bits needed to represent the probable symbols (their code
+lengths) we can also reduce the average code length and thus the
+number of bits we need to send.
+    'C'     1/16    0.0625
+    'D'     3/16    0.1875
+    'I'     8/16    0.5
+    'S'     2/16    0.125
+    'V'     2/16    0.125
+     The entropy gives the lower limit for a statistical compression
+method's average codelength.  Using the equation from the next
+section, we can calculate it as 1.953.  This means that however
+cleverly you select a code to represent the symbols, in average you
+need at least 1.953 bits per symbol.  In this case you can't do
+better than 32 bits, since there are a total of 16 symbols.
+     Next we create the Huffman tree.  We first rank the symbols in
+decreasing probability order and then combine two lowest-probability
+symbols into a single composite symbol (C1, C2, ..).  The probability
+of this new symbol is therefore the sum of the two original
+probabilities.  The process is then repeated until a single composite
+symbol remains:
+    Step 1         Step 2        Step 3         Step 4
+    'I' 0.5        'I' 0.5       'I' 0.5        C3  0.5\C4
+    'D' 0.1875     C1  0.1875    C2  0.3125\C3  'I' 0.5/
+    'S' 0.125      'D' 0.1875\C2 C1  0.1875/
+    'V' 0.125 \C1  'S' 0.125 /
+    'C' 0.0625/
+     Note that the composite symbols are inserted as high as
+possible, to get the shortest maximum code length (compare C1 and 'D'
+at Step 2).
+     At each step two lowest-probability nodes are combined until we
+have only one symbol left.  Without knowing it we have already
+created a Huffman tree.  Start at the final symbol (C4 in this case),
+break up the composite symbol assigning 0 to the first symbol and 1
+to the second one.  The following tree just discards the
+probabilities as we don't need them anymore.
+              C4
+/ \ 1
+            /  'I'
+           C3
+/ \ 1
+         /   \
+        C2   C1
+/ \1 0/ \ 1
+     'D''S' 'V''C'
+    Symbol  Code  Code Length
+    'C'     011   3
+    'D'     000   3
+    'I'     1     1
+    'S'     001   3
+    'V'     010   3
+     When we follow the tree from to top to the symbol we want to
+encode and remember each decision (which branch to follow), we get
+the code:  {'C', 'D', 'I', 'S', 'V'} = {011, 000, 1, 001, 010}.  For
+example when we see the symbol 'C' in the input, we output 011.  If
+we see 'I' in the input, we output a single 1.  The code for 'I' is
+very short because it occurs very often in the input.
+     Now we have the code lengths and can calculate the average code
+length:  0.0625*3+0.1875*3+0.5*1+0.125*3+0.125*3 = 2.  We did not
+quite reach the lower limit that entropy gave us.  Well, actually it
+is not so surprising because we know that Huffman code is optimal
+only if all the probabilities are negative powers of two.
+Encoded, the message becomes:
+    SIDVICIIISIDIDVI ==
+1 000 010 1 011 1 1 1 001 1 000 1 000 010 1
+     The spaces are only to make the reading easier.  So, the
+compressed output takes 32 bits and we need at least 10 bits to
+transfer the Huffman tree by sending the code lengths (more on this
+later).  The message originally took 48 bits, now it takes at least
+bits.
+     Huffman coding is an example of a "variable length code" with a
+"defined word" input.  Inputs of fixed size -- a single, three-bit
+letter above -- are replaced by a variable number of bits.  At the
+other end of the scale are routines which break the _input_ up into
+variably sized chunks, and replace those chunks with an often
+fixed-length _output_.  The most popular schemes of this type are
+Lempel-Ziv, or LZ, codes.
+     Of these, LZ77 is probably the most straightforward.  It tries
+to replace recurring patterns in the data with a short code.  The
+code tells the decompressor how many symbols to copy and from where
+in the output to copy them.  To compress the data, LZ77 maintains a
+history buffer, which contains the data that has been processed, and
+tries to match the next part of the message to it.  If there is no
+match, the next symbol is output as-is.  Otherwise an (offset,length)
+-pair is output.
+    Output           History Lookahead
+                             SIDVICIIISIDIDVI
+    S                      S IDVICIIISIDIDVI
+    I                     SI DVICIIISIDIDVI
+    D                    SID VICIIISIDIDVI
+    V                   SIDV ICIIISIDIDVI
+    I                  SIDVI CIIISIDIDVI
+    C                 SIDVIC IIISIDIDVI
+    I                SIDVICI IISIDIDVI
+    I               SIDVICII ISIDIDVI
+    I              SIDVICIII SIDIDVI
+                   ---       ---    match length: 3
+                   |----9---|       match offset: 9
+    (9, 3)      SIDVICIIISID IDVI
+                          -- --     match length: 2
+                          |2|       match offset: 2
+    (2, 2)    SIDVICIIISIDID VI
+                 --          --     match length: 2
+                 |----11----|       match offset: 11
+    (11, 2) SIDVICIIISIDIDVI
+     At each stage the string in the lookahead buffer is searched
+from the history buffer.  The longest match is used and the distance
+between the match and the current position is output, with the match
+length.  The processed data is then moved to the history buffer.
+Note that the history buffer contains data that has already been
+output.  In the decompression side it corresponds to the data that
+has already been decompressed.  The message becomes:
+    S I D V I C I I I (9,3) (2,2) (11,2)
+    The following describes what the decompressor does with this data.
+    History                 Input
+                            S
+    S                       I
+    SI                      D
+    SID                     V
+    SIDV                    I
+    SIDVI                   C
+    SIDVIC                  I
+    SIDVICI                 I
+    SIDVICII                I
+    SIDVICIII               (9,3)   -> SID
+    |----9---|
+    SIDVICIIISID            (2,2)   -> ID
+              |2|
+    SIDVICIIISIDID          (11,2)  -> VI
+       |----11----|
+    SIDVICIIISIDIDVI
+     In the decompressor the history buffer contains the data that
+has already been decompressed.  If we get a literal symbol code, it
+is added as-is.  If we get an (offset,length) pair, the offset tells
+us from where to copy and the length tells us how many symbols to
+copy to the current output position.  For example (9,3) tells us to
+go back 9 locations and copy 3 symbols to the current output
+position.  The great thing is that we don't need to transfer or
+maintain any other data structure than the data itself.
+     Compare this to the BASIC interpreter, where all tokens have the
+high bit set and all normal characters don't (PETSCII codes 0-127).
+So when the LIST routine sees a normal character it just prints it
+as-is, but when it hits a special character (PETSCII >= 128) it looks
+up the corresponding keyword in a table.  LZ77 is similar, but an
+LZ77 LIST would look up the keyword in the data already LISTed to the
+screen!  LZ78 uses a separate table which is expanded as the data is
+processed.
+     The number of bits needed to encode the message (>52 bits) is
+somewhat bigger than the Huffman code used (42 bits).  This is mainly
+because the message is too short for LZ77.  It takes quite a long
+time to build up a good enough dictionary (the history buffer).
+</code>
+===== Introduction to Information Theory =====
+<code>
+Symbol Sources
+--------------
+     Information theory traditionally deals with symbol sources
+that have certain properties.  One important property is that they
+give out symbols that belong to a finite, predefined alphabet A.
+An alphabet can consist of for example all upper-case characters
+(A = {'A','B','C',..'Z',..}), all byte values (A = {0,1,..255}) or
+both binary digits (A = {0,1}).
+     As we are dealing with file compression, the symbol source is a
+file and the symbols (characters) are byte values from 0 to 255.  A
+string or a phrase is a concatenation of symbols, for example 011101,
+"AAACB".  Quite intuitive, right?
+     When reading symbols from a symbol source, there is some
+probability for each of the symbols to appear.  For totally random
+sources each symbol is equally likely, but random sources are also
+incompressible, and we are not interested in them here.  Equal
+probabilities or not, probabilities give us a means of defining the
+concept of symbol self-information, i.e.  the amount of information a
+symbol carries.
+     Simply, the more probable an event is, the less bits of
+information it contains.  If we denote the probability of a symbol
+A[i] occurring as p(A[i]), the expression -log2(p(A[i])) (base-2
+logarithm) gives the amount of information in bits that the source
+symbol A[i] carries.  You can calculate base-2 logarithms using
+base-10 or natural logarithms if you remember that log2(n) = log(n)/log(2).
+A real-world example is a comparison between the statements:
+. it is raining
+. the moon of earth has exploded.
+     The first case happens every once in a while (assuming we are
+not living in a desert area).  Its probability may change around the
+world, but may be something like 0.3 during bleak autumn days.  You
+would not be very surprised to hear that it is raining outside.  It
+is not so for the second case.  The second case would be big news, as
+it has never before happened, as far as we know.  Although it seems
+very unlikely we could decide a very small probability for it, like
+E-30.  The equation gives the self-information for the first case as
+.74 bits, and 99.7 bits for the second case.
+Message Entropy
+---------------
+     So, the more probable a symbol is, the less information it
+carries.  What about the whole message, i.e.  the symbols read from
+the input stream?
+     What is the information contents a specific message carries?
+This brings us to another concept:  the entropy of a source.  The
+measure of entropy gives us the amount of information in a message
+and is calculated like this:  H = sum{ -p(A[i])*log2(p(A[i])) }.  For
+completeness we note that 0*log2(0) gives the result 0 although
+log2(0) is not defined in itself.  In essence, we multiply the
+information a symbol carries by the probability of the symbol and
+then sum all multiplication results for all symbols together.
+     The entropy of a message is a convenient measure of information,
+because it sets the lower limit for the average codeword length for a
+block-variable code, for example Huffman code.  You can not get
+better compression with a statistical compression method which only
+considers single-symbol probabilities.  The average codeword length
+is calculated in an analogous way to the entropy.  Average code
+length is L = sum{-l(i)*log2(p(A[i])) }, where l(i) is the codeword
+length for the ith symbol in the alphabet.  The difference between L
+and H gives an indication about the efficiency of a code.  Smaller
+difference means more efficient code.
+     It is no coincidence that the entropy and average code length
+are calculated using very similar equations.  If the symbol
+probabilities are not equal, we can get a shorter overall message,
+i.e.  shorter _average_ codeword length (i.e.  compression), if we
+assign shorter codes for symbols that are more likely to occur.  Note
+that entropy is only the lower limit for statistical compression
+systems.  Other methods may perform better, although not for all
+files.
+Codes
+-----
+     A code is any mapping from an input alphabet to an output
+alphabet.  A code can be e.g.  {a, b, c} = {0, 1, 00}, but this code
+is obviously not uniquely decodable.  If the decoder gets a code
+message of two zeros, there is no way it can know whether the
+original message had two a's or a c.
+     A code is _instantaneous_ if each codeword (a code symbol as
+opposed to source symbol) in a message can be decoded as soon as it
+is received.  The binary code {a, b} = {0, 01} is uniquely decodable,
+but it isn't instantaneous.  You need to peek into the future to see
+if the next bit is 1.  If it is, b is decoded, if not, a is decoded.
+The binary code {a, b, c} = {0, 10, 11} on the other hand is an
+instantaneous code.
+     A code is a _prefix code_ if and only if no codeword is a prefix
+of another codeword.  A code is instantaneous if and only if it is a
+prefix code, so a prefix code is always a uniquely decodable
+instantaneous code.  We only deal with prefix codes from now on.  It
+can be proven that all uniquely decodable codes can be changed into
+prefix codes of equal code lengths.
+</code>
+===== 'Classic' Code Classification =====
+<code>
+Compression algorithms can be crudely divided into four groups:
+. Block-to-block codes
+. Block-to-variable codes
+. Variable-to-block codes
+. Variable-to-variable codes
+Block-to-block codes
+--------------------
+     These codes take a specific number of bits at a time from the
+input and emit a specific number of bits as a result.  If all of the
+symbols in the input alphabet (in the case of bytes, all values from
+to 255) are used, the output alphabet must be the same size as the
+input alphabet, i.e.  uses the same number of bits.  Otherwise it
+could not represent all arbitrary messages.
+     Obviously, this kind of code does not give any compression, but
+it allows a transformation to be performed on the data, which may
+make the data more easily compressible, or which separates the
+'essential' information for lossy compression.  For example the
+discrete cosine transform (DCT) belongs to this group.  It doesn't
+really compress anything, as it takes in a matrix of values and
+produces a matrix of equal size as output, but the resulting values
+hold the information in a more compact form.
+     In lossless audio compression the transform could be something
+along the lines of delta encoding, i.e.  the difference between
+successive samples (there is usually high correlation between
+successive samples in audio data), or something more advanced like
+Nth order prediction.  Only the prediction error is transmitted.  In
+lossy compression the prediction error may be transmitted in reduced
+precision.  The reproduction in the decompression won't then be
+exact, but the number of bits needed to transmit the prediction error
+may be much smaller.
+     One block-to-block code relevant to Commodore 64, VIC 20 and
+their relatives is nybble packing that is performed by some C64
+compression programs.  As nybbles by definition only occupy 4 bits of
+a byte, we can fit two nybbles into each byte without throwing any
+data away, thus getting 50% compression from the original which used
+a whole byte for every nybble.  Although this compression ratio may
+seem very good, in reality very little is gained globally.  First,
+only very small parts of actual files contain nybble-width data.
+Secondly, better methods exist that also take advantage of the
+patterns in the data.
+Block-to-variable codes
+-----------------------
+     Block-to-variable codes use a variable number of output bits for
+each input symbol.  All statistical data compression systems, such as
+symbol ranking, Huffman coding, Shannon-Fano coding, and arithmetic
+coding belong to this group (these are explained in more detail
+later).  The idea is to assign shorter codes for symbols that occur
+often, and longer codes for symbols that occur rarely.  This provides
+a reduction in the average code length, and thus compression.
+     There are three types of statistical codes:  fixed, static, and
+adaptive.  Static codes need two passes over the input message.
+During the first pass they gather statistics of the message so that
+they know the probabilities of the source symbols.  During the second
+pass they perform the actual encoding.  Adaptive codes do not need
+the first pass.  They update the statistics while encoding the data.
+The same updating of statistics is done in the decoder so that they
+keep in sync, making the code uniquely decodable.  Fixed codes are
+'static' static codes.  They use a preset statistical model, and the
+statistics of the actual message has no effect on the encoding.  You
+just have to hope (or make certain) that the message statistics are
+close to the one the code assumes.
+     However, 0-order statistical compression (and entropy) don't
+take advantage of inter-symbol relations.  They assume symbols are
+disconnected variables, but in reality there is considerable relation
+between successive symbols.  If I would drop every third character
+from this text, you would probably be able to decipher it quite well.
+First order statistical compression uses the previous character to
+predict the next one.  Second order compression uses two previous
+characters, and so on.  The more characters are used to predict the
+next character the better estimate of the probability distribution
+for the next character.  But more is not only better, there are also
+prices to pay.
+     The first drawback is the amount of memory needed to store the
+probability tables.  The frequencies for each character encountered
+must be accounted for.  And you need one table for each 'previous
+character' value.  If we are using an adaptive code, the second
+drawback is the time needed to update the tables and then update the
+encoding accordingly.  In the case of Huffman encoding the Huffman
+tree needs to be recreated.  And the encoding and decoding itself
+certainly takes time also.
+     We can keep the memory usage and processing demands tolerable by
+using a 0-order static Huffman code.  Still, the Huffman tree takes
+up precious memory and decoding Huffman code on a 1-MHz 8-bit
+processor is slow and does not offer very good compression either.
+Still, statistical compression can still offer savings as a part of a
+hybrid compression system.
+    For example:
+    'A'     1/2     0
+    'B'     1/4     10
+    'C'     1/8     110
+    'D'     1/8     111
+    "BACADBAABAADABCA"                              total: 32 bits
+0 110 0 111 10 0 0 10 0 0 111 0 10 110 0     total: 28 bits
+     This is an example of a simple statistical compression.  The
+original symbols each take two bits to represent (4 possibilities),
+thus the whole string takes 32 bits.  The variable-length code
+assigns the shortest code to the most probable symbol (A) and it
+takes 28 bits to represent the same string.  The spaces between
+symbols are only there for clarity.  The decoder still knows where
+each symbol ends because the code is a prefix code.
+     On the other hand, I am simplifying things a bit here, because
+I'm omitting one vital piece of information:  the length of the
+message.  The file system normally stores the information about the
+end of file by storing the length of the file.  The decoder also
+needs this information.  We have two basic methods:  reserve one
+symbol to represent the end of file condition or send the length of
+the original file.  Both have their virtues.
+     The best compressors available today take into account
+intersymbol probabilities.  Dynamic Markov Coding (DMC) starts with a
+zero-order Markov model and gradually extends this initial model as
+compression progresses.  Prediction by Partial Matching (PPM),
+although it really is a variable-to-block code, looks for a match of
+the text to be compressed in an order-n context and if there is no
+match drops back to an order n-1 context until it reaches order 0.
+Variable-to-block codes
+-----------------------
+     The previous compression methods handled a specific number of
+bits at a time.  A group of bits were read from the input stream and
+some bits were written to the output.  Variable-to-block codes behave
+just the opposite.  They use a fixed-length output code to represent
+a variable-length part of the input.  Variable-to-block codes are
+also called free-parse methods, because there is no pre-defined way
+to divide the input message into encodable parts (i.e.  strings that
+will be replaced by shorter codes).  Substitutional compressors
+belong to this group.
+     Substitutional compressors work by trying to replace strings in
+the input data with shorter codes.  Lempel-Ziv methods (named after
+the inventors) contain two main groups:  LZ77 and LZ78.
+Lempel-Ziv 1977
++++++++++++++++
+     In 1977 Ziv and Lempel proposed a lossless compression method
+which replaces phrases in the data stream by a reference to a
+previous occurrance of the phrase.  As long as it takes fewer bits to
+represent the reference and the phrase length than the phrase itself,
+we get compression.  Kind-of like the way BASIC substitutes tokens
+for keywords.
+     LZ77-type compressors use a history buffer, which contains a
+fixed amount of symbols output/seen so far.  The compressor reads
+symbols from the input to a lookahead buffer and tries to find as
+long as possible match from the history buffer.  The length of the
+string match and the location in the buffer (offset from the current
+position) is written to the output.  If there is no suitable match,
+the next input symbol is sent as a literal symbol.
+     Of course there must be a way to identify literal bytes and
+compressed data in the output.  There are lot of different ways to
+accomplish this, but a single bit to select between a literal and
+compressed data is the easiest.
+     The basic scheme is a variable-to-block code.  A variable-length
+piece of the message is represented by a constant amount of bits:
+the match length and the match offset.  Because the data in the
+history buffer is known to both the compressor and decompressor, it
+can be used in the compression.  The decompressor simply copies part
+of the already decompressed data or a literal byte to the current
+output position.
+     Variants of LZ77 apply additional compression to the output of
+the compressor, which include a simple variable-length code (LZB),
+dynamic Huffman coding (LZH), and Shannon-Fano coding (ZIP 1.x)), all
+of which result in a certain degree of improvement over the basic
+scheme.  This is because the output values from the first stage are
+not evenly distributed, i.e.  their probabilities are not equal and
+statistical compression can do its part.
+Lempel-Ziv 1978
++++++++++++++++
+     One large problem with the LZ77 method is that it does not use
+the coding space efficiently, i.e.  there are length and offset
+values that never get used.  If the history buffer contains multiple
+copies of a string, only the latest occurrance is needed, but they
+all take space in the offset value space.  Each duplicate string
+wastes one offset value.
+     To get higher efficiency, we have to create a real dictionary.
+Strings are added to the codebook only once.  There are no duplicates
+that waste bits just because they exist.  Also, each entry in the
+codebook will have a specific length, thus only an index to the
+codebook is needed to specify a string (phrase).  In LZ77 the length
+and offset values were handled more or less as disconnected variables
+although there is correlation.  Because they are now handled as one
+entity, we can expect to do a little better in that regard also.
+     LZ78-type compressors use this kind of a dictionary.  The next
+part of the message (the lookahead buffer contents) is searched from
+the dictionary and the maximum-length match is returned.  The output
+code is an index to the dictionary.  If there is no suitable entry in
+the dictionary, the next input symbol is sent as a literal symbol.
+The dictionary is updated after each symbol is encoded, so that it is
+possible to build an identical dictionary in the decompression code
+without sending additional data.
+     Essentially, strings that we have seen in the data are added to
+the dictionary.  To be able to constantly adapt to the message
+statistics, the dictionary must be trimmed down by discarding the
+oldest entries.  This also prevents the dictionary from becoming
+full, which would decrease the compression ratio.  This is handled
+automatically in LZ77 by its use of a history buffer (a sliding
+window).  For LZ78 it must be implemented separately.  Because the
+decompression code updates its dictionary in sychronization with the
+compressor the code remains uniquely decodable.
+Run-Length Encoding
++++++++++++++++++++
+     Run length encoding also belongs to this group.  If there are
+consecutive equal valued symbols in the input, the compressor outputs
+how many of them there are, and their value.  Again, we must be able
+to identify literal bytes and compressed data.  One of the RLE
+compressors I have seen outputs two equal symbols to indentify a run
+of symbols.  The next byte(s) then tell how many more of these to
+output.  If the value is 0, there are only two consecutive equal
+symbols in the original stream.  Depending on how many bits are used
+to represent the value, this is the only case when the output is
+expanded.
+     Run-length encoding has been used since day one in C64
+compression programs because it is very fast and very simple.  Part
+of this is because it deals with byte-aligned data and is essentially
+just copying bytes from one place to another.  The drawback is that
+RLE can only compress identical bytes into a shorter representation.
+On the C64, only graphics and music data contain large runs of
+identical bytes.  Program code rarely contains more than a couple of
+successive identical bytes.  We need something better.
+     That "something better" seems to be LZ77, which has been used in
+C64 compression programs for a long time.  LZ77 can take advantage of
+repeating code/graphic/music data fragments and thus achieves better
+compression.  The drawback is that practical LZ77 implementations
+tend to became variable-to-variable codes (more on that later) and
+need to handle data bit by bit, which is quite a lot slower than
+handling bytes.
+     LZ78 is not practical for C64, because the decompressor needs to
+create and update the dictionary.  A big enough dictionary would take
+too much memory and updating the dictionary would need its share of
+processor cycles.
+Variable-to-variable codes
+--------------------------
+     The compression algorithms in this category are mostly hybrids
+or concatenations of the previously described compressors.  For
+example a variable-to-block code such as LZ77 followed by a
+statistical compressor like Huffman encoding falls into this category
+and is used in Zip, LHa, Gzip and many more.  They use fixed, static,
+and adaptive statistical compression, depending on the program and
+the compression level selected.
+     Randomly concatenating algorithms rarely produces good results,
+so you have to know what you are doing and what kind of files you are
+compressing.  Whenever a novice asks the usual question:  'What
+compression program should I use?', they get the appropriate
+response:  'What kind of data you are compressing?'
+     Borrowed from Tom Lane's article in comp.compression:
+It's hardly ever worthwhile to take the compressed output of one
+compression method and shove it through another compression method.
+Especially not if the second method is a general-purpose compressor
+that doesn't have specific knowledge of the first compression step.
+Compression is effective in direct proportion to the extent that it
+eliminates obvious patterns in the data.  So if the first compression
+step is any good, it will leave little traction for the second step.
+Combining multiple compression methods is only helpful when the
+methods are specifically chosen to be complementary.
+     A small sidetrack I want to take:
+This also brings us conveniently to another truth in lossless
+compression.  There isn't a single compressor which would be able to
+losslessly compress all possible files (you can see the
+comp.compression FAQ for information about the counting proof).  It
+is our luck that we are not interested in compressing all files.  We
+are only interested in compressing a very small subset of all files.
+The more accurately we can describe the files we would encounter, the
+better.  This is called modelling, and it is what all compression
+programs do and must do to be successful.
+     Audio and graphics compression algorithm may assume a continuous
+signal, and a text compressor may assume that there are repeated
+strings in the data.  If the data does not match the assumptions (the
+model), the algorithm usually expands the data instead of compressing
+it.
+</code>
+===== Representing Integers =====
+<code>
+     Many compression algorithms use integer values for something or
+another.  Pucrunch is no exception as it needs to represent RLE
+repeat counts and LZ77 string match lengths and offsets.  Any
+algorithm that needs to represent integer values can benefit very
+much if we manage to reduce the number of bits needed to do that.
+This is why efficient coding of these integers is very important.
+What encoding method to select depends on the distribution and the
+range of the values.
+Fixed, Linear
+-------------
+     If the values are evenly distributed throughout the whole range,
+a direct binary representation is the optimal choice.  The number of
+bits needed of course depends on the range.  If the range is not a
+power of two, some tweaking can be done to the code to get nearer the
+theoretical optimum log2(_range_) bits per value.
+    Value   Binary  Adjusted 1&2
+    ---------------------------
+       000     00      000     H = 2.585
+       001     01      001     L = 2.666
+       010     100     010     (for flat distribution)
+       011     101     011
+       100     110     10
+       101     111     11
+     The previous table shows two different versions of how the
+adjustment could be done for a code that has to represent 6 different
+values with the minimum average number of bits.  As can be seen, they
+are still both prefix codes, i.e.  it's possible to (easily) decode
+them.
+     If there is no definite upper limit to the integer value, direct
+binary code can't be used and one of the following codes must be
+selected.
+Elias Gamma Code
+----------------
+     The Elias gamma code assumes that smaller integer values are
+more probable.  In fact it assumes (or benefits from) a
+proportionally decreasing distribution.  Values that use n bits
+should be twice as probable as values that use n+1 bits.
+     In this code the number of zero-bits before the first one-bit (a
+unary code) defines how many more bits to get.  The code may be
+considered a special fixed Huffman tree.  You can generate a Huffman
+tree from the assumed value distribution and you'll get a very
+similar code.  The code is also directly decodable without any tables
+or difficult operations, because once the first one-bit is found, the
+length of the code word is instantly known.  The bits following the
+zero bits (if any) are directly the encoded value.
+    Gamma Code   Integer  Bits
+    --------------------------
+                  1     1
+x              2-3     3
+xx            4-7     5
+xxx         8-15     7
+xxxx      16-31     9
+xxxxx    32-63    11
+    0000001xxxxxx 64-127    13
+    ...
+Elias Delta Code
+----------------
+     The Elias Delta Code is an extension of the gamma code.  This
+code assumes a little more 'traditional' value distribution.  The
+first part of the code is a gamma code, which tells how many more
+bits to get (one less than the gamma code value).
+    Delta Code   Integer  Bits
+    --------------------------
+                  1     1
+x             2-3     4
+xx            4-7     5
+xxx        8-15     8
+xxxx      16-31     9
+xxxxx     32-63    10
+xxxxxx   64-127    11
+    ...
+     The delta code is better than gamma code for big values, as it
+is asymptotically optimal (the expected codeword length approaches
+constant times entropy when entropy approaches infinity), which the
+gamma code is not.  What this means is that the extra bits needed to
+indicate where the code ends become smaller and smaller proportion of
+the total bits as we encode bigger and bigger numbers.  The gamma
+code is better for greatly skewed value distributions (a lot of small
+values).
+Fibonacci Code
+--------------
+     The fibonacci code is another variable length code where smaller
+integers get shorter codes.  The code ends with two one-bits, and the
+value is the sum of the corresponding Fibonacci values for the bits
+that are set (except the last one-bit, which ends the code).
+  2  3  5  8 13 21 34 55 89
+    ----------------------------
+(1)                         =  1
+  1 (1)                      =  2
+  0  1 (1)                   =  3
+  0  1 (1)                   =  4
+  0  0  1 (1)                =  5
+  0  0  1 (1)                =  6
+  1  0  1 (1)                =  7
+  0  0  0  1 (1)             =  8
+    :  :  :  :  :  :                 :
+  0  1  0  1 (1)             = 12
+  0  0  0  0  1 (1)          = 13
+    :  :  :  :  :  :  :              :
+  1  0  1  0  1 (1)          = 20
+  0  0  0  0  0  1 (1)       = 21
+    :  :  :  :  :  :  :  :           :
+  0  0  1  0  0  1 (1)       = 27
+     Note that because the code does not have two successive one-bits
+until the end mark, the code density may seem quite poor compared to
+the other codes, and it is, if most of the values are small (1-3).
+On the other hand, it also makes the code very robust by localizing
+and containing possible errors.  Although, if the Fibonacci code is
+used as a part of a larger system, this robustness may not help much,
+because we lose the synchronization in the upper level anyway.  Most
+adaptive methods can't recover from any errors, whether they are
+detected or not.  Even in LZ77 the errors can be inherited infinitely
+far into the future.
+Comparison between delta, gamma and Fibonacci code lengths
+----------------------------------------------------------
+           Gamma Delta Fibonacci
+     1     1       2.0
+-3     3     4       3.5
+-7     5     5       4.8
+-15     7     8       6.4
+-31     9     9       7.9
+-63    11    10       9.2
+-127    13    11      10.6
+     The comparison shows that if even half of the values are in the
+range 1..7 (and other values relatively near this range), the Elias
+gamma code wins by a handsome margin.
+Golomb and Rice Codes
+---------------------
+     Golomb (and Rice) codes are prefix codes that are suboptimal
+(compared to Huffman), but very easy to implement.  Golomb codes are
+distinguished from each other by a single parameter m.  This makes it
+very easy to adjust the code dynamically to adapt to changes in the
+values to encode.
+    Golomb  m=1     m=2     m=3     m=4     m=5     m=6
+    Rice    k=0     k=1             k=2
+    ---------------------------------------------------
+    n = 0   0       00      00      000     000     000
+   10      01      010     001     001     001
+   110     100     011     010     010     0100
+   1110    101     100     011     0110    0101
+   11110   1100    1010    1000    0111    0110
+   111110  1101    1011    1001    1000    0111
+   1111110 11100   1100    1010    1001    1000
+   :       11101   11010   1011    1010    1001
+   :       111100  11011   11000   10110   10100
+     To encode an integer n (starting from 0 this time, not from 1 as
+for Elias codes and Fibonacci code) using the Golomb code with
+parameter m, we first compute floor( n/m ) and output this using a
+unary code.  Then we compute the remainder n mod m and output that
+value using a binary code which is adjusted so that we sometimes use
+floor( log2(m) ) bits and sometimes ceil( log2(m) ) bits.
+     Rice coding is the same as Golomb coding except that only a
+subset of parameters can be used, namely the powers of 2.  In other
+words, a Rice code with the parameter k is equal to Golomb code with
+parameter m = 2^k.  Because of this the Rice codes are much more
+efficient to implement on a computer.  Division becomes a shift
+operation and modulo becomes a bit mask operation.
+Hybrid/Mixed Codes
+------------------
+     Sometimes it may be advantageous to use a code that combines two
+or more of these codes.  In a way the Elias codes are already hybrid
+codes.  The gamma code has a fixed huffman tree (a unary code) and a
+binary code part, the delta code has a gamma code and a binary code
+part.  The same applies to Golomb and Rice codes because they consist
+of a unary code part and a linear code (adjusted) part.
+     So now we have several alternatives to choose from.  We simply
+have to do a little real-life research to determine how the values we
+want to encode are distributed so that we can select the optimum code
+to represent them.
+     Of course we still have to keep in mind that we intend to decode
+the thing with a 1-MHz 8-bit processor.  As always, compromises loom
+on the horizon.  Pucrunch uses Elias Gamma Code, because it is the
+best alternative for that task and is very close to static Huffman
+code.  The best part is that the Gamma Code is much simpler to decode
+and doesn't need additional memory.
+</code>
+===== Closer Look =====
+<code>
+     Because the decompression routines are usually much easier to
+understand than the corresponding compression routines, I will
+primarily describe only them here.  This also ensures that there
+really _is_ a decompressor for a compression algorithm.  Many are
+those people who have developed a great new compression algorithm
+that outperforms all existing versions, only to later discover that
+their algorithm doesn't save enough information to be able to recover
+the original file from the compressed data.
+     Also, the added bonus is that once we have a decompressor, we
+can improve the compressor without changing the file format.  At
+least until we have some statistics to develop a better system.  Many
+lossy video and audio compression systems only document and
+standardize the decompressor and the file or stream format, making it
+possible to improve the encoding part of the process when faster and
+better hardware (or algorithms) become available.
+RLE
+---
+     void DecompressRLE() {
+         int oldChar = -1;
+         int newChar;
+         while(1) {
+             newChar = GetByte();
+             if(newChar == EOF)
+                 return;
+             PutByte(newChar);
+             if(newChar == oldChar) {
+                 int len = GetLength();
+                 while(len > 0) {
+                     PutByte(newChar);
+                     len = len - 1;
+                 }
+             }
+             oldChar = newChar;
+         }
+     }
+     This RLE algorithm uses two successive equal characters to mark
+a run of bytes.  I have in purpose left open the question of how the
+length is encoded (1 or 2 bytes or variable-length code).  The
+decompressor also allows chaining/extension of RLE runs, for example
+'a', 'a', 255, 'a', 255 would output 513 'a'-characters.
+In this case the compression algorithm is almost as simple.
+     void CompressRLE() {
+         int oldChar = -1;
+         int newChar;
+         while(1) {
+             newChar = GetByte();
+             if(newChar==oldChar) {
+                 int length = 0;
+                 if(newChar == EOF)
+                     return;
+                 PutByte(newChar); /* RLE indicator */
+                 /* Get all equal characters */
+                 while((newChar = GetByte()) == oldChar) {
+                     length++;
+                 }
+                 PutLength(length);
+             }
+             if(newChar == EOF)
+                 return;
+             PutByte(newChar);
+             oldChar = newChar;
+         }
+     }
+     If there are two equal bytes, the compression algorithm reads
+more bytes until it gets a different byte.  If there was only two
+equal bytes, the length value will be zero and the compression
+algorithm expands the data.  A C64-related example would be the
+compression of the BASIC ROM with this RLE algorithm.  Or actually
+expansion, as the new file size is 8200 bytes instead of the original
+bytes.  Those equal byte runs that the algorithm needs just
+aren't there.  For comparison, pucrunch manages to compress the BASIC
+ROM into 7288 bytes, the decompression code included.  Even Huffman
+coding manages to compress it into 7684 bytes.
+    "BAAAAAADBBABBBBBAAADABCD"              total: 24*8=192 bits
+    "BAA",4,"DBB",0,"ABB",3,"AA",1,"DABCD"  total: 16*8+4*8=160 bits
+     This is an example of how the presented RLE encoder would work
+on a string.  The total length calculations assume that we are
+handling 8-bit data, although only values from 'A' to 'D' are present
+in the string.  After seeing two equal characters the decoder gets a
+repeat count and then adds that many more of them.  Notice that the
+repeat count is zero if there are only two equal characters.
+Huffman Code
+------------
+     int GetHuffman() {
+         int index = 0;
+         while(1) {
+             if(GetBit() == 1) {
+                 index = LeftNode(index);
+             } else {
+                 index = RightNode(index);
+             }
+             if(LeafNode(index)) {
+                 return LeafValue(index);
+             }
+         }
+     }
+     My pseudo code of the Huffman decode function is a very
+simplified one, so I should probably describe how the Huffman code
+and the corresponding binary tree is constructed first.
+     First we need the statistics for all the symbols occurring in
+the message, i.e.  the file we are compressing.  Then we rank them in
+decreasing probability order.  Then we combine the smallest two
+probabilities and assign 0 and 1 to the binary tree branches, i.e.
+the original symbols.  We do this until there is only one composite
+symbol left.
+     Depending on where we insert the composite symbols we get
+different Huffman trees.  The average code length is equal in both
+cases (and so is the compression ratio), but the length of the
+longest code changes.  The implementation of the decoder is usually
+more efficient if we keep the longest code as short as possible.
+This is achieved by inserting the composite symbols (new nodes)
+before all symbols/nodes that have equal probability.
+    "BAAAAAADBBABBBBBAAADABCD"
+    A (11)  B (9)  D (3)  C (1)
+    Step 1              Step 2              Step 3
+    'A' 0.458           'A' 0.458           C2  0.542 0\ C3
+    'B' 0.375           'B' 0.375 0\ C2     'A' 0.458 1/
+    'D' 0.125 0\ C1     C1  0.167 1/
+    'C' 0.042 1/
+             C3
+/  \ 1
+           /   'A'
+          C2
+/  \ 1
+       'B'   \
+             C1
+/  \ 1
+          'D'  'C'
+     So, in each step we combine two lowest-probability nodes or
+leaves into a new node.  When we are done, we have a Huffman tree
+containing all the original symbols.  The Huffman codes for the
+symbols can now be gotten by starting at the root of the tree and
+collecting the 0/1-bits on the way to the desired leaf (symbol).  We
+get:
+    'A' = 1    'B' = 00    'C' = 011    'D' = 010
+     These codes (or the binary tree) are used when encoding the
+file, but the decoder also needs this information.  Sending the
+binary tree or the codes would take a lot of bytes, thus taking away
+all or most of the compression.  The amount of data needed to
+transfer the tree can be greatly reduced by sending just the symbols
+and their code lengths.  If the tree is traversed in a canonical
+(predefined) order, this is all that is needed to recreate the tree
+and the Huffman codes.  By doing a 0-branch-first traverse we get:
+    Symbol  Code    Code Length
+    'B'     00      2
+    'D'     010     3
+    'C'     011     3
+    'A'     1       1
+     So we can just send 'B', 2, 'D', 3, 'C', 3, 'A', 1 and the
+decoder has enough information (when it also knows how we went
+through the tree) to recreate the Huffman codes and the tree.
+Actually you can even drop the symbol values if you handle things a
+bit differently (see the Deflate specification in RFC1951), but my
+arrangement makes the algorithm much simpler and doesn't need to
+transfer data for symbols that are not present in the message.
+     Basically we start with a code value of all zeros and the
+appropriate length for the first symbol.  For other symbols we first
+add the code value with 1 and then shift the value left or right to
+get it to be the right size.  In the example we first assign 00 to
+'B', then add one to get 01, shift left to get a 3-bit codeword for
+'D' making it 010 like it should.  For 'C' add 1, you get 011, no
+shift because the codewords is the right size already.  And for 'A'
+add one and get 100, shift 2 places to right and get 1.
+     The Deflate algorithm in essence attaches a counting sort
+algorithm to this algorithm, feeding in the symbols in increasing
+code length order.  Oh, don't worry if you don't understand what the
+counting sort has to do with this.  I just wanted to give you some
+idea about it if you some day read the deflate specification or the
+gzip source code.
+     Actually, the decoder doesn't necessarily need to know the
+Huffman codes at all, as long as it has created the proper internal
+representation of the Huffman tree.  I developed a special table
+format which I used in the C64 Huffman decode function and may
+present it in a separate article someday.  The decoding works by just
+going through the tree by following the instructions given by the
+input bits as shown in the example Huffman decode code.  Each bit in
+the input makes us go to either the 0-branch or the 1-branch.  If the
+branch is a leaf node, we have decoded a symbol and just output it,
+return to the root node and repeat the procedure.
+     A technique related to Huffman coding is Shannon-Fano coding.
+It works by first dividing the symbols into two equal-probability
+groups (or as close to as possible).  These groups are then further
+divided until there is only one symbol in each group left.  The
+algorithm used to create the Huffman codes is bottom-up, while the
+Shannon-Fano codes are created top-down.  Huffman encoding always
+generates optimal codes (in the entropy sense), Shannon-Fano
+sometimes uses a few more bits.
+     There are also ways of modifying the statistical compression
+methods so that we get nearer to the entropy.  In the case of 'A'
+having the probability 0.75 and 'B' 0.25 we can decide to group
+several symbols together, producing a variable-to-variable code.
+    "AA"    0.5625          0
+    "B"     0.25            10
+    "AB"    0.1875          11
+     If we separately transmit the length of the file, we get the
+above probabilities.  If a file has only one 'A', it can be encoded
+as length=1 and either "AA" or "AB".  The entropy of the source is
+H = 0.8113, and the average code length (per source symbol) is
+approximately L = 0.8518, which is much better than L = 1.0, which we
+would get if we used a code {'A','B'} = {0,1}.  Unfortunately this
+method also expands the number of symbols we have to handle, because
+each possible source symbol combination is handled as a separate
+symbol.
+Arithmetic Coding
+-----------------
+     Huffman and Shannon-Fano codes are only optimal if the
+probabilities of the symbols are negative powers of two.  This is
+because all prefix codes work in the bit level.  Decisions between
+tree branches always take one bit, whether the probabilities for the
+branches are 0.5/0.5 or 0.9/0.1.  In the latter case it would
+theoretically take only 0.15 bits (-log2(0.9)) to select the first
+branch and 3.32 bits (-log2(0.1)) to select the second branch, making
+the average code length 0.467 bits (0.9*0.15 + 0.1*3.32).  The
+Huffman code still needs one bit for each decision.
+     Arithmetic coding does not have this restriction.  It works by
+representing the file by an interval of real numbers between 0 and 1.
+When the file size increases, the interval needed to represent it
+becomes smaller, and the number of bits needed to specify that
+interval increases.  Successive symbols in the message reduce this
+interval in accordance with the probability of that symbol.  The more
+likely symbols reduce the range by less, and thus add fewer bits to
+the message.
+                                             Codewords
+    +-----------+-----------+-----------+
+    |           |8/9 YY     |  Detail   |<- 31/32    .11111
+    |           +-----------+-----------+<- 15/16    .1111
+    |    Y      |           | too small |<- 14/16    .1110
+    |2/3        |    YX     | for text  |<- 6/8      .110
+    +-----------+-----------+-----------+
+    |           |           |16/27 XYY  |<- 10/16    .1010
+    |           |           +-----------+
+    |           |    XY     |           |
+    |           |           |   XYX     |<- 4/8      .100
+    |           |4/9        |           |
+    |           +-----------+-----------+
+    |           |           |           |
+    |    X      |           |   XXY     |<- 3/8      .011
+    |           |           |8/27       |
+    |           |           +-----------+
+    |           |    XX     |           |
+    |           |           |           |<- 1/4      .01
+    |           |           |   XXX     |
+    |           |           |           |
+    |0          |           |           |
+    +-----------+-----------+-----------+
+     As an example of arithmetic coding, lets consider the example of
+two symbols X and Y, of probabilities 2/3 and 1/3.  To encode a
+message, we examine the first symbol:  If it is a X, we choose the
+lower partition; if it is a Y, we choose the upper partition.
+Continuing in this manner for three symbols, we get the codewords
+shown to the right of the diagram above.  They can be found by simply
+taking an appropriate location in the interval for that particular
+set of symbols and turning it into a binary fraction.  In practice,
+it is also necessary to add a special end-of-data symbol, which is
+not represented in this simple example.
+     This explanation may not be enough to help you understand
+arithmetic coding.  There are a lot of good articles about arithmetic
+compression in the net, for example by Mark Nelson.
+     Arithmetic coding is not practical for C64 for many reasons.
+The biggest reason being speed, especially for adaptive arithmetic
+coding.  The close second reason is of course memory.
+Symbol Ranking
+--------------
+     Symbol ranking is comparable to Huffman coding with a fixed
+Huffman tree.  The compression ratio is not very impressive (reaches
+Huffman code only is some cases), but the decoding algorithm is very
+simple, does not need as much memory as Huffman and is also faster.
+     int GetByte() {
+         int index = GetUnaryCode();
+         return mappingTable[index];
+     }
+     The main idea is to have a table containing the symbols in
+descending probability order (rank order).  The message is then
+represented by the table indices.  The index values are in turn
+represented by a variable-length integer representation (these are
+studied in the next article).  Because more probable symbols (smaller
+indices) take less bits than less probable symbols, in average we
+save bits.  Note that we have to send the rank order, i.e.  the
+symbol table too.
+    "BAAAAAADBBABBBBBAAADABCD"              total: 24*8=192 bits
+    Rank Order: A (11)  B (9)  D (3)  C (1)         4*8=32 bits
+    Unary Code: 0       10     110    1110
+    "100000001101010010101010100001100101110110"   42 bits
+                                             total: 74 bits
+     The statistics rank the symbols in the order ABDC (most probable
+first), which takes approximately 32 bits to transmit (we assume that
+any 8-bit value is possible).  The indices are represented as a code
+{0, 1, 2, 3} = {0, 10, 110, 1110}.  This is a simple unary code where
+the number of 1-bits before the first 0-bit directly give the integer
+value.  The first 0-bit also ends a symbol.  When this code and the
+rank order table are combined in the decoder, we get the reverse code
+{0, 10, 110, 1110} = {A, B, D, C}.  Note that in this case the code
+is very similar to the Huffman code we created in a previous example.
+LZ78
+----
+     LZ78-based schemes work by entering phrases into a dictionary
+and then, when a repeat occurrence of that particular phrase is
+found, outputting the dictionary index instead of the phrase.  For
+example, LZW (Lempel-Ziv-Welch) uses a dictionary with 4096 entries.
+In the beginning the entries 0-255 refer to individual bytes, and the
+rest 256-4095 refer to longer strings.  Each time a new code is
+generated it means a new string has been selected from the input
+stream.  New strings that are added to the dictionary are created by
+appending the current character K to the end of an existing string w.
+The algorithm for LZW compression is as follows:
+    set w = NIL
+    loop
+        read a character K
+        if wK exists in the dictionary
+            w = wK
+        else
+            output the code for w
+            add wK to the string table
+            w = K
+    endloop
+    Input string: /WED/WE/WEE/WEB
+    Input   Output  New code and string
+    /W      /       256 = /W
+    E       W       257 = WE
+    D       E       258 = ED
+    /       D       259 = D/
+    WE      256     260 = /WE
+    /       E       261 = E/
+    WEE     260     262 = /WEE
+    /W      261     263 = E/W
+    EB      257     264 = WEB
+            B
+     A sample run of LZW over a (highly redundant) input string can
+be seen in the diagram above.  The strings are built up
+character-by-character starting with a code value of 256.  LZW
+decompression takes the stream of codes and uses it to exactly
+recreate the original input data.  Just like the compression
+algorithm, the decompressor adds a new string to the dictionary each
+time it reads in a new code.  All it needs to do in addition is to
+translate each incoming code into a string and send it to the output.
+A sample run of the LZW decompressor is shown in below.
+    Input code: /WEDEB
+    Input   Output  New code and string
+    /       /
+    W       W       256 = /W
+    E       E       257 = WE
+    D       D       258 = ED
+     /W      259 = D/
+    E       E       260 = /WE
+     /WE     261 = E/
+     E/      262 = /WEE
+     WE      263 = E/W
+    B       B       264 = WEB
+     The most remarkable feature of this type of compression is that
+the entire dictionary has been transmitted to the decoder without
+actually explicitly transmitting the dictionary.  The decoder builds
+the dictionary as part of the decoding process.
+     See also the article "LZW Compression" by Bill Lucier in
+C=Hacking issue 6 and "LZW Data Compression" by Mark Nelson mentioned
+in the references section.
+</code>
+===== Conclusions =====
+<code>
+     That's more than enough for one article.  What did we get out of
+it ?  Statistical compression works with uneven symbol probabilities
+to reduce the average code length.  Substitutional compressors
+replace strings with shorter representations.  All popular
+compression algorithms use LZ77 or LZ78 followed by some sort of
+statistical compression.  And you can't just mix and match different
+algorithms and expect good results.
+     There are no shortcuts in understanding data compression.  Some
+things you only understand when trying out them yourself.  However, I
+hope that this article has given you at least a vague grasp of how
+different compression methods really work.
+     I would like to send special thanks to Stephen Judd for his
+comments.  Without him this article would've been much more
+unreadable than it is now.  On the other hand, that's what the
+magazine editor is for :-)
+     The second part of the story is a detailed talk about pucrunch.
+I also go through the corresponding C64 decompression code in detail.
+If you are impatient and can't wait for the next issue, you can take
+a peek into http://www.cs.tut.fi/%7Ealbert/Dev/pucrunch/ for a preview.
+</code>
+===== References =====
+<code>
+  * The comp.compression FAQ
+    http://www.cis.ohio-state.edu/hypertext/faq/usenet/
+                                             compression-faq/top.html
+  * A Data Compression Review
+    http://www.ics.uci.edu/%7Edan/pubs/DataCompression.html
+  * Data Compression Class
+    http://www.cs.unt.edu/home/srt/5330/
+  * Mark Nelson's Homepage
+    http://web2.airmail.net/markn/
+       + LZW Data Compression
+         http://web2.airmail.net/markn/articles/lzw/lzw.htm
+       + Arithmetic Coding Article
+         http://web2.airmail.net/markn/articles/arith/part1.htm
+       + Data Compression with the Burrows-Wheeler Transform
+         http://web2.airmail.net/markn/articles/bwt/bwt.htm
+  * Charles Bloom's Page
+    http://wwwvms.utexas.edu/%7Ecbloom/
+  * The Redundancy of the Ziv-Lempel Algorithm for Memoryless Sources
+    http://ei0.ei.ele.tue.nl/%7Etjalling/zivlem/zivlem.html
+  * Ross Williams' Compression Pages
+    http://www.ross.net/home/
+  * DEFLATE Compressed Data Format Specification version 1.3
+    http://www.funet.fi/pub/doc/rfc/rfc1951.txt
+  * Markus F.X.J. Oberhumer's Compression Links
+    http://wildsau.idv.uni-linz.ac.at/mfx/compress.html
+  * The Lossless Compression (Squeeze) Page
+    http://www.cs.sfu.ca/CC/365/li/squeeze/
+........
+....
+..
+.                                     C=H
+::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
+</code>
+====== 3D Graphics for the Masses: lib3d and Cool World ======
+<code>
+by Stephen L. Judd
+sjudd@nwu.edu
+	Well folks, it's time once again for some 3D graphics.  This
+will, I think, be more or less the final word on the subject, at least
+from me!  I'm pooped, and I think these routines pretty much push
+these algorithms as far as they're going to go.  And there are so
+many other interesting things to move on to!
+	These routines, not to mention this article, are the codification
+of all those years of algorithms and derivations and everything else.
+Those past efforts created working prototypes; this one is the
+production model.  Nate Dannenberg suggested that this be done, and
+who am I to disobey?  Besides, after a break of a few years it's going
+to be fun to rederive all the equations and algorithms from scratch,
+and do a "for-real" implementation, improving the good ideas and
+fixing the bad.
+	Right?  Right.
+	So that's what I did, and that's what this article does.  The
+first section of the article summarizes the basics of 3D graphics:
+projections and rotations.  Since in my experience many people don't
+remember their high school math, I've also covered the basic mathematical
+tools needed, like trigonometry and linear algebra, and other related
+issues.  The next section covers constructing a 3D world: representing
+objects, navigating through the world, things like that.  The third
+section covers the main library routines and their implementation,
+including lots of code disassembly.  The fourth section covers Cool World,
+a program which is a demonstration of the 3D library.  The final section
+just summarizes the 3d library routines, calling conventions, parameters
+used, things like that.  Since this article is enormous, I have left
+out the complete source code; the truly motivated can visit
+	http://stratus.esam.nwu.edu/~judd/fridge/
+to check out the full source for both Cool World and the 3d library.  The
+binaries are also present there.
+	By this time you may have asked, "What exactly _is_ this
+d library?"  Glad you asked.  It is a set of routines that are
+important for doing 3D graphics: rotations, projections, drawing
+polygons, that sort of thing.  The routines are very flexible, compact,
+and extremely fast.  They can draw to any VIC bank, and still leave
+enough room for eight sprite definitions.  What the routines are
+not is a "3D world construction kit".  A program has to be written
+which utilizes the routines.
+	Which leads to Cool World.  CW is, simply, a program which
+demonstrates using the 3D library.  It is very large.  It has some
+seventeen objects in it -- the initial tetrahedron is the 'center';
+straight ahead of it is squaresville; to the left is a bunch of stars;
+straight down is the Cobra Mk III; straight back is a line of
+tetrahedrons.  It also has some other fun stuff like the starfield
+and a tune.  It doesn't have Kim Basinger, unfortunately.
+	Oh yeah: the _whole point_ of the 3d library is for you to
+use it in your own programs!  You can sell those programs if you like,
+too.  Doesn't bother me; in fact I would be quite flattered to see
+it used in some cool and popular program.  Why re-invent the wheel
+when the routines are already written?  Use the 3d library.  Live
+the 3d library.  Do something interesting with it.  If nobody uses
+it, it might as well have never been written, right?  Right.  You
+know you want to use it.  So use it!
+	But before you use it, you better understand it.  So let's
+start at the very beginning (a very good place to start).
+</code>
+===== Section 1: 3D Basics and review =====
+<code>
+	This is just going to be a quick summary.  For more detail
+you ought to go back to some of the earlier articles.
+	First we need to decide on a coordinate system.  A convenient
+system for dealing with the computer is to have the x-axis point to
+the right and the y-axis to point down the screen -- the usual
+coordinate system with x=0 y=0 being in the upper-left corner of
+the screen.  The z-axis can then point into the monitor (i.e. z=20
+is behind the monitor screen somewhere, and z=-20 is somewhere behind
+where you're sitting); this keeps the coordinate system right-handed.
+	The next thing is to figure out an equation for projecting
+from three dimensions into two dimensions.  In other words: how to
+paint a picture.
+	One of the great breakthroughs for art at the beginning of
+the Renaissance was the understanding of perspecitve.  The first
+thing to notice is that far-away objects look smaller.  The next
+thing to observe is that straight lines seem to meet in the middle --
+like looking down a long road or sidewalk or railroad tracks.  That's
+perspective: far-off objects get smaller _towards the center of vision_.
+	Well that's an easy equation: just take the coordinate, and
+divide by the distance away from us.  If we're looking down the z-axis,
+	x' = x/z	y' = y/z
+gives us a pair of projected points.  Objects which are far away have
+a large value for z, which means x' and y' will be proportionally
+smaller.  For really large z, they go to zero -- they go to the center
+of our vision.
+	Now, how can we make objects appear larger?  One way is to
+stand closer to them -- that changes the perspective.  Another way
+is to magnify them, with a lens.  This is how a telescope works, and
+also how the old Mark-I eyeball works as well.  Whereas perspective
+makes far-off points behave differently than near ones, magnification
+just expands all points equally.  And that's exactly how to put it in
+the equation:
+	x' = d * x/z	y' = d * y/z
+where d is some constant magnification factor (a number, like 12).
+And the above are the projection equations.
+	A very important skill is the ability to 'read' an equation.
+The above equation takes a point, divides by the distance away from us,
+and then multiplies by the magnification constant.  It says that
+far-off objects will be small, and that all objects are magnified
+equally.  It also implies that the z-coordinates better all be positive
+or all negative.  If an object has some positive z-coordinates and
+some negative, that means that some of its points are in front of us
+and some are behind us.  In other words, that we are standing inside
+the object!
+	A proper derivation is as follows: consider a pinhole camera.
+A light ray bounces off some object at a point (x,y,z) and passes
+through the pinhole, located at the origin (0,0,0).  We then place
+a photographic plate parallel to the x-y plane, at z'=d, and figure
+out where the light ray hits the plate.  The equation of the light
+ray is the line
+	(x', y', z') = t*(x,y,z)
+and it hits the plate when z'=d, which happens when
+	tz = d
+so when t = d/z.  Thus the coordinates on the plate (the film) are
+	x' = d/z * x	y' = d/z * y
+which gives the earlier equations.  Moving the film up and down will
+magnify the image, corresponding to changing the value of d.  If d
+is negative, the image will be inverted.  Positive d corresponds to
+having the film between the pinhole lens and the object, but
+mathematically it gives the same answer without inverting the object
+(since x' and y' won't have a minus sign in front of them).
+Representing 3D objects
+-----------------------
+	The important thing to recognize from the above is that
+straight lines in 3D will still be straight lines in 2D -- the
+projection doesn't turn straight lines into curves or anything
+like that.
+	So, to project a straight line from 3D into 2D all that
+is needed are the _endpoints_ of the line.  Project those two
+endpoints from 3D into 2D, and then all that is needed is to draw
+a line between the two _projected_ points.
+	Most of the objects we will be dealing with will be made
+up of a series of flat plates, because these are easier to do
+calculations with (ever noticed the main difference between the
+stealth fighter and the stealth bomber?).  In other words: polygons.
+	These polygons are just flat objects with several straight
+sides.  All we need are the vertices of each polygon (the endpoints
+of each of the lines); from those we can reconstruct all the
+in-between points.
+	A very simple example is a cube.  It has six faces.  The
+eight corners of a cube might be located at (+/-1, +/-1, +/-1).
+Project those eight points and connect the projected points in the
+right way, and Viola!  Violin!  3D graphics.
+	The next thing we will want to do is rotate an object.  For
+this we need a little trig, and it's very helpful to know a little
+about linear algebra and vectors.  What's that?  You don't remember
+this stuff?  For shame!  But we can fix that up in no time.
+Trig review in two paragraphs.
+-----------
+	Take a look at a triangle sometime.  No matter how large or
+small you draw a particular triangle, it still looks the same
+because all of the sides and angles are in proportion to one another.
+Now fix one of the angles at 90 degrees, to form a right triangle.
+The minute you choose one of the other angles, the opposite angle
+is determined (since the angles have to add up to 180 degrees).
+And when you know all the angles you know the _ratios_ of the sides
+to each other -- you don't know the actual length of the sides,
+but you know how the triangle looks and hence its proportions.
+	And that is pretty much all of trig.  Draw a right triangle,
+and pick an angle (call that angle theta).  The hypoteneuse is the
+longest side of a triangle, and hence the side opposite the 90 degree
+angle.  The 'adjacent' side is the side which touches the angle
+theta; the 'opposite' side is the side opposite theta (imagine that!).
+We _define_ sine, cosine, and tangent as
+	cos(theta) = adjacent/hypoteneuse
+	sin(theta) = opposite/hypoteneuse
+	tan(theta) = sin(theta)/cos(theta) = opposite/adjacent
+and that's all of trig, right there.  Everything else comes from
+those three definitions.  If you can't remember them, just remember
+the famous Indian Chief SOHCAHTOA, who fought so bravely for the
+cause of his people and mathematicians everywhere (SOH: sin-opposite-hyp;
+CAH: cos-adjacent-hyp; TOA: tan-opposite-adjacent).
+Polar coordinates
+-----------------
+	Now we've got some tools to do some useful calculations with,
+once we remember the Pythagorean theorem,
+	a^2 + b^2 = c^2,
+where c = the hypoteneuse and a,b are the sides of a right triangle.
+Before doing rotations it is helpful to understand polar coordinates.
+	Draw a set of normal coordinate axis on a piece of paper, and
+draw a point somewhere (at 3,2 say).  Label this point x,y and
+draw a line from the origin (0,0) to that point, and call its
+length "r".  Label the angle between that line and the positive
+x-axis as "theta", and draw a line straight down from x,y to the
+x-axis.  Whaddaya know?  It's a right triangle.  The length of one
+side is x, the length of the other side is y, and the length of the
+hypoteneuse is r.  And the trig definitions tell us that
+	cos(theta) = x/r
+	sin(theta) = y/r
+	tan(theta) = y/x.
+So now we can define sin, cos, and tan for all angles theta just by
+drawing a little picture.
+	Two coordinates, x and y, are used to locate the point.
+Another two coordinates in the picture are "r" and "theta", and
+they are just as good for locating where a point is.  In fact,
+the above equations give the x and y coordinates of a point
+(r,theta):
+	x = r * cos(theta)	y = r * sin(theta)
+They also give us the r and theta coordinates of a point (x,y):
+	tan(theta) = y/x
+	r^2 = x^2 + y^2
+You may also have noticed that the trig formula
+	cos(theta)^2 + sin(theta)^2 = 1
+follows from the above trig definitions, since x^2 + y^2 = r^2.
+	In the Cartesian (or rectangular) system there are two
+coordinates, x and y.  The equation x=constant is a vertical line
+and the equation y=constant is a horizontal line, so this is a
+system of straight lines which cross each other at right angles.
+In polar coordinates the equation theta=constant is a radial line,
+and r=constant gives a circle (constant radius, you know).  So
+polar coordinates consists of circles which intersect radial lines
+(note that they also intersect at right angles).  And the equations
+above tell how to convert between the two systems.
+	There are many other coordinate systems, btw.  There are
+elliptical coordinates and parabolic coordinates and other strange
+systems.  Why bother?  For one thing, if you're solving an equation
+on a round plate it's awfully convenient to use polar coordinates.
+For another, we can now do some rotations.
+	(And for another, you can draw cool graphs like r=cos(3*theta)).
+Rotations
+---------
+	Start with a point (r,theta) and rotate it to a new point
+(r,theta+s) i.e. rotate by an amount s.  The original coordinate was
+	x = r*cos(theta)	y = r*sin(theta)
+and the new coordinate is
+	x' = r*cos(theta+s)	y = r*sin(theta+s)
+so by using the trig identities for cos(a+b) and sin(a+b) we
+arrive at the simple expressions
+	x' = x*cos(s) - y*sin(s)
+	y' = x*sin(s) + y*cos(s)
+for the rotated coordinates x' and y' in terms of the original
+coordinates x and y.  The above is a two-dimensional rotation.
+Three-dimensional rotation is done with a series of two-dimensional
+rotations.  The above rotates x and y, but keeps z fixed (I don't
+see a z anywhere in there, at least).  In other words, it rotates
+about the z-axis.  A rotation about the x-axis looks like
+	x' = x*cos(s) - z*sin(s)
+	z' = x*sin(s) + z*cos(s)
+and a rotation about the y-axis looks like
+	x' = x*cos(s)  + z*sin(s)
+	z' = -x*sin(s) + z*cos(s)
+(the opposite minus sign just comes from the orientation of the y-axis).
+	Rotations in three dimensions do NOT commute.  Just by trying
+with a book it is very easy to see that a rotation of 45 degrees about
+the x-axis and then 45-degrees about the y-axis gives a very different
+result from first rotating about the y-axis and then the x-axis.
+	To really make rotations powerful we need a little linear
+algebra.  But first...
+Radians ("God's units")
+-------
+	Just for completeness... what is an angle, really?  It's not
+a length, it's not a direction... what is it?  And why are there
+"angles" in a circle?
+	The answer to the second is "because the ancient Babylonians
+used base 60 for all their calculations."  In other words, degrees
+are totally arbitrary.  There could just as easily be 1000 degrees
+in a circle.  So, what is an angle?
+	Think about a circle.  It has two lengths associated with it:
+the length "across" (the diameter D) and the length "around" (the
+circumference C).  And, like triangles, no matter how large or small
+the circle is drawn, those lengths stay in proportion.  That is,
+	C/D = constant
+That constant is, of course, pi = 3.1415926...  This then gives
+the equation for the circumference of a circle as C = pi*D = 2*pi*r
+where r=radius.  But what is an angle?
+	Draw two radii in the circle.  We _define_ the angle between
+those two radii as the length of the subtended circumference divided
+by the radius:
+	angle = length / radius.
+So again, no matter what the _size_ of the circle is the _angle_ stays
+the same.  The length is just a fraction of the circumference, which
+is 2*pi*radius; this means that the angle is just a fraction of 2*pi.
+	These are radians.  There are 2*pi of them in a circle.
+A quarter of a circle (90 degrees) is just 1/4 (2*pi) = pi/2 radians.
+An eighth is pi/4 radians.  And so on.  These are of course very
+natural units for an angle.  The above definition has angle equal
+to a length divided by a length; radians have no dimension (whereas
+degrees are a dimension, just like feet or pounds or seconds are).
+	Degrees are useful for talking and writing, but radians
+are what is needed for calculations.
+Vectors and Linear Algebra
+--------------------------
+	A vector (in three dimensions) is simply a line drawn from the
+origin (0,0,0) to some point (x,y,z).  Since they always emanate from
+the origin, it is correct to refer to "the vector (x,y,z)".  The important
+thing is that it has both length _and_ direction.  A physical example
+is the difference between velocity and speed.  Speed is just a
+quantity: for example, "120 Miles per hour".  Velocity, on the other
+hand, has _direction_ -- you might be going straight up, or straight
+down, or turning around a curve, etc. and the _length_ of the velocity
+vector gives the speed: 120 MPH.
+	But all we need to worry about here is the geometric meaning,
+and the difference between the _point_ (Px,Py,Pz) and the _vector_
+(Px, Py, Pz).  The point P is just a point, but geometrically the
+vector P is a line extending *from* the origin *to* the point P --
+it has a length, and a direction: it points in the direction of
+(Px, Py, Pz).  We will be using vectors for rotations, and to figure
+out what direction something points, and all sorts of other stuff.
+	The dimension of a vector is the number of elements in that
+vector.  Let P be a vector.  A 2D vector might be P=(Px,Py).  A three
+dimensional vector example is P=(Px,Py,Pz).  P=(p1,p2,p3,p4,p5,p6)
+would be a six-dimensional vector.  So an n-dimensional vector is just
+a list of n independent quantities.  We'll just be dealing with
+two and three dimensional vectors here, though, so when you see
+a sentence like "The vector v1" just think of (v1x, v1y, v1z).
+	The length of a vector is again given by Pythagorus:
+	r^2 = Px^2 + Py^2 + Pz^2 + ...
+i.e. the sum of the squares of all the elements.  This is a good
+calculation to avoid in algorithms, since it is expensive, but
+it is useful to know.
+	The simplest thing one can do with a vector is change its
+length, by multiplying by a constant:
+	c*(Px,Py,Pz) = (c*Px, c*Py, c*Pz).
+Multiplying by a constant multiplies all elements in the vector by
+that constant.  Just like with triangles all lengths increase
+proportionally, so it is a vector which points in the same direction
+but has a different length.
+	Two vector operations that are very useful are the "dot product"
+and the "cross product".  I'll write the dot product of two vectors
+R and P as either R.P or <R,P>, and it is defined as
+	R . P = |R| |P| cos(theta)
+that is, the length of R times the length of P times the cosine of the
+angle between the two vectors.  From this it is easy to show that
+the dot product may also be written as
+	R . P = Rx*Px + Ry*Py + ...
+that is, multiply the individual components together and add them up.
+Note that the dot product gives a _number_ (a scalar), NOT a vector.
+Note also that the dot product between two vectors separated by an
+angle greater than pi/2 will be negative, from the first equation,
+and that the dot product of two perpendicular vectors is zero.
+The dot product is sometimes referred to as the inner (or scalar)
+product.
+	The cross-product (sometimes called the vector product or
+skew product) is denoted by RxP and is given by
+	R x P = (Ry*Pz-Rz*Py, Rz*Px-Rx*Pz, Rx*Py-Ry*Px)
+As you can see, the result is a _vector_.  In fact, this vector
+is perpendicular to both R and P -- it is perpendicular to the plane
+that R and P lie in.  Its length is given by
+	length = |R| |P| sin(theta)
+Note that P x R = - R x P; the direction of the resulting vector is
+usually determined by the "right hand rule".  All that is important
+here is to remember that the cross-product generates perpendicular
+vectors.  We won't be using any cross products in this article.
+	There are lots of other things we can do to vectors.  One
+of the most important is to multiply by a _matrix_.  A matrix is
+like a bunch of vectors grouped together, so it has rows and columns.
+An example of a 2x2 matrix is
+	[a b]
+	[c d]
+an example of a 3x3 matrix is
+	[a b c]
+	[d e f]
+	[g h i]
+and so on.  The number of rows doesn't have to equal the number of
+columns.  In fact, an n-dimensional vector is just an n x 1 (read
+"n by 1") matrix: n rows, but one column.
+	We add matrices together by adding the individual elements:
+	[a1 a2 a3]   [b1 b2 b3]   [a1+b1 a2+b2 a3+b3]
+	[a4 a5 a6] + [b4 b5 b6] = [a4+b4 a5+b5 a6+b6]
+	[a7 a8 a9]   [b7 b8 b9]   [a7+b7 a8+b8 a9+b9]
+We can multiply by a constant, which just multiplies all elements
+by that constant (just like the vector).
+	Matrices can also be multiplied.  The usual rule is "row times
+column".  That is, given two matrices A and B, you take rows of
+A and dot them with columns of B:
+	[A1 A2] [B1 B2] = [A1*B1+A2*B3  A1*B2+A2*B4]
+	[A3 A4] [B3 B4]   [A3*B1+A4*B3  A3*B2+A4*B4]
+In the above, the (1,1) element is the first row of A (A1 A2) times
+the first column of B (B1 B3) to get A1*B1+A2*B3.  And so on.  (With
+a little practice this becomes very easy).  We will be multiplying
+rotation matrices together, and multiplying matrices times vectors.
+	Although "row times column" is the usual way that this is
+taught, it can also be looked at as "columns times elements".  The
+easiest example is to multiply a matrix A times a vector x.  The first
+method gives:
+		[a1 a2 a3]	     [ <row1,x> ]   [a1*x1 + a2*x2 + a3*x3]
+	let A = [a4 a5 a6] then Ax = [ <row2,x> ] = [a4*x1 + a5*x2 + a6*x3]
+		[a7 a8 a9]           [ <row3,x> ]   [a7*x1 + a8*x2 + a9*x3]
+The right-hand part may be written as
+	   [a1]      [a2]      [a3]
+	x1*[a4] + x2*[a5] + x3*[a6]
+	   [a7]      [a8]      [a9]
+or, in other words,
+	Ax = x1*column1 + x2*column2 + x3*column3
+that is, the components of x times the columns of A, added together.
+This is a very useful thing to be aware of, as we shall see.
+	Note that normal multiplication commutes: 3*2 = 2*3.  In matrix
+multiplication, this is NOT true.  Multiplication in general does
+NOT commute, and AB is usually different from BA.
+	We can also divide by matrices, but it isn't called division.
+It's called inversion.  Let's say you have an equation like
+*x = b.
+To solve for x you would just multiply both sides by 1/5 i.e. by
+the "inverse" of 5.  To solve a matrix equation like
+	Ax = b
+we just multiply both sides by the inverse of A -- call it A'.
+And in just the same way that 1/a * a is one, a matrix times its
+inverse is the _identity matrix_ which is the matrix with "1" down
+the diagonal:
+	    [1 0 0]
+	I = [0 1 0]
+	    [0 0 1]
+It is called the identity because IA = A; multiplying by the identity
+matrix is just like multiplying by one.
+	Inverting a matrix is in general a very expensive operation,
+and we don't need to go into it here.  We will be doing some special
+inversions later on though, so keep in mind that an inversion
+un-does a matrix multiplication.
+Transformations -- more than meets the eye!
+---------------
+	Now we have an _extremely_ powerful tool at our disposal.
+What happens when you multiply a matrix times a vector?  You get
+a new vector, of the same dimension as the old one.  That is, it
+takes the old vector and _transforms_ it to a new one.  Take the
+two-dimensional case:
+	[a b] [x] = [a*x + b*y]
+	[c d] [y]   [c*x + d*y]
+Look familiar?  Well if a=cos(s), b=-sin(s), c=cos(s), and d=sin(s)
+we get the earlier rotation equations, which we can now rewrite as
+	P' = RP
+where R is the rotation matrix
+	R = [cos(s) -sin(s)]
+	    [sin(s)  cos(s)].
+The way to think about this is that R _operates_ on a vector P.
+When we apply R to P, it rotates P to a new vector P'.
+	So far we haven't gained much.  But in three dimensions,
+the rotation matrices look like
+	     [cos(s) -sin(s) 0]        [cos(s)  0  sin(s)]
+	Rz = [sin(s)  cos(s) 0]   Ry = [  0     1    0   ]  etc.
+	     [  0       0    1]        [-sin(s) 0  cos(s)]
+Try multiplying Rz times a vector (x,y,z) to see that it rotates
+x and y while leaving z unchanged -- it rotates about the z-axis.
+	Now let's say that we're navigating through some 3D world
+and have turned, and looked up, and turned a few more times, etc.
+That is, we apply a bunch of rotations, all out of order:
+	Rx Ry Ry Rz Rx Rz Ry Ry Rx P = P'
+All of the rotation matrices on the left hand side can be multiplied
+together into a _single_ matrix R
+	R = Rx Ry Ry Rz Rx Rz Ry Ry Rx
+which is a _new_ transformation, which is the transformation you
+would get by applying all the little rotations in that specific order.
+This new matrix can now be applied to any number of vectors.  And this
+is extremely useful and important.
+	Another incredibly useful thing to realize is that the
+inverse of a rotation matrix is just its transpose (reflect through
+the diagonal, i.e. element i,j swaps with element j,i).  It's
+very easy to see for the individual rotation matrices -- the inverse
+of rotating by an amount s is to rotate by an amount -s, which flips
+the minus sign on the sin(s) terms above.  And if you take the transpose
+of the accumulated matrix R above, remembering that (AB)^T = B^T A^T,
+you'll see that R^T just applies all of the inverse rotations in
+the opposite order -- it undoes each small rotation one at a time
+(multiply R^T times R to see that you end up with the identity matrix).
+	The important point, though, is that to invert any series of
+rotations, no matter how complicated, all we have to do is take a
+transpose.
+Hidden Faces (orientation)
+------------
+	Finally there is the issue of hidden faces, and the related
+issue of polygon clipping.  In previous articles a number of different
+methods have been tried.  For hidden faces, the issue is to determine
+whether a polygon faces towards us (in which case it is visible) or
+away from us (in which case it isn't).
+	One way is to compute a normal vector (for example, to rotate
+a normal vector along with the rest of the face).  A light ray which
+bounces off of any point on this face will be visible -- will go
+through the origin -- only if the face faces towards us.  The normal
+vector tells us which direction face is pointing in.  If we draw
+a line from our eye (the origin) to a point on the face, the normal
+vector will make an angle of 90 degrees with the face when the face
+is edge-on.  So by computing the angle between the normal vector and
+a vector from the origin to a point on the face we can test for
+visibility by checking if it is greater than or less than 90 degrees.
+	We have a way of computing the angle between two vectors:
+the dot-product.  Any point on the face will give a vector from
+the origin to the face, so choose some vertex V on the polygon,
+then dot it with the normal vector N:
+	<V,N> = Vx*Nx + Vy*Ny + Vz*Nz = |V| |N| cos(theta)
+Since cos(theta) is positive for theta<90 and negative for theta>90
+all that needs to be done is to compute Vx*Nx + ... and check whether
+it is positive or negative.  If you know additional information about
+either V or N (as earlier articles did) then this calculation can
+be simplified by quite a lot (as earlier articles did!).  And if
+you want to be really cheap, you can just use the z-coordinate of the
+normal vector and skip the dot-product altogether (this only works
+for objects which are reasonably far away, though).
+	Another method involves the cross-product -- take the
+cross-product of two vectors in the _projected_ polygon and see
+whether it points into or out of the monitor.  Again, only the
+direction is of interest, so that usually means that only the
+z-component of the vector needs to be computed.  On the downside,
+I seem to recall that in practice this method was cumbersome and
+tended to fail for certain polygons, making them never visible (because
+of doing integer calculations and losing remainders).
+	The final method is a simple ordering argument: list the
+points of the polygon in a particular order, say clockwise.  If,
+after rotating, that same point list goes around the polygon in a
+counter-clockwise order then the polygon is turned around the other
+way.  This is the method that the 3d library uses.  It is more or
+less a freebie -- it falls out automatically from setting up the
+polygon draw, so it takes no extra computation time for visible
+polygons.  It is best-suited to solid polygon routines, though.
+	Polygon clipping refers to determining when one polygon
+overlaps another.  For convex objects (all internal angles are
+less than 180 degrees) this isn't an issue, but for concave
+objects it's a big deal.  There are two convex objects in Cool World:
+the pointy stars and the Cobra Mk III.  The pointy stars are clipped
+correctly, so that tines on the stars are drawn behind each other
+properly.  The Cobra doesn't have any clipping, so you can see
+glitches when it draws one polyon on top of another when it should
+be behind it.
+	A general clipping algorithm is pretty tough and time-consuming.
+It is almost always best to split a concave object up into a group
+of smaller convex objects -- a little work on your part can save
+huge amounts of time on the computer's part.
+	And THAT, I think, covers the fundamentals of 3d graphics.
+Now it's on to constructing a 3D world, and implementing all that
+we've deduced and computed as an actual computer program.
+</code>
+===== Section 2: Constructing a 3D world =====
+<code>
+	Now that we have the tools for making 3D objects, we need to
+put them all together to make a 3D world.  This world might have
+many different objects in it, all independent and movable.  They
+might see each other, run into each other, and so on.  And they
+of course will have to be displayed on the computer.
+	As we shall see, we can do all this in a very elegant and
+simple manner.
+Representing objects
+--------------------
+	The object problem boils down to a few object attributes:
+each object has a _position_ in the world, and each object has an
+_orientation_.  Each object might also have a _velocity_, but velocity
+is very easy to understand and deal with once the position and
+orientation problem has been solved.
+	The position tells where an object is -- at least, where the
+center of the object is.  The orientation vector tells in what direction
+the object is pointing.  Velocity tells what direction the object is
+moving.  Different programs will handle velocity in different ways.
+Although we are supposed to be tolerant of different positions and
+orientations, in these programs they are all handled the same way
+and are what will be focused on.
+	If you can't visualize this, just think of some kind of
+spaceship or fighter plane.  It is located somewhere, and it points
+in a certain direction.  The orientation vector is a little
+line with an arrow at the end -- it points in a certain direction
+and is anchored at some position.  As the object turns, so does
+the orientation vector.  And as it moves, so does the position.
+	Once positions and orientations are known we can calculate
+where everything is relative to everything else.  "Relative" is an
+important word here.  If we want to know how far we are from some
+object, our actual locations don't matter; just the relative locations.
+And if we are walking around and suddenly turn, then everything better
+turn around us and not some other point.  Rotations are always about
+the origin, which means we are the origin -- the center of the
+universe.  Recall also that the way projections work is by assuming
+we are looking straight down the z-axis, so our orientation better
+be straight.
+	So to find out what the world looks like from some object
+we need to do two things.  First, all other objects need to be
+translated to put the specific object at the origin -- shift the
+coordinate system to put the object at (0,0,0).  Second, the orientation
+vector of the object needs to be on the z-axis for projections to
+work right -- rotate the coordinate system and all of the objects
+around us to make the orientation vector be in the right direction.
+	Every time an object moves, its position changes.  And every
+time it turns or tilts it changes its orientation.  Let's say that
+after a while we are located at position P and have an orientation
+vector V, and want to look at an object located at position Q.
+Translating to to the origin is easy enough: just subtract P from all
+coordinates.  So the original object will be located at P-P = 0,
+and the other object will be located at Q-P.  What about rotating?
+	The orientation vector comes about by a series of rotations
+applied to the initial orientation.  As was explained earlier, all of
+those rotations can be summed up as a single rotation matrix R,
+so if the intial vector was, say, z=(0,0,1) then
+	V = Rz
+Given a position vector, what we need to do is un-rotate that position
+vector back onto the z-axis.  Which means we just multiply by
+the _inverse_ of R, R'.  For example, if we turn to the left, then
+the world turns to the right.  If we turn left and then look up,
+the way to un-do that is to look back down and turn right.  That's
+an inverse rotation, and we know from the earlier sections that
+rotation matrices are very easy to invert.
+	So, after translating and rotating, we are located at the
+origin and looking straight down the z-axis, whereas the other
+object Q is located at the rotated translated point
+	Q2 = R' (Q-P)
+i.e. Q-P translates the object, and R' rotates it relative to us.
+	Readers who are on the ball may have noticed that so far we
+have only deduced what happens to the _center_ of the object.  Quite
+right.  We need to figure out what happens to the points of an
+object.
+	First it should be clear that we want to define the points
+of an object relative to the center of the object.  It is first
+necessary so that the object can rotate on its own.  And it has
+all kinds of computational advantages, as we shall see in a later
+section; among them are that the numbers stay small (and the
+rotations fast), they are always rotated as a whole (so they don't
+lose accuracy or get out of proportion), and the points of objects
+which aren't visible don't have to be rotated at all.
+	So we need to amend the previous statement: we need to
+figure out what happens to the points of an object _only when that
+object is visible_.
+Visibility
+----------
+	When is an object visible?  When it is within your field
+of vision of course.  It has to be in front of you, and not too
+far out to the sides.
+	So, after translating and rotating an object center,
+	Q2 = R' (Q-P),
+we need to check if Q2 is in front of us (if the z-coordinate is
+positive, and that it isn't too far away), and that it isn't too
+far out to either side.  The field of vision might be a cone; the
+easiest one is a little pyramid made up of planes at 45 degrees
+to one another.  That is, that you can see 45 degrees to either
+side or up and down.  This is the easiest because the equation of
+a line (a plane actually) at 45 degrees is either
+	x=z or x=-z,  y=z or y=-z
+So it is very easy to check the x- and y-coordinates of the new center
+to see if -z < x < z and -z < y < z.
+	If the object is visible, then the rest of the object needs
+to be computed and displayed.
+Computing displayed objects
+---------------------------
+	Now let's say this object, which was located at Q, is
+visible.  It has a bunch of points that define the object, relative
+to the center -- call those points X1 X2 X3 etc. where X1=(x1,y1,z1)
+etc.  -- and it has an orientation W.  Again, the orientation
+vector is computed by performing some series of rotations M on
+a vector which lies along the z-axis like (0,0,1).
+	First the points need to be rotated along with the
+orientation vector.  This isn't an inverse rotation like before --
+the points rotate in the same way as the orientation vector.
+So apply M to all the points; consider a single point X1:
+	rotated point = M X1
+Once the points have been rotated we have to find their actual
+location.  Since they are defined relative to the center of the
+object, this means just adding them to the center of the object:
+	rotated point + Q = M X1 + Q
+This is the _actual_ location of the points in the world.  Now as
+before we need to translate and rotate them to get their relative
+location:
+	X1' = R' (M X1 + Q - P)
+As before, subtract P and apply the inverse rotation R'.  The above
+equation is the _entire_ equation of the 3D world!
+	We can rewrite this equation as
+	X1' = R' M X1 + R'(Q-P)
+and recognize that R'(Q-P) was calculated earlier, to see if the
+object was visible.
+	The above equation can be read as "Rotate the rotated object
+points backwards to the way we are facing, and add to the rotated
+and translated center."  That is, if we turn one way, the center
+rotates in the opposite direction, and the entire object rotates
+in the opposite direction.  Physically, if you turn your head
+sideways you still see the monitor facing you.  If instead of
+turning your head you were to move the monitor, you would also
+have to rotate the monitor to keep it facing you (if you didn't
+rotate the monitor, and only changed its position, you would
+be able to see the sides, etc.).  That rotation is in the opposite
+direction that your head would rotate (head turns to the right,
+monitor turns to the left).
+Displaying objects
+------------------
+	Now that everyone is rotated and translated and otherwise
+happy, they need to be projected and displayed.  The projection is
+easy -- as before, divide by the z-coordinate and multiply by
+a magnification factor.  Then connect the points in the right
+way to get the polygons and faces that make up the objects, using
+the appropriate method to remove faces that aren't visible, and
+draw to the screen.
+	One thing remains, though: depth-sorting the objects, to
+make sure that close-up objects overlap far-away objects.  So,
+_before projecting_, all of the z-coordinates need to be looked
+at and the objects ordered so that they are drawn in the right
+way.
+	This is really a form of polygon clipping.  You can do
+the same thing with various complex objects to get an easy way
+to clip.
+Summary
+-------
+	All objects have a position and an orientation, and might
+have things like velocity.  The position of the object is where
+the object is located.  The points that make up any object are defined
+relative to this center; the center is thus the center of rotation
+of the object in addition to being its location.  The orientation
+determines what direction the object is pointing in, and the velocity
+determines what direction it is moving in.
+	Navigating through the world amounts to changing the position
+and orientation (and velocity) of the object.  To figure out how the
+world looks from any single object with position P, P=(Px,Py,Pz), and
+orientation matrix R (where R=cumulative rotation matrix), a very simple
+and elegant equation is used.  First the equation
+	R'(Q-P)
+where R' = inverse of R, determines where the center of an object Q
+is located (by translating and then undoing the orientation of P), and
+tells whether the object at Q is visible or not.  If it is, then the
+equation
+	R' (M X + Q-P)
+gives the new location of any point X on the object, where M is
+the orientation matrix for Q, and X is some point defined relative
+to Q.  After depth-sorting the individual objects, these points
+are projected and drawn to the screen.
+	Doing things in this way gives a method that is very
+fast and efficient, and always retains accuracy -- everything
+is calculated relative to everything else.  This is extremely
+powerful.  It makes the programming of objects easy, it makes no
+roundoff errors from e.g. rotating rotated points, and no cycles
+are wasted calculating rotations of non-visible objects.  Some PC
+algorithms you might come across do all sorts of God-awful things to
+overcome these problems, like rotate a single point and then use
+cross-products to preserve all the angles, and all sorts of other
+horrid mathematical butchery.  That way leads to the Dark Side.
+	By the way, there are some subtleties (for example, computing
+orientation vectors) in this calculation -- see the discussion of
+Cool World in Section 4 for more detail on implementation issues.
+</code>
+===== Section 3: Implementing the 3D Library =====
+<code>
+	Now that we've dined on a tasty and nutritous meal of some
+theory we need to figure out how to implement everything on the 64,
+and to do so efficiently!  The idea of the 3D library is to collect
+all of the important and difficult routines into a single place.
+Before doing so it is probably a good idea to figure out just what
+the important routines are.
+	Obviously rotations and projections are important.  In fact,
+there just isn't much else to do -- a few additions here and there,
+maybe a few manipulations, but everything else is really straightforward.
+Of course, once that data is rotated and projected it would probably
+be much more enjoyable for the viewer to display it on the screen.
+Drawing lines is pretty easy, but a good solid-polygon routine is
+pretty tough so it makes a good addition to the library.
+	So three things, and the first is rotations.  There are two
+kinds of rotations.  When we turn left or right we rotate the world,
+which means rotating the positions of objects.  We also need to
+rotate the object itself, which means rotating the individual
+points that define the object.
+Calculating rotation matrices
+-----------------------------
+	In both cases, we need a rotation matrix that we can use to
+transform coordinates.  Previous programs just calculated a rotation
+matrix like
+	R = Ry Rz Rx    so that for some point P:  RP = Ry Rz Rx P
+that is, given three angles sx, sy, and sz, calculate the rotation
+matrix you get by first rotating around the x-axis by an amount sx
+(Rx times P), then the z-axis by an amount sz (Rz times Rx P),
+and finally the y-axis by the amount sy (Ry times (Rz times Rx P)).
+	Well, that kind of routine just doesn't cut it anymore.  Why?
+Well, what happens when you turn left, turn left, look up, and roll?
+You get a matrix that looks like
+	R = Rz Rx Ry Ry     (think Ry=turn left, Rx=look up, Rz=roll)
+What three angles sx sy and sz would give us that exact rotation
+matrix?  -We don't know-!  As has been said many times, **matrix
+multiplications do not commute**, so rotations do not commute, so
+we can't just add up the rotation amounts or something.  We have
+to keep a running rotation matrix, that just accumulates rotations
+each time we turn left or roll or whatever.  In terms of a routine,
+that means we need to be able to compute
+	Rx R  and  Ry R  and  Rz R
+where R is the accumulated rotation matrix, and Rx etc. are the
+little rotation matrices about the x/y/z axis that correspond to us
+turning left and right such.  Although this seems like a lot of
+work, as we shall see it is actually quite a bit faster than
+calculating an entire rotation matrix like the old routine.
+	Still, it is sometimes handy to be able to calculate a rotation
+matrix using the old method, like when rotating by large amounts, or if
+we just need an object to spin around in some arbitrary way, or if all
+we need is a direction and an elevation (think of a gun turret or a
+telescope).  So an improved version of the old routine is also
+included.
+Rotating points
+---------------
+	That ought to take care of calculating the rotation matrix.
+Now we need to apply that matrix to some points.  The first type of
+points are the object centers (i.e. positions in the world).
+Polygonamy used a single byte to represent positions, and that was
+really too restrictive.  A full 16-bit number is needed, and it
+should be signed.  So we need to be able to rotate 16-bit signed numbers.
+	The second type of points are the object points, which are
+defined relative to the object centers.  Since they are relative to
+the centers these points can be 8-bits, but they should definitely
+be signed integers.  There are many more object points than there
+are objects, so these rotations need to be really fast.
+	In both cases, it clearly makes much more sense to pass in
+a whole list of points to be rotated, instead of having to call
+the routine for each individual point.
+	Remember that the object points are only rotated if the
+object is actually visible.  And, in point of fact, once they've
+been rotated they will need to be projected.  So it makes sense
+to tack the projection routine onto the object-rotation routine
+(as opposed to the world-rotation routine, which rotates the object
+centers).  To summarize, then, in addition to the matrix calculation
+routines we need two rotation routines: one for rotating 16-bit signed
+object centers, and one for rotating (and possibly projecting) the
+actual object points.
+Drawing Polygons
+----------------
+	Once everything has been rotated and projected, it will need
+to be drawn.  So a good solid polygon routine makes a great addition
+to the library.  The algorithm will be built upon the polygonamy
+idea: start at the bottom of the polygon, and draw the left and
+right sides of the polygon _simultaneously_, filling in-between.
+That is, move up in the y-direction, calculate the left-endpoint,
+calculate the right-endpoint, and fill.
+	To calculate the endpoints we just calculate the slopes of
+the lines, and draw.  So all that is needed is to pass in a list
+of points, and perhaps tell the routine where to draw those points.
+Recall also that this gives a very easy way of doing hidden faces:
+if the points are given in counter-clockwise order for the normal
+polygon, then they will be in clockwise order if the polygon is
+facing away from us.  So basically the polygon isn't visible if
+we can't draw it, and the hidden-face calculation is more or less
+a freebie.  (That is, the hidden-face calculation takes no extra
+time if the face is visible; it will use up some time though if
+the face is not visible.)
+	The polygonamy routine had two large disadvantages: it was
+huge, because of the way the fill-routines and tables were set up, and
+it was inflexible in the sense that it could only draw to a specific
+bitmap.  It also couldn't handle things like negative coordinates, or
+drawing polygons which were partially off-screen.  The code was also
+a bit convoluted, and in need of re-thinking (imagine a big prototype
+machine with exposed wires sticking out and large cables snaking around,
+all held together with duct tape and bailing wire, and you'll get an
+idea of what the polygonamy code looks like).  Of course, since we
+already know the ideas and equations, it's not too tough to just
+rework everything from scratch.
+Basic Routines
+--------------
+	Clearly, before writing all of the library routines we are
+going to have to figure out how to do some very general operations.
+We are going to need some 8- and 16-bit signed multiply routines.
+The projection routine is going to involve a divide of a 16-bit
+z-coordinate, and the polygon routine is going to need a division
+routine for calculating line slopes.  Then we have to figure out
+how to represent numbers and on and on.  Hokey smokes!  This is
+all new stuff, too.
+	But, first and foremost, we need a multiply.  We will of course
+be using the fast multiply, which uses the fact that
+	a*b = f(a+b) - f(a-b),   f(x) = x^2/4.
+Understanding this routine will be crucial to understanding what will
+follow.  The general routine to multiply .Y and .A looks like
+	STA ZP1		;ZP1 points to f(x) low byte
+	STA ZP2		;high byte
+	EOR #$FF
+	ADC #01
+	STA ZP3		;table of f(x-256) -- see below
+	STA ZP4
+	LDA (ZP1),Y	;f(Y+A), low byte
+	SEC
+	SBC (ZP3),Y	;f(Y-A), low byte
+	TAX
+	LDA (ZP2),Y	;f(Y+A), high byte
+	SBC (ZP4),Y	;f(Y-A), high byte
+			;result is now in .X .A = low, high
+By using the indexed addressing mode, we let the CPU add A and Y
+together.  If Y and A can both range from 0..255, then Y+A can
+range from 0..510.  So we need a 510-byte table of f(x).  What
+about Y-A?  The complementing of A means that what we are really
+calculating on the computer is Y + 256-A, or 256+Y-A.  So we need
+another table of f(x-256) to correctly get f(Y-A).
+	Consider Y=1, A=2.  -A = $FE in 2's complement, so added together
+we get $FF, or -1.  Next consider Y=2, A=1.  In this case, Y-A gives
++$FF = $101, or 257.  Computationally, if Y-A is 0..255 it is a negative
+number, and subtracting 256 gives the correct negative number.  If
+Y-A is 256..510, then it really represents 0..255, so again subtracting
+gives the correct number.  If you think about it, you'll see
+that the upper 256 bytes of this table is the same as the lower
+bytes of the first table: f(x-256) for x=256..511 is the same
+as f(x) for x=0..255.  This means that the first table can be
+piggy-backed on top of the second table in memory.  For example,
+if the table were at $C000
+	$C000	f(x-256)
+	$C100	f(x)  (512 bytes)
+would do the trick.  WITH ONE EXCEPTION.  And a rather silly one
+at that.  That exception is A=0.  Multiplying by zero should give
+a zero.  But the complementing of A gives -256, which on the computer
+is... zero!  That is, think of A=0 and Y=0.  A-Y=0, and when we
+look it up in the table of f(x-256) we will get f(-256) instead
+of f(0), and get the wrong answer.  So we have to check for A=0.
+	And oh yes: don't forget to round the tables correctly!
+Extremely Cool Stuff
+--------------------
+	As usual, a little bit of thinking and a little bit of
+planning ahead will lead to enormous benefits.
+	After staring at the fast multiplication code for a bit,
+we can make a very useful observation:
+	Once the zero page locations are set up, they stay set up.
+This makes all the difference in the world!  Let's say we've just
+calculated a*b.  If we now want to calculate c*b, _we don't have to
+set up b again_.
+	If we have a succession of calculations, x*a1, x*a2, x*a3,
+etc., we only have to set up the ZP location once (with x and -x)
+and keep reusing it.  This makes the fast multiply become very fast!
+With a lot of multiplications it reduces to just 24 cycles or so for
+a full 16-bit result, as all of that zero-page overhead disappears.
+	Now think about rotations.  We have to multiply a matrix
+times a vector (i.e. the three coordinates).  It was pointed out
+in the discussion of linear algebra that we can write the
+multiplication as
+	  [x1]
+	R [x2] = x1*column1 of R + x2*column2 of R + x3*column3 of R
+	  [x3]
+And there is the payoff: we get three multiplies for every one
+set up of zero page.  That is, we set x1 in the zero page variables,
+and then multiply by the three elements of the first column.
+Then do the same for x2 and x3.  So we still do nine multiplications,
+but we only have to set up the zero-page locations three times.
+Just changing the way in which we do these multiplications has
+instantly given us a great big time savings!
+	Now we need to figure out how to do signed multiplies.  The
+fast multiply runs into problems adjusting to signed numbers.  Consider
+the simple example of x=-100 and y=1.  We get that
+	f(x+y) = f(256-100 + 1) = f(157).
+But this is just what we'd get if we multiplied x=100 and y=57
+together.  You don't know whether 157 is -99 or if it really
+is 157.  So we can't just modify the tables... unless... well, what
+about if we are able to remove this ambiguity from the tables?
+We can do just that if we restrict the range of allowed numbers.
+For example, if we restrict to
+	x=-96..95 = 160..255,0..95  (-96 is 160 in 2's complement)
+	y=-64..64 = 192..255,0..64  (-64 is 192, -63 is 193, etc.)
+then
+	if x+y = 0..159   the actual number is  0..159
+..255			-96..-1
+..351			0..95
+..510			-160..-2
+That is, the _only_ way you can get the number 159 from x+y is when
+x=95 and y=64.  The four ranges above are derived by considering
+the four cases x>0 and y>0, x<0 and y>0, x<0 and y<0, x>0 and y<0.
+All values of x+y and x-y, from 0 to 510, correspond to _unique_ values
+of x and y, if we restrict x to the range -96..95 and restrict y to the
+range -64..64.  With those restrictions, we can construct a *single*
+-byte fast-multiply table to do a signed fast-multiply.
+	How is this useful?
+	One word: object rotations.  If the rotation matrix values
+range from -64..64, and the object points range from -96..95, then
+we can do an _extremely_ fast multiply to perform the object rotations,
+as we shall see in the discussion of the rotation routines.  Since
+there are quite a lot of object points to rotate, this makes an
+enormous contribution to the overall speed of the routine.
+	Although the above helps out greatly for the point rotations,
+we still need a general signed multiply routine for the centers and
+the projections and such.  So, let's figure it out!
+	Say we multiply two numbers x and y together, and x is negative.
+If we plug it in to a multiplication routine (_any_ multiplication
+routine), we will really be calculating
+	(2^N + x)*y = 2^N*y + x*y
+assuming that x is represented using 2's complement (N would be 8 or 16
+or whatever).  There are two observations:
+	- If the result is _less_ than 2^N, we are done -- 2^N*y is all
+	  in the higher bytes which we don't care about.
+	- Otherwise, subtract 2^N*y from the result, i.e. subtract
+	  y from the high bytes.
+Now let's say that both x and y are negative.  Then on the computer
+the number we will get is
+	(2^N + x)*(2^N + y) = 2^(2N) + 2^N*x + 2^N*y - x*y
+Now it is too large by a factor of 2^2N, 2^N*x and 2^N*y.  BUT
+the basic observations haven't changed a bit!  We still need to
+_subtract_ x and y from the high bytes.  And the 2^2N is totally
+irrelevant -- we can't get numbers that large by multiplying numbers
+together which are no larger than 2^N.
+	This leads to the following algorithm for doing signed
+multiplications:
+	multiply x and y as normal with some routine
+	if x<0 then subtract y from the high bytes of the result
+	if y<0 then subtract x from the high bytes
+And that's all there is to it!  Note that x and y are "backwards",
+i.e. subtract y, and not x, when x<0.  Some examples:
+	x=-1, y=16  Computer: x=$FF y=$10  (N=8)
+		    x*y = $0FF0
+		    Result is less than 256, so ignore high byte
+		    	Answer = $F0 = -16
+		    OR: subtract y from high byte,
+			Answer = $FFF0 = -16
+	x=2112 y=-365 Computer: x=$0840 y=$FE93   (N=16)
+		    x*y = $08343CC0
+		    y<0 so subtract x from high bytes (x*2^16),
+			Answer = $F43CC0 = -770880
+	x=-31 y=-41 Computer: x=$E1 y=$D7
+		    x*y = $BCF7
+		    x<0 so subtract $D700 -> $E5F7
+		    y<0 so subtract $E100 -> $04F7 = 1271 = correct!
+So, in summary, signed multiplies can be done with the same fast
+multiply routine along with some _very simple_ post-processing.
+And if we know something about the result ahead of time (like if
+it's less than 256 or whatever) then it takes _no_ additional
+processing!
+	How cool is that?
+Projections
+-----------
+	After rotating, and adding object points to the rotated centers,
+the points need to be projected.  Polygonamy had a really crappy
+projection algorithm, which just didn't give very good results.
+I don't remember how it worked, but I do remember that it sacrificed
+accuracy for speed, and as a result polygons on the screen looked
+like they were made of rubber as their points shifted around.
+So we need to re-solve this problem.
+	Recall that projection amounts to dividing by the z-coordinate
+and multiplying by a magnification factor.  Since the object points will
+all be 16-bits (after adding to the 16-bit centers), this means dividing
+a 16-bit signed number by another 16-bit number, and then doing a
+multiplication.  Bleah.
+	The first thing to do is to get rid of the divide.  The
+projected coordinates are given by
+	x' = d*x/z = x * d/z,   y' = d*y/z = y * d/z
+where d is the magnification factor and z is the z-coordinate of
+the object.  The obvious way to avoid the divide is to convert it
+into a multiply by somehow calculating d/z.  The problem is that
+z is a 16-bit number; if it were 8 bits we could just do a table
+lookup.  If only it were 8 bits...  Hmmmmm...
+	Physically the z-coordinate represents how far away the
+object is.  For projections, we need
+	- accuracy for small values of z (when objects are close to
+	  us and large)
+	- speed for large values of z
+If z is large then accuracy isn't such a big deal, since the objects
+are far away and indistinct.  So if z is larger than 8 bits, we can
+just shift it right until it is eight bits and ignore the tiny loss
+of accuracy.  Moreover, note that the equations compute x/z and y/z;
+that is, a 16-bit number on top of another.  Therefore if we shift
+_both_ x and z right we don't change the value at all!	So now we
+have a nifty algorithm:
+	if z>255 then shift z, x, and y right until z is eight bits
+	compute d/z
+	multiply by x, multiply by y
+There are still two more steps in this algorithm, the first of which
+is to compute d/z.  This number can be less than one and as large
+as d, and will almost always have a remainder.  So we are going to
+need not only the integer part but the fractional part as well.
+The second step is to multiply this number times x.
+	Let d/z = N + R/256, so N=integer part and R=fractional part.
+For example, 3/2 = 1 + 128/256 = 1.5.  Also let x = 256*xh + xl.
+Then the multiplication is
+	x * d/z = (256*xh + xl) * (N + R/256)
+	        = 256*xh*N + xh*R + xl*N + xl*R/256
+There are four terms in this expression.  We know ahead of time that
+the result is going to be less than 16-bits (remember that the screen
+is only 320x200; if we are projecting stuff and getting coordinates
+like 40,000 then there is a serious problem!  And nothing in the
+library is set up to handle 24-bit coordinates anyways).
+	The first term, xh*N, must therefore be an eight-bit number,
+since if it were 16-bits multiplying by 256 would give a 24-bit number,
+which we've just said won't happen.  So we only really care about
+the lower eight bits of xh*N.  The next two terms, xh*R and xl*N,
+will be 16-bit numbers.  The last term, xl*R. will also be 16-bits,
+but since we divide by 256 we only care about the high 8-bits of
+the result (we don't need any fractional parts for anything).
+	And that, friends, takes care of projection: accurate when
+it needs to be, and still very fast.
+Cumulative Rotations
+--------------------
+	There is still this problem with the accumulating rotation
+matrix.  The problem to be solved is: we want to accumulate some
+general rotation operator
+	    [A B C]
+	M = [D E F]
+	    [G H I]
+by applying a rotation matrix like
+	     [ 1    0      0   ]
+	Rx = [ 0  cos(t) sin(t)]
+	     [ 0 -sin(t) cos(t)]
+to it.  So, just do it:
+               [         A                  B                  C         ]
+	Rx M = [ D*cos(t)+G*sin(t)  E*cos(t)+H*sin(t)  F*cos(t)+I*sin(t) ]
+               [-D*sin(t)+G*cos(t) -E*sin(t)+H*cos(t) -F*sin(t)+I*cos(t) ]
+You might want to do the matrix multiplication yourself, to double-check.
+Notice that it only affects rows 2 and 3.  Also notice that only one
+column is affected at a time: that is, the first column of Rx M contains
+only D and G (the first column of M); the second column only E and H; etc.
+Similar expressions result when multiplying by Ry or Rz -- they just
+affect different rows.  The point is that we don't need a whole bunch
+of routines -- we just need one routine, and to tell it which rows to
+operate on.
+	In general, the full multiplication is a pretty hairy problem.  But
+if the angle t is some fixed amount then it is quite simple.  For example,
+if t is some small angle like 3 degrees then we can turn left and right
+in 3 degree increments, and the quantities cos(t) and sin(t) are just
+constants.  Notice also that rotating in the opposite direction
+corresponds to letting t go to -t (i.e. rotating by -angle).  Since
+sine is an odd function, this just flips the sign of the sines
+(i.e. sin(-t) = -sin(t)).
+	If sin(t) and cos(t) are constants, then all that is needed
+is a table of g(x) = x*sin(t) and x*cos(t); the above calculation then
+reduces to just a few table lookups and additions/subtractions.
+There's just one caveat: if t is a small number (like three degrees)
+then cos(t) is very close to 1 and sin(t) is very close to 0.
+That is to say, the fractional parts of x*sin(t) and x*cos(t) are very
+important!
+	So we will need two tables, one containing the integer part
+of x*cos(t) and the other containing the fractional.  This in turn
+means that we need to keep track of the fractions in the accumulation
+matrix.  The accumulation matrix will therefore have eighteen elements;
+nine integer parts and nine fractional parts.  (The fractional part
+is just a decimal number times 256).
+	The routines to rotate and project only use the integer parts;
+the only routines that really need the fractional parts are the
+accumulation routines.
+	How important is the fractional part?  Very important.  There
+is a class of transformations called area-preserving transformations,
+of which rotations are a member.  That is, if you rotate an object,
+the area of that object does not change.  The condition for a matrix
+to be area-preserving is that its determinant is equal to one; if
+the cumulative rotation matrix starts to get inaccurate then its
+determinant will gradually drift away from one, and it will no
+longer preserve areas.  This in turn means that it will start to
+distort an object when it is applied to the object.  So, accuracy
+is very important where rotation matrices are concerned!
+	There is one other issue that needs to be taken care of.
+Rotation matrices have this little feature that the largest
+value that any element of the matrix can ever take is: one!
+And usually the elements are all less than one.  To retain
+accuracy we need to multiply the entire matrix by a constant;
+a natural value is 64, so that instead of ranging from -1..1
+the rotation matrix elements will range from -64..64.  And as was
+pointed out earlier, we can do some very fast signed arithmetic
+if one of the numbers is between -64 and 64.
+	The downside is that we have to remember to divide out
+that factor of 64 once the calculation is done -- it's just a
+temporary place-holder.  For object point rotations we can
+incorporate this into the multiplication table, but for the 16-bit
+center rotations we will have to divide it out manually.
+Polygon Routine Calculations
+----------------------------
+	Finally, there's the polygon routine.  Starting at the lowest
+point on the polygon, we need to draw the left and right sides
+simultaneously.  Specifically we want to draw the lines to the
+left and right, and we only care about the x-coordinates of the
+endpoints at each value of y:
+	start at the bottom
+	Decrement y (move upwards one step)
+	Compute x-coord of left line
+	Compute y-coord of right line
+	(Fill in-between)
+Drawing a line is easy.  The equation of a line is
+	dy = m*dx,    m=slope, dy=change in y, dx=change in x
+The question posed by the above algorithm is "If I take a single
+step in y, how far do I step in x?"  Mathematically: if dy=1,
+what is dx?  Obviously,
+	dx = 1/m
+i.e. the inverse slope.  If the endpoints of the line are at
+(X1,Y1) and (X2,Y2), then the inverse slope is just
+/m = DX/DY,  where DX=X2-X1 and DY=Y2-Y1.
+Once this is known for the left and right sides, updating the x-coordinates
+is a simple matter of just calculating x+dx (dx=DX/DY).  Note that
+if DY=0, the line segment is a straight horizontal line.  Therefore
+we can just skip to the next point in the polygon, and let the
+fill routine -- which is very good at drawing straight lines -- take
+care of it.
+	Now we can start drawing the left and right sides.  As we
+all know, lines on a computer are really a series of horizontal
+line segments:
+                   *****      **
+              *****             *           (lines with positive and
+          ****                   **          negative slope)
+     *****                         *
+Note that the polygon routine doesn't calculate all points on the
+line, the way a normal line routine does.  It only calculates the
+left or right endpoints of each of those horizontal line segments.
+For example, if dx=5 then the routine takes big steps of 5 units each;
+the fill routine takes care of the rest of the line segment.
+	There are two issues that need to be dealt with in the polygon
+routine.  The first trick in drawing lines is to split one of those
+line segments in half, for the first and last points.  That is, consider
+a line with DY=1 and DX=10.  From the above equations, dx = 10, and if
+this line was just drawn naively it would look like
+                 *
+	**********
+i.e. a single horizontal line segment of length 10, followed by a dot
+at the last point.  What we really want is to divide that one line
+segment in half, to get
+	     *****
+	*****
+for a nice looking line.  In the polygon routine, this means that the
+first step should be of size dx/2.  All subsequent steps will be
+of size dx, except for the last point.
+	The other issue to be dealt with again deals with the endpoints.
+How to deal with them depends on two things: the slope of the line,
+and whether it is the left or right side of the polygon.  Consider
+the below line, with endpoints marked with F and L (for first and last)
+	      *L**
+	  ****
+	F*
+and dx=4.  You can see that we have overshot the last point.  In point
+of fact, we may also have overshot the first point.  Let's say the above
+was the left side of a polygon.  Then we would want to start filling
+at the LEFT edge of each line segment -- we want that * to the left
+of L to be drawn, but want to start filling at F.  On the other hand,
+if this were the right side of the polygon then we would want to fill
+to the right edge of each line segment.  That means filling to point L,
+again grabbing the * to the left of L, and filling to the * to the
+right of F.
+	The correct way to handle this is actually very simple: place
+the x-coordinate update (x=x+dx) in just the right spot, either before
+or after the plot/fill routine.  In the above example, the left line
+should do the update after the plot/fill:
+	fill in-between and plot endpoints
+	x=x+dx
+The line will then always use the "previous" endpoint: it will start
+with F, and then with the last point of the previous line segment.
+The right line should do the updating before the plotting: it will then
+use the rightmost point of each segment, and when the end of the line
+is reached the next line will be computed, resetting the starting point.
+	Polygonamy put some wacky restrictions on DX and DY that just
+won't cut it here.  The lib3d routine can handle negative coordinates,
+and polygons which are partially off-screen.  The only restriction
+it puts on the numbers is that DX is 9-bits and DY is 8-bits; the
+coordinates X2 and Y2 etc. can be a full 16-bits, but the sides
+of the polygons can't exceed DX=9-bits and DY=8-bits.
+	In principle this allows you to draw polygons which are
+as large as the entire screen.  In practice it was perhaps not a
+great decision: if these polygons are being rotated, then a
+polygon with DX=320 will have DY=320 after a 90-degree rotation.
+So in a sense polygons _sides_ are actually limited to be no
+larger than 256 units.
+	The important word here though is "side".  A polygon
+can be much larger than 256x256 if it has more than one side!
+For example, a stopsign has eight sides, but each side is smaller
+than the distance from one end of the polygon to the other.
+Also, sides can always be split in half.  This restriction is
+just one of those design decisions...
+	Anyways, DX=9-bits and DY=8-bits.  To calculate this we need
+a divide routine.  Polygonamy used a routine involving logs and some
+fudged tables and other complexities.  Well most of that was pretty
+silly, because the divide is such a tiny portion of the whole polygon
+routine -- saving a few tens of cycles is awfully small potatoes in a
+routine that's using many thousands of cycles.
+	So the new strategy is to do the general routine: calculate
+log(DX) and log(DY), subtract, and take EXP() of the result to get
+an initial guess.  Then use a fast multiply to see how accurate
+the guess was, and if it is too large or too small then a few subtractions
+or additions can fix things up.  The remainder is also computed as
+a decimal number (remainder/DY times 256); the remainder is crucial,
+as accuracy is important -- if the lines are not drawn accurately,
+the polygon sides won't join together, and the polygon will look very
+weird.
+	Why not do a divide like in the projection routine?  Well,
+first of all, I wrote the polygon renderer before deriving that
+routine. :)  Second, that routine combines a multiply _and_ a
+divide into a single multiply; there is no multiply here, and if
+you count up the cycles I think you'll find that the logarithmic
+routine is faster, and with the remainder calculation it is simply
+better-suited for drawing polygons.
+	Although the remaining parts of the algorithm require a lot
+of blood and sweat to get the cycle count down, they are relatively
+straightforward.  I leave their explanation, along with a few other
+subtleties, for the detailed code disassemblies, below.
+Detailed Code Disassembly
+-------------------------
+	After all of those preliminary calculations and ideas and
+concepts we are finally in a position to write some decent code.
+The full lib3d source code is included in this issue, but it is worth
+examining the routines in more detail.  Five routines will be
+discussed below:
+	CALCMAT - Calculate a rotation matrix
+	ACCROTX - Accumulate the rotation matrix by a fixed amount
+	GLOBROT - Global rotate (rotate centers)
+	ROTPROJ - Local rotation (object points), and project if necessary
+	POLYFILL- Polygon routine
+Only the major portions of the routines will be highlighted.  And the
+first routine is...
+CALCMAT
+-------
+*
+* Calculate the local matrix
+*
+* Pass in: A,X,Y = angles around z,x,y axis
+*
+* Strategy: M = Ry Rx Rz where Rz=roll, Rx=pitch, Ry=yaw
+*
+CALCMAT calculates a rotation matrix using the old method: pass in the
+rotation amounts about the x y and z axis and calculate a matrix.
+There is a fairly complete explanation of this routine in the source
+code, but it has a very important feature: the order of the rotations.
+As the above comment indicates, it first takes care of roll (tilting
+your head sideways), then pitch (looking up and down), and finally
+yaw (looking left or right).  This routine can't be used for a general
+D world, but by doing rotations in this order it _can_ be used for
+certain kinds of motion, like motion in a plane (driving a car around),
+or an angle/elevation motion (like a gun turret).
+Note also the major difference from the accumulation routines: they
+rotate by a fixed amount.  Since this routine just uses the angles
+as parameters, the program that uses this routine can of course update
+the angles by any amount it likes.  That is: the accumulation routines
+can only turn left or right in 3 degree increments; this routine can
+instantly point in any direction.
+Next up are the accumulation routines.  These routines apply a fixed
+rotation to the rotation matrix; i.e. they rotate the world by a
+fixed amount about an axis.  The carry flag is used to indicate
+whether rotations are positive or negative (clockwise or counter-
+clockwise if you like).
+As was pointed out earlier, only one routine is needed to do the
+actual rotation calculation.  All it needs to know is which rows
+to operate on, so that is the job of ACCROTX ACCROTY and ACCROTZ.
+ACCROTX
+-------
+*
+* The next three procedures rotate the accumulation
+* matrix (multiply by Rx, Ry, or Rz).
+*
+* Carry clear means rotate by positive amount, clear
+* means rotate by negative amount.
+*
+ACCROTX  ENT
+The x-rotation just operates on rows 2 and 3; these routines just loop
+through the row elements, passing them to the main routine ROTXY.
+The new, rotated elements are then stored in the rotation matrix,
+and on exit the rotation matrix has accumulated a rotation about
+the x-axis.
+         ROR ROTFLAG
+         LDX #2
+:LOOP    STX COUNT
+         LDA D21REM,X     ;rows 2 and 3
+         STA AUXP
+         LDA G31REM,X
+         STA AUXP+1
+         LDY G31,X        ;.Y = row 3
+         LDA D21,X        ;.X = row 2
+         TAX
+         JSR ROTXY
+         LDX COUNT
+         LDA TEMPX
+         STA D21REM,X
+         LDA TEMPX+1
+         STA D21,X
+         LDA TEMPY
+         STA G31REM,X
+         LDA TEMPY+1
+         STA G31,X
+         DEX
+         BPL :LOOP
+         RTS
+*
+* Rotate .X,AUXP .Y,AUXP+1 -> TEMPX,+1 TEMPY,+1
+*
+* If flag is set for negative rotations, swap X and Y
+* and swap destinations (TEMPX TEMPY)
+*
+ROTXY
+This is the main accumulation routine.  It simply calculates
+x*cos(delta) + y*sin(delta), and y*cos(delta) - x*sin(delta).
+Since delta is so small, cos(delta) is very nearly 1 and sin(delta)
+is very nearly zero: an integer and a remainder part of x and y
+are necessary to properly calculate x*cos(delta) etc.
+Let x = xint + xrem (integer and remainder).  Then
+	x*cos(delta) = xint*cos(delta) + xrem*cos(delta).
+Since cos(delta) is nearly equal to 1, xint*cos(delta) is equal to
+xint plus a small correction; that is, xint*cos(delta) gives an
+integer part and a remainder part.  What about xrem*cos(delta)?  It
+also gives a small correction to xrem.  But xrem is already small!
+Which means the correction to xrem is really really small, i.e outside
+the range of our numbers.  So we can just discard the correction.
+This can generate a little bit of numerical error, but it is miniscule,
+and tends to get lost in the noise.
+The point of all this is that x*cos(delta) + y*sin(delta) is calculated
+as
+	xrem + xint*cos(delta) + yint*sin(delta)
+where xrem*cos(delta) is taken to be xrem, and yrem*sin(delta) is taken
+to be zero.  This introduces a very small error into the computation,
+but the errors tend to get lost in the wash.
+:CONT    LDA CDELREM,X
+         CLC
+         ADC SDELREM,Y
+         STA TEMPX
+         LDA CDEL,X       ;x*cos(delta)
+         ADC SDEL,Y       ;+y*sin(delta)
+         STA TEMPX+1
+         LDA TEMPX
+         CLC
+         ADC AUXP	  ;+xrem
+         STA TEMPX
+         BCC :NEXT
+         INC TEMPX+1
+:NEXT	 a similar piece of code to calculate y*cos - x*sin
+And that is the entire routine -- quite zippy, and at 14 bits per
+number it is remarkably accurate.
+GLOBROT
+-------
+Next up is GLOBROT.  The centers are 16-bit signed numbers, and
+the rotation matrix of course consists of 8-bit signed numbers.
+*
+* Perform a global rotation, i.e. rotate the centers
+* (16-bit signed value) by the rotation matrix.
+*
+* The multiplication multiplies to get a 24-bit result
+* and then divides the result by 64 (mult by 4).  To
+* perform the signed multiplication:
+*   - multiply C*y as normal
+*   - if y<0 then subtract 256*C
+*   - if C<0 then subtract 2^16*y
+*
+* Parameters: .Y = number of points to rotate
+*
+Recall that the object centers are 16-bit signed coordinates; therefore
+a full 16-bit signed multiply is needed.  Two macros are used for
+the multiplies:
+MULTAY   MAC              ;Multiply A*Y, store in var1, A
+which does the usual fast multiply, and
+QMULTAY  MAC              ;Assumes pointers already set up
+         LDA (MULTLO1),Y
+         SEC
+         SBC (MULTLO2),Y
+         STA ]1
+         LDA (MULTHI1),Y
+         SBC (MULTHI2),Y
+         <<<
+which does a "quick multiply", that is, it is the fast multiply
+without any of the zero-page setup muckety-muck.  As was pointed out
+in the theory section, once the zero page variables are set up,
+they stay set up!  ROTPROJ can take better advantage of this,
+but GLOBROT can also benefit.  Remember also that if A=0 this
+routine will fail; the routines below must check for A=0 ahead of
+time.
+Since the matrix elements are all "6-bit offset", that is, since
+they are the actual value times 64, the final result after the
+rotations needs to be divided by 64.  The easiest way to divide the
+-bit result by 64 is to shift *left* twice, and keep the upper bits.
+In the macro below, "C" is the "C"enter, and Mij is some element in
+the rotation matrix.
+* Fix sign and divide by 64
+DIVFIX   MAC              ;If Mij<0 subtract 256*C
+         LDY MULTLO1      ;If C<0 subtract 2^16*Mij
+         BPL POSM
+         STA TEMP3
+         LDA TEMP2
+         SEC
+         SBC ]1           ;Center
+         STA TEMP2
+         LDA TEMP3
+         SBC ]1+1         ;high byte
+POSM     LDY ]1+1
+         BPL DIV
+         SEC
+         SBC MULTLO1      ;Subtract Mij
+DIV      ASL TEMP1        ;Divide by 64
+         ROL TEMP2
+         ROL
+         ASL TEMP1
+         ROL TEMP2
+         ROL
+         LDY TEMP2
+         <<<              ;.A, .Y = final result
+* Main routine
+GLOBROT  ENT
+Recall that there are two ways to look at multiplying a matrix times
+a vector.  The usual way is "row times column"... and that is exactly
+how this routine works.  Why not do the "column times vector element"
+method, which we said offers this great computational advantage?
+Doing things that way means calculating
+	[A11] 	   [B12]      [C13]
+	[D21]*CX + [E22]*CY + [F23]*CZ
+	[G31]      [H32]      [I33]
+where the A11 etc. are the columns of the rotation matrix.  The
+argument was that CX could be set up in zero page once, and then
+multiplications could be done three at a time.  Why not just do that?
+Well, the most basic answer is that I had already written this
+routine before I noticed the multiplication trick.  So I could have
+rewritten it using the above method, or modify what was already in
+place.  Since CX CY and CZ are all 16-bit signed numbers, I chose
+to do the multiplication savings using A11 etc. as the zero
+page variable (A11*CX = A11*CXLO + 256*A11*CXHI), so the multiplications
+are done two at a time.  I think I also decided that the total cycle
+savings in rewriting the routine would have been miniscule compared
+with the total routine time, and that the whole global rotation time
+was small compared with the local rotations and especially the
+polygon plotting.
+In sum, doing a total rewrite didn't seem to be worth the effort.
+Nevertheless, this routine can certainly be made faster.
+Another item for improvement is to do the division by 64 only
+at the very end of the calculation (the routines below multiply,
+then divide by 64, then add; it would be better to multiply, add,
+then after all additions divide by 64).  Who knows what I was
+thinking at the time?!
+Oh well, that's why God invented version numbers :).
+Here is the main routine loop:
+GLOOP    DEY
+         STY COUNT
+	 [copy object centers to zero page for fast access]
+This part multiplies row 1 of the rotation matrix times the position
+vector, and stores the new result in CX, the x-coordinate of the object
+center.
+         LDA A11          ;Row1
+         LDX B12
+         LDY C13
+         JSR MULTROW      ;Returns result in .X .A = lo hi
+         LDY COUNT
+         STA (CXHI),Y
+         TXA
+         STA (CXLO),Y
+	 [Do the same thing for row 2 and the y-coordinate]
+	 [Do the same thing for row 3 and the z-coordinate]
+The routine then loops around back to GLOOP, until all points have
+been rotated.  The important work of this routine is done in MULTROW:
+*
+* Multiply a row by CX CY CZ
+* (A Procedure to save at least a LITTLE memory...)
+*
+M1       EQU CXSGN        ;CXSGN etc. no longer used.
+M2       EQU CYSGN
+M3       EQU CZSGN
+MULTROW
+         STA M1
+         STX M2
+         STY M3
+This next part checks to make sure A is not zero; as was pointed out
+in the discussion of the fast multiply, A=0 will fail (ask me how I
+know).
+         TAY
+         BEQ :SKIP1
+It is all pretty straightforward from here: compute M1*CX, adjust for
+signed values, and divide by 64 (this is the divide by 64 that it
+would have been better to leave for later).  Since zero-page is
+already set up, the second multiplication is done with a quick multiply.
+         LDY CX+1
+         >>> MULTAY,TEMP2
+         STA TEMP3
+         LDY CX
+         >>> QMULTAY,TEMP1
+         CLC
+         ADC TEMP2
+         STA TEMP2
+         LDA TEMP3
+         ADC #00
+         >>> DIVFIX,CX    ;Adjust result and /64
+:SKIP1   STY TM1
+         STA TM1+1
+The next two parts just calculate M2*CY and M3*CZ, adding to TM1 and
+TM1+1 as they go, and the routine exits.
+ROTPROJ
+-------
+Now we move on to perhaps THE core routine.  In general, *a whole lot*
+of object points need to be rotated and projected, so this routine
+needs to blaze.  Remember that there are two rotations in the
+"world" equation: local rotations to get an object pointing in the
+right direction, and another rotation to figure out what it looks
+like relative to us.  This routine can therefore just rotate (for
+the first kind of rotation), or rotate and project (for the second
+kind).  The idea is to pass in a list of points, tell it how many
+points to rotate (and project), and tell it where to store them all!
+*
+* ROTPROJ -- Perform local rotation and project points.
+*
+* Setup needs:
+*   Rotation matrix
+*   Pointer to math table (MATMULT = $AF-$B0)
+*   Pointers to point list (P0X-P0Z = $69-$6E)
+*   Pointers to final destinations (PLISTYLO ... = $BD...)
+*     (Same as used by POLYFILL)
+*   Pointers to lists of centers (CXLO CXHI... = $A3-$AE)
+*   .Y = Number of points to rotate (0..N-1)
+*   .X = Object center index (index to center of object)
+*
+* New addition:
+*   C set means rotate and project, but C clear means just
+*   rotate.  If C=0 then need pointers to rotation
+*   destinations ROTPX,ROTPY,ROTPZ=$59-$5E.
+*
+Recall that the rotation matrix is offset by 64 -- instead of ranging
+from -1..1, all elements range from -64..64.  As in the case of
+GLOBROT above, the final result needs to be divided by 64.  Also
+recall the result from the discussion of signed multiplication:
+if one set of numbers is between -64..64, and the other between
+-96..95, then all combinations a+b (and a-b) generate a unique
+-bit number.  In other words, only one table is needed.
+Next, think about rotations for a moment: rotations do not change
+lengths (if you rotate, say, a pencil, it doesn't magically get
+longer).  This has two important implications.  The first is
+that we know a priori that the result of this rotation will be an
+*8-bit* result, if the starting length was an 8-bit number.
+Thus we need only one table, and just two table lookups -- one
+for f(a+b) and the other for f(a-b).  Since the upper 8-bits
+are irrelevant, the divide by 64 can be incorporated directly
+into the table of f(x).
+The second implication is that the _length_ of the point-vectors
+cannot be more than 128.  This is because if we ignore the upper
+-bits, we don't capture the sign of the result.  Consider: each
+x, y, and z coordinate of a point can be between -96..95.  A point
+like (80,80,80) has length sqrt(3*80^2) = 138.56.  So if some rotation
+were to rotate that vector onto the x-axis it would have a new
+coordinate of (138,0,0).  Although this is an 8-bit number, it is
+not an eight-bit signed number -- the rotation could just as easily
+put the coordinate at (-138,0,0), which requires extra bits.  So some
+care must be taken to make sure that object points are not more than
+units away from the object center.
+As was pointed out earlier, we can save a lot of time with the
+matrix multiply by doing the multiplication as column-times-element,
+instead of the usual row-times-column: column 1 of the matrix
+times Px plus column 2 times Py plus column 3 times Pz, where
+(Px,Py,Pz) is the vector being rotated.  Here is the entire
+signed multiplication routine (with divide by 64):
+SMULT    MAC              ;Signed multiplication
+         STA MATMULT
+         EOR #$FF
+         CLC
+         ADC #01
+         STA AUXP
+         LDA (MATMULT),Y
+         SEC
+         SBC (AUXP),Y
+         <<<
+And this is the Quick Multiply, used after the zero-page variables
+have been set up:
+QMULT    MAC              ;Multiplication that assumes
+         LDA (MATMULT),Y  ;pointers are already initialized
+         SEC
+         SBC (AUXP),Y
+         <<<
+Pretty cool, eh?  A 12-cycle signed multiply and divide. :)
+On to the routine:
+ROTLOOP is the main loop.  It first rotates the points.  If the
+carry was set then it adds the points to the object center and
+projects them.
+ROTLOOP  LDY COUNT
+         BEQ UPRTS
+         DEY
+         STY COUNT
+         LDA (P0X),Y
+         BNE :C1
+         STA TEMPX
+         STA TEMPY
+         STA TEMPZ
+         BEQ :C2
+:C1      LDY A11          ;Column 1
+         >>> SMULT
+         STA TEMPX
+         LDY D21
+         >>> QMULT
+         STA TEMPY
+         LDY G31
+         >>> QMULT
+         STA TEMPZ
+The above multiplies the x-coordinate (P0X) times the first column of
+the rotation matrix.  Note the check for P0X=0, which would otherwise
+cause the multiplication to fail.  Note that there is one SMULT,
+which sets up the zero-page pointers, followed by two QMULTS.
+The result in TEMPX, TEMPY, TEMPZ will be added to subsequent
+column multiplies.  And that's the whole routine!
+After the third column has been multiplied, we have the rotated
+point.  If projections are taking place, then the point needs to
+be added to the center and projected.  The centers are 16-bit
+signed coordinates, but so far the points are just 8-bit signed
+coordinates.  To make them into 16-bit signed coordinates we
+need to add a high byte of either 00, for positive numbers, or
+$FF, for negative numbers (e.g. $FFFE = -2).  In the code below,
+.Y is used to hold the high byte.
+ADDC     LDY #00
+         CLC
+         ADC TEMPZ
+         BPL :ROTCHK
+         DEY              ;Sign
+:ROTCHK  LDX ROTFLAG
+         BMI :POS1
+         LDY COUNT        ;If just rotating, then just
+         STA (ROTP0Z),Y   ;store!
+         LDA TEMPY
+         STA (ROTP0Y),Y
+         LDA TEMPX
+         STA (ROTP0X),Y
+         JMP ROTLOOP
+:POS1    CLC              ;Add in centers
+         ADC CZ
+         STA TEMPZ
+         TYA
+         ADC CZ+1
+         STA TEMPZ+1      ;Assume this is positive!
+Remember that only objects which are in front of us -- which have
+positive z-coordinates -- are visible.  The projection won't
+work right if any z-coordinates are negative.  Thus the above
+piece of code doesn't care about the sign of the z-coordinate.
+Two similar pieces of code add the x/y coordinate to the x/y center
+coordinate.  The signs of the result are stored in CXSGN and CYSGN,
+for use below.
+Recall how projection works: accuracy for small values of Z, but
+speed for high values of Z.  If the z-coordinate is larger than
+-bits, we just shift all the coordinates right until z is 8-bits.
+Since the X and Y coordinates might be negative, we need the
+sign bits CXSGN and CYSGN; they contain either 00 or $FF, so
+that the rotations will come out correctly (negative numbers will
+stay negative!).
+         LDA TEMPZ+1
+         BEQ PROJ
+:BLAH    LSR              ;Shift everything until
+         ROR TEMPZ        ;Z=8-bits
+         LSR CXSGN
+         ROR TEMPX+1      ;Projection thus loses accuracy
+         ROR TEMPX        ;for far-away objects.
+         LSR CYSGN
+         ROR TEMPY+1      ;(Big whoop)
+         ROR TEMPY
+         TAX
+         BNE :BLAH
+Finally comes the projection itself.  It just follows the equations
+set out earlier, in Section 3.  PROJTAB is the table of d/z.
+It turns out that there are a lot of zeros and ones in the
+table, so those two possibilities are treated as special cases.
+The PROJTAB value is used in the zero-page multiplication pointers,
+to again make multiplication faster.  By calculating both the
+x- and y-coordinates simultaneously, the pointers may be reused
+even further.
+After multiplying, the coordinates are added to the screen offsets
+(XOFFSET and YOFFSET), so that 0,0 is at the center of the screen
+(i.e. XOFFSET=160 and YOFFSET=100).  These values are changeable,
+so the screen center may be relocated almost anywhere.  The final
+coordinates are 16-bit signed values.
+Finally, there is the other core routine: POLYFILL, the polygon
+renderer.
+POLYFILL
+--------
+POLYFILL works much like the old Polygonamy routine.  Each object
+consists of a list of points.  Each face of the object is some
+subset of those points.  POLYFILL polygons are defined by a set
+of indices into the object point list -- better to copy an 8-bit
+index than a 16-bit point.  The index list must go counter-clockwise
+around the polygon, so the hidden-face routine will work correctly.
+To draw the polygon it simply starts at the bottom, calculating
+the left and right edges and filling in-between.
+POLYFILL also greatly improves upon the old routine.  It can draw
+to bitmaps in any bank.  It is very compact.  It deals with
+-bit signed coordinates, and can draw polygons which are partially
+(or even wholly) off-screen.  It also correctly draws polygons which
+overlap (in a very sneaky way).
+Keep in mind that this routine needs to be just balls-busting fast.
+A typical object might have anywhere from 3-10 visible polygons, and
+a typical polygon might be 100-200 lines high.  Just saving a few
+tens of cycles in the main loop can translate to tens of thousands
+of cycles saved overall.  By the same token, there's no need to
+knock ourselves out on routines which aren't part of the main loop;
+saving 100 cycles overall is chump change.  Finally, since this is
+a library, there are some memory constraints, and some of the routines
+are subsequently less efficient than they would otherwise be.  Again,
+though, those few added cycles are all lost in the noise when compared
+with the whole routine.
+There is a whole a lot of code, so it's time to just dive in.  First,
+*
+* Some tables
+*
+XCOLUMN                   ;xcolumn(x)=x/8
+]BLAH    = 0
+         LUP 32
+         DFB ]BLAH,]BLAH,]BLAH,]BLAH
+         DFB ]BLAH,]BLAH,]BLAH,]BLAH
+]BLAH    = ]BLAH+1
+         --^
+* Below are EOR #$FF for merging into the bitmap, instead
+* of just ORAing into the bitmap.
+LBITP                     ;Left bit patterns
+         LUP 32
+* DFB $FF,$7F,$3F,$1F,$0F,$07,$03,$01
+         DFB 00,$80,$C0,$E0,$F0,$F8,$FC,$FE
+         --^
+RBITP                     ;right bit patterns
+         LUP 32
+* DFB $80,$C0,$E0,$F0,$F8,$FC,$FE,$FF
+         DFB $7F,$3F,$1F,$0F,$07,$03,$01,$00
+         --^
+I will put off discussion of the above tables until they are
+actually used.  It is just helpful to keep them in mind, for later.
+Next up is the fill routine.  The polygonamy fill routine looked like
+	STA BITMAP,Y
+	STA BITMAP+8,Y
+	STA BITMAP+16,Y
+	...
+There were a total of 25 of these -- one for each row.  Then there
+were two bitmaps, so 50 total.  Each bitmap-blaster was page-aligned.
+Fills take place between two columns on the screen.  By entering
+the above routine at the appropriate STA xxxx,Y, the routine
+could begin filling from any column.  By plopping an RTS onto
+the appropriate STA the routine could exit on any column.  After
+filling between the left and right columns, the RTS was restored
+to an STA xxxx,Y.
+Although the filling is very fast, the routine is very large, locked
+to a specific bitmap, and a bit cumbersome in terms of overhead.  So
+a new routine is needed, and here it is:
+*
+* The fill routine.  It just blasts into the bitmap,
+* with self-modifying code determining the entry and
+* exit points.
+*
+FILLMAIN
+]BLAH    = 0              ;This assembles to
+         LUP 32           ;LDY #00  STA (BPOINT),Y
+         LDY #]BLAH       ;LDY #08  STA (BPOINT),Y
+         STA (BPOINT),Y   ;...
+]BLAH    = ]BLAH+8        ;LDY #248 STA (BPOINT),Y
+         --^
+         INC BPOINT+1     ;Advance pointer to column 32
+COL32    LDY #00
+         STA (BPOINT),Y
+         LDY #08          ;33
+         STA (BPOINT),Y
+         LDY #16
+         STA (BPOINT),Y
+         LDY #24
+         STA (BPOINT),Y
+         LDY #32
+         STA (BPOINT),Y
+         LDY #40          ;37
+         STA (BPOINT),Y
+         LDY #48
+         STA (BPOINT),Y
+         LDY #56          ;Column 39
+         STA (BPOINT),Y
+FILLEND  RTS              ;64+256=320
+FILL
+         JMP FILLMAIN     ;166 bytes
+As before, self-modifying code is used to determine the entry and
+exit points.  As you can see, compared with the polygonamy routine
+this method takes 3 extra cycles per column filled.  But now there
+is just one routine, and it can draw to any bitmap.  As an added
+bonus, if we do the RTS just right, then (BPOINT),Y will point to
+the ending-column (also note the INC BPOINT+1 in the fill code, for
+when column 32 is crossed).
+There are two parts to doing the filling.  The above fill routine
+fills 8 bits at a time.  But the left and right endpoints need to
+be handled specially.  The old routine had to recalculate those
+endpoints; this method gets it automatically, and so saves some
+extra cycles in overhead.
+In the very worst case, the old routine takes 200 cycles to fill
+columns; the new routine takes 325 cycles.  The main routine
+below takes around 224 cycles per loop iteration.  Ignoring the
+savings in overhead (a few tens of cycles), this means that in
+the absolute worst case the old method will be around 20% faster.
+For typical polygons, they become comparable (5% or so).  With all
+the advantages the new routine offers, this is quite acceptable!
+A few extra tables follow:
+LOBROW   DFB 00,64,128,192,00,64,128,192
+	 (and so on)...
+HIBROW
+         DFB 00,01,02,03,05,06,07,08
+	 etc.
+COLTAB                    ;coltab(x)=x*8
+         DFB 0,8,16,24,32,40,48,56,64,72,80
+	 etc.
+LOBROW and HIBROW are used to calculate the address of a row in
+the bitmap, by adding to the bitmap base address.  COLTAB then gives
+the offest for a given column.  Thus the address of any row,column
+is quickly calculated with these tables.
+Finally, the next two tables are used to calculate the entry and exit
+points into the fill routine.
+*
+* Table of entry point offsets into fill routine.
+* The idea is to enter on an LDY
+*
+FILLENT
+         DFB 0,4,8,12,16,20,24,28,32,36,40 ;0-10
+         DFB 44,48,52,56,60,64,68,72,76,80 ;11-20
+         DFB 84,88,92,96,100,104,108,112   ;21-28
+         DFB 116,120,124,128               ;29-32
+         DFB 134          ;Skip over INC BPOINT+1
+         DFB 138,142,146,150,154,158 ;34-39
+         DFB 162          ;Remember that we use FILLENT+1
+*
+* Table of RTS points in fill routine.
+* The idea is to rts after the NEXT LDY #xx so as to
+* get the correct pointer to the rightmost column.
+*
+* Thus, these entries are just the above +2
+*
+FILLRTS
+         DFB 2,6,10,14,18,22,26,30,34,38,42 ;0-10
+         DFB 46,50,54,58,62,66,70,74,78,82  ;11-20
+         DFB 86,90,94,98,102,106,110,114,118,122,126
+         DFB 132          ;Skip over INC
+         DFB 136,140,144,148,152,156,160    ;33-39
+All that's left now is the main code routines.  Before drawing the
+polygon there is some setup stuff that needs to be done.  The
+lowest point needs to be found, and the hidden face check needs to
+be done.  The hidden face check is simply: if we can't draw this
+polygon, then it is hidden!
+As was said before, the list of point indices moves around the
+polygon counter-clockwise when the polygon is visible.  When the
+polygon gets turned around, these points will go around the
+polygon clockwise.  What does this mean computationally?  It
+means that the right side of the polygon will be to the LEFT of
+the left side, on the screen.  A simple example:
+         ____
+         \  /
+       L  \/  R
+When the above polygon faces the other way (into the monitor), R will
+be to the left of L.  The polygon routine always computes R as the
+"right side" and L as the "left side" of the polygon; if, after taking
+a few steps, the right side is to the left of the left side, then we
+know that the polygon is turned around.  The hidden face calculation
+just computes the slopes of the left and right lines -- which we need
+to do to draw the polygon anyways -- and takes a single step along
+each line.  All that is needed is to compare the left and right points
+(or the slopes), and either draw the polygon or punt.
+There are two sides to be dealt with: the left and right.  Each of
+those sides can have either a positive or negative slope, therefore
+a total of four routines are needed.  A second set of four routines
+is used for parts of the polygon that have y-coordinates larger than
+or less than zero, i.e. off of the visible screen; these routines
+calculate the left and right sides as normal, but skip the plotting
+calculations.
+All four routines are similar.  The differences in slope change
+some operations from addition to subtraction, and as was explained
+earlier affects whether the x=x+dx update comes before or after
+the plotting.  It also affects the very last point plotted, similar
+to the way the x=x+dx update is affected.  The truly motivated can
+figure out these differences, so I'll just go through one routine
+in detail:
+* Left positive, right negative
+*
+* End: left and right advance normally
+DYL3     >>> DYLINEL,UL3
+LEFTPM   >>> LINELP,LEFTMM
+         JMP LCOLPM
+LOOP3    DEC YLEFT
+         BEQ DYL3
+UL3      >>> PUPDATE,LEFTXLO;LEFTXHI;LEFTREM;LDY;LDXINT;LDXREM
+LCOLPM   >>> LCOLUMN
+         DEC YRIGHT
+         BNE UR3
+         >>> DYLINER,UR3
+RIGHTPM  >>> LINERM,RIGHTPP
+         JMP RCOLPM
+UR3      >>> MUPDATE,RIGHTXLO;RIGHTXHI;RIGHTREM;RDY;RDXINT;RDXREM
+RCOLPM   >>> RCOLUMN
+         >>> UPYRS,LOOP3
+That's the whole routine, for a polygon which ends in a little
+peak, like /\ i.e. left line positive slope and right line negative.
+Let's read the main loop:
+LOOP3	Decrease the number of remaining points to draw in the left line;
+	  if the line is finished, then calculate the next line.
+	PUPDATE: The "P" is for "Plus": compute x=x+dx for the left side.
+	LCOLUMN: Calculate the left endpoint on the screen, and set
+	         up the screen pointer and the fill entry point.
+	Decrease the number of remaining points in the right line;
+	  if finished, calculate the next line segment (note BNE).
+	MUPDATE: "M"inus update, since slope is negative: calculate x=x-dx
+	RCOLUMN: Calculate the right endpoint and fill column,
+		 do the fill, and plot the left/right endpoints.
+	UPYRS: Update the bitmap pointer, and go back to LOOP3
+So it's just what's been said all along: calculate the left and right
+endpoints, and fill.  When a line is all done, the next side of
+the polygon is computed.  DYL3 performs the computations for the
+left line segment.  DYLINEL computes the value of DY.  Note that
+DY always has the same sign, since the polygon is always drawn from
+the bottom-up.  LINELP computes DX and DX/DY, the inverse slope.
+If DX is negative it will jump to LEFTMM.  Since we are in the
+"plus-minus" routine, i.e. left side = plus slope, right side = minus
+slope, then it needs to jump to the "minus-minus" routine if DX
+has the wrong sign.  LEFTMM stands for "left side, minus-minus",
+in just the same way that LEFTPM in the above code means "left side,
+plus-minus".  The right line segment is similar.
+	Note what happens next: the routine JMPs to LCOLPM.  It skips
+over the PUPDATE at UL3.  Similarly, the right line calculation JMPs
+over UR3 into RCOLPM.  What this does is delay the calculation of
+x=x+dx or x=x-dx until *after* the plotting is done, to fill between
+the correct left and right endpoints.  See the discussion of polygon
+fills in Section 3 for a more detailed explanation of what is going
+on here.
+	Now let's go through the actual macros which make up
+the above routine.  First, the routines to calculate the slope of
+the left line segment:
+*
+* Part 1: Compute dy
+*
+* Since the very last point must be plotted in a special
+* way (else not be plotted at all), ]1=address of routine
+* to handle last point.
+*
+DYLINEL  MAC              ;Line with left points
+BEGIN    LDX LINDEX
+L1       DEC REMPTS       ;Remaining lines to plot
+         BPL L2           ;Zero is still plotted
+         INC YLEFT
+         LDA REMPTS
+         CMP #$FF         ;Last point
+         BCS ]1
+EXIT     RTS
+The above code first checks to see if any points are remaining.
+If this is the last point, we still need to fill in the very last
+points.  If the right-hand-side routine has already dealt with
+the last endpoint, REMPTS will equal $FF and the routine will really
+exit; in this case, the last part has already been plotted.
+L3       LDX NUMPTS
+L2       LDY PQ,X
+         STY TEMP1
+         LDA (PLISTYLO),Y ;Remember, y -decreases-
+         DEX
+         BMI L3           ;Loop around if needed
+         LDY PQ,X
+         STY TEMP2
+         SEC
+         SBC (PLISTYLO),Y
+         BEQ L1           ;If DY=0 then skip to next point
+Next, it reads points out of the point queue (PQ) -- the list of point
+indices going around the polygon.  X contains LINDEX, the index
+of the current point on the left-hand side.  We then decrease this
+index to get the next coordinate on the left-hand side; this index
+is increased for right-hand side coordinates.  That is, left lines
+go clockwise through the point list, and right lines go counter-
+clockwise.  If DY=0, then we can just skip to the next point in
+the point list -- the fill routine is very good at drawing horizontal
+lines.
+	Sharp-eyed readers may have noticed that only the low-bytes
+of the y-coordinates are used.  We know ahead of time that DY is
+limited to an eight-bit number, and since we are always moving up
+the polygon we know that DY always has the same sign.  In principle,
+then, we don't need the high bytes at all.  In practice, though,
+the points can get screwed up (like if the polygon is very very
+tiny, and one of the y-coordinates gets rounded down).  The JSR FLOWCHK
+below checks the high byte.
+         STA LDY
+         STA YLEFT
+         STA DIVY
+         STX LINDEX
+         JSR FLOWCHK      ;Just in case, check for dy < 0
+         BMI EXIT         ;(sometimes points get screwed up)
+         <<<
+The code for FLOWCHK is simply
+FLOWCHK
+         LDY TEMP1
+         LDA (PLISTYHI),Y
+         LDY TEMP2
+         SBC (PLISTYHI),Y
+         RTS
+Why put such a tiny thing in a subroutine?  Because I was out of
+memory.  It was very annoying.  Remember the rule though: a few
+occasional wasted cycles are a drop in the bucket.  And, of course,
+this is why God invented version numbers.  (It can be made more
+efficient, anyways -- as you might have guessed, this was kludged
+in at the last moment, long after the routine had been written).
+Okee dokee, the next part of computing the line slope is to compute DX.
+For this routine, the left line has positive slope.
+*
+* Part 2: Compute dx.  If dx has negative the expected
+* sign then jump to the complementary routine.
+*
+* dx>0 means forwards point is to the right of the
+* current point, and vice-versa.
+*
+LINELP   MAC              ;Left line, dx>0
+         LDY TEMP2
+         LDA (PLISTXLO),Y ;Next point
+         LDY TEMP1
+         SEC              ;Carry can be clear if jumped to
+         SBC (PLISTXLO),Y ;Current point
+         STA DIVXLO
+         LDY TEMP2
+         LDA (PLISTXHI),Y
+         LDY TEMP1
+         SBC (PLISTXHI),Y
+         BPL CONT1        ;If dx<0 then jump out
+         JMP ]1           ;Entry address
+CONT1    STA DIVXHI
+         LDA (PLISTXLO),Y ;Current point
+         STA LEFTXLO
+         LDA (PLISTXHI),Y
+         STA LEFTXHI
+         JSR DIVXY        ;Returns int,rem = .X,.A
+         JSR LINELP2
+DONE2    <<<              ;Now .X=low byte, .A=high byte
+                          ;of current point
+Pretty short.  It first computes DX, and jumps to the complementary
+routine if DX has the wrong sign.  Recall that DX can be 9-bits,
+so both the low and high bytes of PLISTX are used.  The current
+line coordinate is stored in LEFTXLO/LEFTXHI, and DIVXY thencomputes
+DX/DY, both the integer part and remainder part in fixed 16-bit format.
+That is, the number returned is xxxxxxxx.xxxxxxxx i.e. 8 bit integer,
+bit remainder (256*rem/DY).
+	A few brief words should be said about DIVXY.  It uses a
+table of logs to compute an estimate for DX/DY.  It then multiplies
+that estimate by DY, and fixes up the estimate if it is too large
+or too small.  At this point, it has the integer part N and the
+remainder R, i.e. DX/DY = N + R/DY.  To compute R/DY it just uses
+the log tables again but with a different exponential table,
+since LOG(R) - LOG(DY) will always be *negative*.  It then uses
+this estimate as the actual remainder -- it doesn't try to correct
+with a multiplication.  By my calculations this can indeed cause a
+tiny little error, but only for very very special cases (extremely
+large lines/polygons), and it is something that is difficult to even
+notice.  I am always happy to trade tiny errors for extra cycles.
+	After DIVXY is all done the subroutine LINELP2 is used --
+again, it was moved out to conserve memory.  Recall that the first
+step needs to be of size dx/2.  LINELP2 and siblings perform this
+computation:
+LINELP2
+         STX LDXINT
+         CMP #$FF         ;if dy=1
+         BNE :NOT1        ;then A=dx/2
+         LDA #00
+         ROR              ;Get carry bit from dx/2
+         STA LDXREM
+         LDA #$80
+         STA LEFTREM
+         LDX LEFTXLO
+         LDA LEFTXHI
+         BCC :RTS         ;And start from current point
+DIVXY sets .A to #$FF if DY=1 -- although you'd expect that $FF is
+a perfectly valid remainder, it turns out that it doesn't happen
+because of the way DIVXY computes remainders.  If DY=1, then DX/DY is
+exactly DX.  More importantly, this is the only case in which DX/DY
+might be 9-bits; any other DY will return an 8-bit value of DX/DY.
+For this reason it is handled specially.
+:NOT1    STA LDXREM
+         LDA #$80
+         STA LEFTREM      ;Initialize remainder to 1/2
+         TXA              ;x=x-dxdy/2
+         LSR
+         STA TEMP2
+         LDA LEFTXLO
+         TAX              ;.X = current point
+         SEC
+         SBC TEMP2
+         STA LEFTXLO
+         LDA LEFTXHI
+         BCS :DONE
+         DEC LEFTXHI
+         TAY              ;Set/clear Z flag
+:DONE    CLC
+:RTS     RTS
+The above probably looks a little strange.  The slope is positive,
+so we want to calculate x=x+dx/2, but the above calculates x=x-dx/2.
+The reason is that the main loop will add dx to it, so that the
+next iteration will take it out to +dx/2.  Why do it on the next
+iteration?  For the same reason the x=x+dx update is delayed until
+after the plotting: for left lines with positive slope, we always
+need to start at the left endpoint of each horizontal line segment;
+in this case, that left endpoint is exactly the starting point.
+Notice that before computing x-dx/2 the above routine puts
+LEFTXLO/LEFTXI -- the starting point -- in .X/.A.  The plot/fill
+calculations are done using the point in .X/.A and so the plot
+will be done from the starting point.
+The above are the routines to calculate the slope of the line.
+They then jump over the next part: the part of the main loop
+which calculates x=x+dx.
+*
+* Next, the parts which update the X coordinates
+* for positive and negative dx (in real space, of
+* course -- the stored value of dx is always positive)
+*
+* Parameters passed in are:
+*   ]1 = lo byte x coord
+*   ]2 = high byte x coord
+*   ]3 = remainder
+*   ]4 = dy
+*   ]5 = dx/dy, integer
+*   ]6 = dx/dy, remainder
+*
+PUPDATE  MAC              ;dx>0
+         LDA ]3
+         CLC
+         ADC ]6
+         STA ]3
+         LDA ]1           ;x=x+dx/dy
+         ADC ]5
+         TAX
+         STA ]1
+         BCC CONT
+         INC ]2           ;High byte, x
+         CLC
+CONT     LDA ]2
+         <<<              ;Carry is CLEAR on exit
+                          ;.X = lo byte xcoord
+                          ;.A = hi byte
+As you can see, it is a very simple 16-bit addition.  As it says,
+it leaves carry clear and .X/.A = lo/hi for the column calculation:
+*
+* Compute the columns and plot the endpoints
+*
+* .X contains the low byte of the x-coord
+* .A was JUST loaded with the high byte
+*
+* Carry is assumed to be CLEAR
+*
+LCOLUMN  MAC
+* LDA LEFTXHI
+         BEQ CONT
+         CMP #2
+         BCC CONT1
+         ASL
+         BCS NEG
+POS      LDA #$FF         ;Column flag
+         STA LCOL
+         BNE DONE
+NEG      LDA #00          ;If negative, column=0
+         STA LCOL
+         STA LBITS        ;(will be EOR #$FF'ed)
+         LDA #4           ;Start filling at column 1
+         STA FILL+1
+         BNE DONE
+	The routine first checks if the coordinate lies in one of the
+columns on the screen, by checking the high byte.  If the high byte
+is larger than one then the coordinate is way off the right side of the
+screen. If the left coordinate is past the right edge of the screen then
+there is nothing on-screen to fill, so LCOL is set to $FF as a flag.  If
+the coordinate is negative then LCOL, the left column number, is set to 0,
+which will flag the plotting routine later on.  LBITS will be explained
+later, and FILL is the entry point for the fill routine, consisting
+of a JMP xxxx instruction.  Setting FILL+1 to 4 starts the fill routine
+at column one.  Why not start at column zero, and let the fill routine
+take care of it?  Because that requires a flag, and hence a check of
+that flag in the plotting routine.  That flag check means a few extra
+cycles on each loop iteration, and actually screws up the plotting
+logic.  For the relatively rare event of negative columns, this is
+totally not worth it.
+	There is still another branch above -- if the high byte is
+equal to one.  If it is, then we need to check if the x-coordinate
+is less than 320.  If it is, then the bitmap pointer and column
+entry point need to be set up correctly.
+CONT1    CPX #64          ;Check X<320
+         BCS POS
+         LDA #32          ;Add 32 columns!
+         INC BPOINT+1     ;and advance pointer
+CONT     ADC XCOLUMN,X
+         STA LCOL
+         TAY
+         LDA LBITP,X
+         STA LBITS        ;For later use in RCOLUMN
+         LDA FILLENT+1,Y  ;Get fill entry address
+         STA FILL+1
+DONE     <<<
+By using the ADC XCOLUMN,X instead of the simple LDA XCOLUMN,X we can
+just add the extra 32 columns in very easily.  Remember that the
+routine is entered with C clear, and A=0 if X<256.  This gives the
+column number, 0-39 (the XCOLUMN table is way up above).  A table
+lookup is cheaper than dividing by eight.  LBITP is a table which
+gives the bit patterns for use in plotting the endpoints -- the part
+of the line not drawn by the fill routine.  Plotting this endpoint
+is put off until later, because, among other things, the left and
+right endpoints might share a column (at the tips of the polygon, or
+for a very skinny polygon); plotting now would hose other things
+on the screen.  Finally, FILLENT is used to set the correct entry
+point into the fill routine, FILL.
+	The right side of the polygon uses a similar set of routines
+to calculate slopes and do the line updating.  The column calculation
+is similar at first, but does the actual plotting and filling:
+*
+* Note that RCOLUMN does the actual filling
+*
+* On entry, .X = lo byte x-coord, and .A was JUST loaded
+*   with the high byte.
+*
+* Carry must be CLEAR.
+*
+RCOLUMN  MAC
+* LDA RIGHTXHI
+         BEQ CONT
+         CMP #2
+         BCC CONT1
+         ASL
+         BCS JDONE        ;Skip plot+fill if x<0
+POS      LDX LCOL
+         BMI JDONE        ;Skip if lcol>40!
+         LDY YCOUNT       ;Plot the left edge
+         LDA (FILLPAT),Y
+         STA TEMP1
+         LDY COLTAB,X     ;Get column index
+         EOR (BPOINT),Y   ;Merge into the bitmap
+         AND LBITS        ;bits EOR #$FF
+         EOR TEMP1        ;voila!
+         STA (BPOINT),Y
+         LDA TEMP1
+         JSR FILL         ;Just fill to last column
+JDONE    JMP DONE
+If the right column is negative, then the polygon is totally off the
+left side of the screen.  If the right coordinate is off the right side
+of the screen, and the left column is still on the screen, then we
+just fill all the way to the edge.  The left endpoints are then merged
+into the bitmap -- merging will be discussed, below.
+UNDER    LDY LCOL         ;It is possible that, due to
+         BMI DONE         ;rounding, etc. the left column
+         LDX #00          ;got ahead of the right column
+                          ;x=0 will force load of $FF
+SAME     LDA RBITP,X      ;If zero, they share column
+         EOR LBITS
+         STA LBITS
+         LDY YCOUNT
+         LDA (FILLPAT),Y
+         STA TEMP1
+         LDX LCOL
+         LDY COLTAB,X
+         EOR (BPOINT),Y   ;Merge
+         AND LBITS
+         EOR TEMP1
+         STA (BPOINT),Y   ;and plot
+         JMP DONE
+CONT1    CPX #64          ;Check for X>319
+         BCS POS
+         LDA #32          ;Add 32 columns!
+CONT     ADC XCOLUMN,X
+         CMP LCOL         ;Make sure we want to fill
+         BCC UNDER
+         BEQ SAME
+After the right column is computed it is compared to the left column.
+If they share a column (or if things got screwed up and the right column
+is to the left of the left column), then the left and right sides
+need to be combined together and merged into the bitmap.  Again,
+merging is described below.  Otherwise, it progresses normally:
+         LDY RBITP,X
+         STY TEMP1
+         LDX LCOL
+         STA LCOL         ;Right column
+         LDY YCOUNT       ;Plot the left edge
+         LDA (FILLPAT),Y
+         STA TEMP2
+         LDY COLTAB,X     ;Get column index
+         EOR (BPOINT),Y   ;Merge into bitmap
+         AND LBITS
+         EOR TEMP2
+         STA (BPOINT),Y
+it uses LCOL as temporary storage for the right column, sets the fill
+pattern, and then plots the left edge.  Recall that LCOLUMN set up
+the bitmap pointer, BPOINT, if the x-coord was larger than 255.
+COLTAB gives the offset of a column within the bitmap, i.e. column*8.
+The pattern is loaded and the left endpoint is then merged into the
+bitmap.
+	What, exactly, is a 'merge'?  Consider a line with LEFTX=5,
+i.e. that starts in the middle of a column.  The left line calculation
+assumes that the left edge will be 00000111, and the fill takes care
+of everything to the right of that column.   We can't just STA into
+the bitmap, because that would be bad news for overlapping or adjacent
+polygons.  We also don't want to ORA -- remember that this is a pattern
+fill, and if that pattern has holes in it then this polygon will
+combine with whatever is behind it, and can again lead to poor results.
+What we really want to do is to have the pattern stored to the screen,
+without hosing nearby parts of the screen.  A very crude and cumbersome
+way of doing this would be to do something like
+	LDA BITP	;00000111
+	EOR #$FF
+	AND (BITMAP)	;mask off the screen
+	STA blah
+	LDA BITP
+	AND pattern
+	ORA blah
+	STA (BITMAP)
+That might be OK for some lame-o PC coder, but we're a little smarter
+than that I think.  Surely this can be made more efficient?  Of course
+it can, and stop calling me Shirley.  It just requires a little bit
+of thinking, and the result is pretty nifty.
+	First we need to specify what the problem is.  There are
+three elements: the BITP endpoint bits, the FILLPAT fill pattern bits,
+and the BPOINT bitmap screen bits.  What we need is a function which
+does
+	bitmask   pattern   bitmap   output
+	-------   -------   ------   ------
+         y        x        x
+         y        x        y
+In the above, 'y' represents a bit in the fill pattern and 'x' represents
+a bit on the screen.  They might have values 0 or 1.  If the bitmask
+(i.e. 00000111) has a 0 in it, then the bitmap bit should pass through.
+If the bitmask has a 1, then the *pattern* bit should pass through.
+A function which does this is
+	(pattern EOR bitmap) AND (NOT bitmask) EOR pattern
+This first tells us that the bitmask should be 11111000 instead of
+00000111.  This method only involves one bitmap access, and doesn't
+involve any temporary variables.  In short, it merges one bit pattern
+into another, using a mask to tell which bits to change.  So let's
+look at that bitmap merge code again:
+	 LDY YCOUNT
+         LDA (FILLPAT),Y
+         STA TEMP2
+         LDY COLTAB,X     ;Get column index
+         EOR (BPOINT),Y   ;Merge into bitmap
+         AND LBITS
+         EOR TEMP2
+         STA (BPOINT),Y
+And there it is.  Pattern EOR bitmap AND not bitmask EOR pattern.
+Note that because the pattern is loaded first, .Y can be reused as
+an index into the bitmap, and X can stay right where it is (to be
+again used shortly).  With these considerations, you can see that
+trying to mask off the bitmap would be very cumbersome indeed!
+	Note that if the left and right endpoints share a column,
+their two bitmasks need to be combined before performing the merge.
+This just means EORing the masks together -- remember that they
+are inverted, so AND won't work, and EOR is used instead of ORA
+in case bits overlap (try a few examples to get a feel for this).
+	Okay, let's remember where we are: the columns have all been
+computed, the fill entry points are computed, and the left endpoint
+has been plotted.  We still have to do the fill and plot the right
+endpoint.  LCOLUMN set the fill entry point, but RCOLUMN needs to
+set the exit point, by sticking an RTS at the right spot.
+         LDY LCOL         ;right column
+         LDX FILLRTS,Y    ;fill terminator
+         LDA #$60         ;RTS
+         STA FILLMAIN,X
+         LDA TEMP2        ;Fill pattern
+         JSR FILL
+         EOR (BPOINT),Y
+         AND TEMP1        ;Right bits
+         EOR TEMP2        ;Merge
+         STA (BPOINT),Y
+         LDA #$91         ;STA (),Y
+         STA FILLMAIN,X   ;Fill leaves .X unchanged
+                          ;And leaves .Y with correct offset
+DONE     <<<
+The fill routine doesn't change X, leaves (BPOINT),Y pointing to the
+last column, and leaves the fill pattern in A.  The right side of the
+line can then be immediately plotted after filling, and the STA (),Y
+instruction can be immediately restored in the fill routine.
+And that's the whole routine!
+	After plotting, the loop just takes a step up the screen and
+loops around.  Since BPOINT, the bitmap pointer, may have been changed
+(by LCOLUMN, or by the fill routine), the high byte needs to be restored.
+*
+* Finally, we need to decrease the Y coordinate and
+* update the pointer if necessary.
+*
+* Also, since BPOINT may have been altered, we need
+* to restore the high byte.
+*
+* ]1 = loop address
+*
+UPYRS    MAC
+         LDA ROWHIGH
+         STA BPOINT+1
+         DEC BPOINT
+         DEC YCOUNT
+         BMI FIXIT
+         JMP ]1
+FIXIT    LDA #7
+         STA YCOUNT
+         LDY YROW
+         BEQ EXIT         ;If y<0 then exit
+         DEY
+         STY YROW
+         ORA LOBROW,Y
+         STA BPOINT
+         LDA HIBROW,Y
+         CLC
+         ADC BITMAP
+         STA BPOINT+1
+         STA ROWHIGH
+         JMP ]1
+EXIT     RTS
+         <<<
+Of course, the 64 bitmap is made up of 'rows of eight', i.e. after
+every eight rows we have to reset the bitmap pointer.  If we move
+past the top of the screen (i.e. Y<0) then there is no more polygon
+left to draw.  Otherwise the new bitmap location is computed, by
+adding the offset to BITMAP, the variable which lets the routine
+draw to any bitmap (BITMAP contains the location in memory of the
+bitmap).  Note also that the high byte is stored to ROWHIGH --
+ROWHIGH is what the routine immediately above uses to restore the
+high byte of BPOINT!
+	And that, friends, is the essence of the polygon routine,
+and concludes the disassembly of the 3d library routines.
+</code>
+===== Section 4: Cool World =====
+<code>
+	Now that we have a nice set of routines for doing 3D graphics,
+it's time to write a program which actually uses those routines to
+construct a 3D world.  That program is Cool World.  Not only does it
+demonstrate the 3d library, but it makes sure that all of the library
+routines actually work!
+	Although the algorithms for constructing a world were discussed
+way back in Section 2, it's probably a good idea to very quickly
+summarize those algorithms, just as a reminder.  After that, this
+section will go through the major portions of the Cool World code,
+and explain how (and why) things work.
+	In the 3d world, each object has certain properties.  It has
+a position in the world.  It also has an orientation; it points in
+some direction.  It might also have some velocity, and other stuff,
+but all we'll worry about in Cool World is position and orientation.
+	Each object is defined by a series of points which are defined
+relative to the object center (the object's position).  The orientation
+tells how to rotate those points about the center -- to get the object
+rotating in the right direction, we just apply a rotation matrix to
+the object.
+	In Cool World, we will be the only thing moving around in
+the world -- all the other objects are fixed in position, although
+some of them are rotating.  To figure out how objects look from our
+perspective, we first figure out if the object is visible by using
+the equation R'(Q-P).  P is our position, and is subtracted from
+the object position Q to get its relative position.  R' then rotates
+the world around us, to get it pointing in the direction of our
+orientation vector.  If the object is visible, then we compute the
+full object by using the equation
+	R' (M X + Q-P).
+X represents the points of an object.  M is the rotation matrix to
+get the object pointing in the right direction.  The rotated points
+are then added to the relative center (Q-P) and rotated according to
+our orientation.  The points are then projected.  The job of
+Cool World and the 3D library is to implement that tiny little
+equation up there.
+	Once all the visible objects are rotated and projected,
+they need to be depth-sorted to figure out what order to draw them
+in.  Each object is then drawn, in order, and around and around it
+goes.  With this in mind:
+*
+* Cool World (Polygonamy II: Monogonamy)
+*
+* Just yer basic realtime 3d virtual world program for
+* the C64.  This program is intended to demonstrate the
+* 3d library.
+*
+* SLJ 8/15/97
+*
+The code starts with a bunch of boring setup stuff, but soon gets
+to the
+* Main loop
+MAINLOOP
+which resets a few things, swaps the buffers, and clears the new draw
+buffer.  The screen clearing is done in a very brain-dead way; as
+you may recall, polygonamy only cleared the parts of the screen that
+needed clearing.  I decided this time around that this was more trouble
+than it was worth, especially with the starfields hanging around.
+Still, a smart screen clear might keep track of the highest and
+lowest points drawn, and only clear that part of the screen.
+	Once the preliminaries are over with, the main part gets
+going:
+:C1      JSR READKEY
+         JSR UPD8POS      ;Update position if necessary
+         JSR UPD8ROT      ;Update second rotation matrix
+         JSR PLOTSTAR
+         JSR SORT         ;Construct visible object lst
+First the keyboard is scanned.  If a movement key is pressed, then the
+stars are updated; if we turn, then our orientation vector is
+updated via calls to ACCROTX, ACCROTY, and ACCROTZ.
+	The orientation vector is very easy to keep track of.
+We enter the world pointing straight down the z-axis, so initially
+the orientation vector is V=(0,0,1).  Now it gets a little trickier.
+Let's say we turn to the left.  We can think of this two ways:
+our orientation vector rotates left, or the world rotates around
+us to the right.  The problem here is that these aren't the
+same thing!
+	"What?!" you may say.  The problem here is not the single
+rotation, but the rotations that come after it.  Imagine a
+little x-y-z coordinate axis coming out of your shirt, with
+the z-axis pointing straight ahead of you.  When you turn in
+some direction, that coordinate system needs to turn with you,
+because rotations are always about those axis.  In other words,
+the world needs to rotate around those axis; the *inverse* of
+those rotations gives the orientation vector.
+	What would happen if we instead rotated the orientation vector?
+Imagine a pencil coming out of your chest -- that's the orientation
+vector.  Now turn it to the left, with you still looking at the
+monitor.  Once it's out at 45 degrees or so, let's say we started
+spinning around the z-axis.  Well that z-axis is still facing
+towards the monitor, and the pen rotates about THAT axis, drawing
+a circle around the monitor as it goes around (or, if you like,
+making a cone-shape with the tip of the cone at your chest).
+If, on the other hand, YOU turn left, and then start spinning,
+you can see that something very different happens -- the monitor
+now circles around the pencil.
+	The point is that we need to rotate the world around us.
+Specifically, by using ACCROTX and friends we maintain a rotation
+matrix which will rotate the world around us.  But we still need
+an orientation vector, to tell us what direction to move in when
+we move forwards or backwards.  That rotation matrix tells us
+how to rotate the whole world so that it "points in our direction";
+therefore, if we un-do all of those rotations, we can figure out
+what direction we were originally pointing in.  In other words,
+the inverse rotation matrix, when applied to the original orientation
+vector (0,0,1), gives the current orientation vector.
+	Recall that inverting a rotation matrix is really easy:
+just take the transpose.  And what happens when apply this new
+rotation matrix to V=(0,0,1)?  With a simple calculation, you can
+see that M V just gives the third row of M, for any matrix M.
+So, let's say that we've calculated the accumulated rotation
+matrix R'.  To invert it just take the transpose, giving R.
+The orientation vector is thus RV: the third row of R.  But
+the third row of R is just the third *column* of R', since R'
+is the transpose of R.  Therefore, the orientation vector -- the
+direction we are currently pointing in -- is simply the third
+column of the accumulated rotation matrix.  We don't have to
+do any extra work at all to compute that orientation vector; it
+just falls right out of the calculation we've already done!
+	So, to summarize: Keys are read in.  Any rotations
+will cause an update of the rotation matrix.  Our orientation
+vector is simply the third column of that matrix.  (Note that
+the same logic holds for other objects, if they happen to move).
+If we move forwards or backwards, then the stars are updated and
+a flag is set for UPD8POS.
+JSR UPD8POS
+-----------
+	Truth in advertising: UPD8POS updates positions.  Specifically,
+our position, since we are the only ones moving.  The world equation
+is R' (MX + Q-P), which we can write as R'MX + R'(Q-P).  Q is the
+position of an object; changing our position P just changes Q-P.  In
+Cool World there's no need to keep track of our position in the world;
+we might as well just keep track of the relative positions of objects.
+Instead of updating a position P when we move, Cool World just
+changes the locations Q of all the objects.
+	Since objects never move, the quantity R'(Q-P) won't change as
+long as WE don't move.  UPD8POS calculates that quantity, and does so
+only if we move or rotate -- if an object were to move, this quantity
+would have to be recalculated for that object.  Note that the calculation
+is done using GLOBROT, the lib3d routine to rotate the 16-bit centers.
+Later on, those rotated 16-bit centers will be added to R'MX, the rotated
+object points, if an object is visible.
+	Cool World lets you travel at different speeds by pressing
+the keys 0-4.  Different speeds are easy to implement.  Recall that
+the rotation matrix is multiplied by 64.  Grabbing the third column
+of that matrix gives a vector of length 64 (rotations don't change
+lengths, so multiplying a rotation matrix times (0,0,1) must
+always return a vector of the same length).  By changing the length
+of that orientation vector -- for example, by multiplying or dividing
+by two -- we can change the amount which we subtract from the object
+positions (or add to our position, if you like), and hence the speed
+at which we move through the world.
+JSR UPD8ROT
+-----------
+	Since some of the objects need to rotate, a second set of
+rotation matrices is maintained.  As far as the lib3d routines are
+concerned there is only one matrix, in zero page.  Therefore the
+global rotation matrix needs to be stored in memory before the
+new matrices are calculated.
+	The new rotation matrices are calculated using CALCMAT,
+the routine which only needs the x, y, and z angles.  In fact,
+only one matrix is calculated using this routine.  The other
+matrices are just permutations of that one rotation matrix; for
+example, the transpose (the inverse, remember) is a new rotation
+matrix.  We can also get new matrices by, say, a cyclic permutation
+of the rows -- this is the same as permuting the coordinates
+(i.e. x->y y->z z->x), which is the same as rotating the whole
+coordinate system.  As long as the determinant is one, the new
+matrix will preserve areas.
+	Once the new matrix is calculated, it is stored in memory
+for later use by the local rotation routines (ROTPROJ).  To get
+a different matrix, all we have to do is copy the saved matrix
+to zero page in a different way.  That is, we don't have to store
+ALL of the new rotation matrices in memory; we can do any row swapping
+and such on the fly, when copying from storage into zero page.
+	Note that by using CALCMAT we can use large angle increments,
+to make the objects appear to spin faster.  ACCROTX and friends can
+only accumulate by a fixed angle amount.
+JSR PLOTSTAR
+------------
+	PLOTSTAR is another imaginatively named subroutine which,
+get this, plots stars.  The starfield routine is discussed elsewhere
+in this issue.
+JSR SORT
+--------
+	Finally, this routine -- you won't believe this -- sorts the
+objects according to depth.  The sort is just a simple insertion
+sort -- the object z-coordinates are traversed, and each object
+is inserted into an object display list.  The list is searched
+from the top down, and elements are bumped up the list until the
+correct spot for the new object is found.  The object list is thus
+always sorted.  This kind of sort is super-easy to implement in
+assembly, and "easy" is a magic word in my book.
+	SORT also does the checking for visibility.  As always, the
+easiest and stupidest approach is taken: if the object lies in
+a 45-degree wedge in front of us, then it is visible.  This wedge
+is easiest because the boundaries are so simple: the lines z=x, z=-x,
+z=y, and z=-y.  If the object center lies within those boundaries --
+if -x < z < x and -y < z < y -- then the object is considered to
+lie in our field of vision.  Since we don't want to view objects
+which are a million miles away, a distance restriction is put on
+the z-coordinate as well: if z>8192 then the object is too far away.
+We also don't want to look at objects which are too close -- in
+particular, we don't want some object points in front of us and
+some behind us -- so we need a minimum distance restriction as well.
+Since the largest objects have points 95 units away from the center,
+checking for z>100 is pretty reasonable.
+	By the way -- that "distance" check only checks the z-coordinate.
+The actual distance^2 is x^2 + y^2 + z^2.  This leads to the "peripheral
+vision" problem you may have noticed -- that an object may become
+visible towards the edge of the screen, but when you turn towards it
+it disappears.
+	So, once the centers have been rotated, the extra rotation
+matrix calculated, and the object list constructed, all that is
+left to do is to plot any visible objects:
+:LDRAW   LDY NUMVIS       ;Draw objects from object list
+         BEQ :DONE
+         LDA OBLIST-1,Y   ;Get object number
+         ASL
+         TAX
+         LDA VISTAB,X
+         STA :JSR+1
+         LDA VISTAB+1,X
+         STA :JSR+2
+:JSR     JSR DRAWOB1
+         DEC NUMVIS
+         BNE :LDRAW
+:DONE
+         JMP MAINLOOP
+VISTAB   DA DRAWOB1
+         DA DRAWOB2
+	 ...
+The object number from the object list is used as an index into
+VISTAB, the table of objects, and each routine is called in order.
+It turns out that in Cool World there are two kinds of objects:
+those that rotate, and those that don't.  What's the difference?
+Recall the half of the world equation that we haven't yet
+calculated: R'MX.  Objects that don't rotate have no rotation
+matrix M.  And for objects that do rotate, I just skipped applying
+the global  rotation R' (these objects are just spinning more or
+less randomly, so why bother).  This becomes very apparent when
+viewing the Cobra Mk III -- no matter what angle you look at it
+from, it looks the same!  A full implementation would of course
+have to apply both R' and M.  Anyways, with this in mind, let's
+take a closer look at two representative objects: the center
+tetrahedron, which doesn't rotate, and one of the stars, which
+does.
+*
+* Object 1: A simple tetrahedron.
+*
+* (1,1,1) (1,-1,-1) (-1,1,-1) (-1,-1,1)
+TETX     DFB 65,65,-65,-65
+TETY     DFB 65,-65,65,-65
+TETZ     DFB 65,-65,-65,65
+These are the points that define the object, relative to its center.
+As you can see, they have length 65*sqrt(3) = 112.  This is within
+the limitation discussed earlier, that all vectors must have length
+less than 128.  Other objects -- for example, the cubes -- have coordinates
+like (55,55,55), giving length 55*sqrt(3) = 95.
+* Point list
+FACE1    DFB 0,1,2,0
+FACE2    DFB 3,2,1,3
+FACE3    DFB 3,0,2,3
+FACE4    DFB 3,1,0,3
+A tetrahedron has four faces, and the above will tell POLYFILL how
+to connect the dots.  They go around the faces in counter-clockwise
+order.  Note that the first and last point are the same; this is
+necessary because of the way POLYFILL works.  When drawing from
+the point list, POLYFILL only considers points to the left or
+to the right.  When it gets to one end of the list, it jumps
+to the other end of the list, and so always has a point to the
+left or to the right of the current spot in the point list.
+(Bottom line: make sure the first point in the list is also the
+last point).
+TETCX    DA 0             ;Center at center of screen
+TETCY    DA 00
+TETCZ    DA INITOFF       ;and back a little ways.
+OB1POS   DFB 00           ;Position in the point list
+OB1CEN   DFB 00           ;Position of center
+TETCX etc. are where the object starts out in the world.  OB1POS is
+a leftover that isn't used.  OB1CEN is the important variable here.
+The whole 3d library is structured around lists.  All of the object
+centers are stored in a single list, and are first put there by an
+initialization routine.  This list of object centers is then used
+for everything -- for rotations, for visibility checks, etc.
+OB1CEN tells where this object's center is located in the object
+list.
+	This next part is the initialization routine, called when
+the program is first run:
+*
+* ]1, ]2, ]3 = object cx,cy,cz
+*
+INITOB   MAC
+         LDA ]1
+         STA C0X,X
+         LDA ]1+1
+         STA C0X+256,X
+         LDA ]2
+         STA C0Y,X
+         LDA ]2+1
+         STA C0Y+256,X
+         LDA ]3
+         STA C0Z,X
+         LDA ]3+1
+         STA C0Z+256,X
+         <<<
+INITOB1                   ;Install center into center list
+         LDX NUMOBJS
+         STX OB1CEN
+         >>> INITOB,TETCX;TETCY;TETCZ
+         INC NUMOBJS
+         RTS
+As you can see, all it does is copy the initial starting point,
+TETCX, TETCY, TETCZ, into the object list, and transfer
+NUMOBJS, the number of objects in the center list, into OB1CEN.
+It then increments the number of objects.  In this way objects
+can be inserted into the list in any order, and a more or less
+arbitrary number of objects may be inserted.  (Of course,
+REMOVING an object from the list can be a little trickier).
+	Now we move on to the main routine:
+*
+* Rotate and draw the first object.
+*
+* PLISTXLO etc. already point to correct offset.
+* ]1,]2,]3 = object X,Y,Z point lists
+SETPOINT MAC
+         LDA #<]1         ;Set point pointers
+         STA P0X
+         LDA #>]1
+         STA P0X+1
+         LDA #<]2
+         STA P0Y
+         LDA #>]2
+         STA P0Y+1
+         LDA #<]3
+         STA P0Z
+         LDA #>]3
+         STA P0Z+1
+         <<<
+SETPOINT simply sets up the point list pointers P0X P0Y and P0Z
+for the lib3d routines.  It is a macro because all objects need
+to tell the routines where their actual x, y, and z point coordinates
+are located, so this little chunk of code is used quite a lot.
+The routine called by the main loop now begins:
+DRAWOB1
+         LDX OB1CEN       ;Center index
+ROTTET                    ;Alternative entry point
+         >>> SETPOINT,TETX;TETY;TETZ
+         LDY #4           ;Four points to rotate
+         SEC              ;rotate and project
+         JSR ROTPROJ
+The second entry point, ROTTET, is used by other objects which
+are tetrahedrons but not object #1 (in Cool World there is a line
+of tetrahedrons, behind our starting position).  This of course
+means there won't be a lot of duplicated code.
+	Next the points need to be rotated and projected.  SETPOINT
+sets up the list pointers for ROTPROJ, .Y is set to the number
+of object points in those lists, and C is set to tell ROTPROJ to
+rotate and project.  If we were just rotating (C=clear, i.e. to
+calculate MX), we would have to set up some extra pointers to tell
+ROTPROJ where to store the rotated points; those rotated points can
+then be rotated and projected, as above.  Note that the rotation
+matrix is assumed to already be set up in zero page.  Once the points
+have been rotated and projected, all that remains is to...
+* Draw the object
+SETFILL  MAC
+                          ;]1 = number of points
+                          ;]2 = Face point list
+                          ;]3 = pattern
+         LDY #]1
+L1       LDA ]2,Y
+         STA PQ,Y
+         DEY
+         BPL L1
+         LDA #<]3
+         STA FILLPAT
+         LDA #>]3
+         STA FILLPAT+1
+         LDX #]1
+         <<<
+Again we have another chunk of code which is used over and over and
+over.  SETFILL just sets stuff up for POLYFILL, the polygon drawing
+routine.  It first copies points into the point queue; specifically,
+it copies from lists like FACE1, above, into PQ, the point queue
+used by POLYFILL.  It then sets FILLPAT, the fill-pattern pointer, to
+the 8 bytes of data describing the fill pattern.  Finally it loads
+.X with the number of points in the point queue, for use by
+POLYFILL.  All that remains is to call POLYFILL -- all the other
+pointers and such are set up, and POLYFILL does the hidden face
+check for us.
+DRAWTET                   ;Yet another entry point
+         >>> SETFILL,3;FACE1;DITHER1
+         JSR POLYFILL
+         >>> SETFILL,3;FACE2;ZIGZAG
+         JSR POLYFILL
+         >>> SETFILL,3;FACE3;CROSSSM
+         JSR POLYFILL
+         >>> SETFILL,3;FACE4;SOLID
+         JMP POLYFILL
+And that's the whole routine!  DITHER1, ZIGZAG etc. are just little
+tables of data that I deleted out, containing fill patterns.  For
+example, SOLID looks like
+SOLID	 HEX FFFFFFFFFFFFFFFF
+i.e. eight bytes of solid color.
+	The next object, a star, is more involved.  Not only is
+it rotating, but it is also a concave object, and requires clipping.
+That is, parts of this object can cover up other parts of the
+object.  We will therefore have to draw the object in just the
+right way, so that parts which are behind other parts will be
+drawn first.  Compare with the way in which we depth-sorted the
+entire object list, to make sure far-off objects are drawn before
+nearby objects.  When applied to polygons, it is called polygon
+clipping.
+	Incidentally, I *didn't* do any clipping on the Cobra Mk III.
+Sometimes you will see what appear to be glitches as the ship rotates.
+This is actually the lack of clipping -- polygons which should be
+behind other polygons are being drawn in front of those polygons.
+	There are two types of stars in Cool World, but each is
+constructed in a similar way.  The larger stars are done by
+starting with a cube, and then adding an extra point above the
+center of each face.  That is, imagine that you grab the center
+of each face and pull outwards -- those little extrusions are
+form the tines of the star.  The smaller stars are similarly
+constructed, starting from a tetrahedron.  The large stars therefore
+have six tines, and the smaller stars have four.
+	Clipping these objects is really pretty easy, because
+each of those tines is a convex object.  All we have to do is
+depth-sort the *tips* of each of the tines, and then draw the
+tines in the appropiate order.
+	Recall that in the discussion of objects, in Section 2,
+it was pointed out that any concave object may be split up into
+a group of convex objects.  That is basically what we are doing
+here.
+*
+* All-Stars: A bunch of the cool star things.
+*
+STOFF    EQU 530          ;Offset unit
+STCX     EQU -8000        ;Center
+STCY     EQU 1200
+STCZ     EQU -400+INITOFF
+STAR1CX  DA STCX
+STAR1CY  DA STCY
+STAR1CZ  DA STCZ
+OB7CEN   DFB 00
+STAR1X   DFB 25,-25,25,-25,50,50,-50,-50
+STAR1Y   DFB -25,-25,25,25,-50,50,50,-50
+STAR1Z   DFB 25,-25,-25,25,-50,50,-50,50
+S1T1F1   DFB 4,2,0,4      ;Star 1, Tine 1, face 1
+S1T1F2   DFB 4,1,2,4
+S1T1F3   DFB 4,0,1,4
+S1T2F1   DFB 5,0,2,5
+S1T2F2   DFB 5,3,0,5
+S1T2F3   DFB 5,2,3,5
+S1T3F1   DFB 6,2,1,6
+S1T3F2   DFB 6,3,2,6
+S1T3F3   DFB 6,1,3,6
+S1T4F1   DFB 7,1,0,7
+S1T4F2   DFB 7,0,3,7
+S1T4F3   DFB 7,3,1,7
+INITOB7
+         LDX NUMOBJS
+         STX OB7CEN
+         >>> INITOB,STAR1CX;STAR1CY;STAR1CZ
+         INC NUMOBJS
+         RTS
+As before, the above stuff defines the position, points, and
+faces of the object, and INITOB7 inserts it into the center list.
+The main routine then proceeds similarly:
+DRAWOB7
+         LDX OB7CEN       ;Center index
+ROTSTAR1 JSR RFETCH       ;Use secondary rotation matrix
+SETSTAR1 >>> SETPOINT,STAR1X;STAR1Y;STAR1Z
+         LDY #8           ;Eight points to rotate
+         SEC              ;rotate and project
+         JSR ROTPROJ
+Note the JSR RFETCH above.  RFETCH retrieves one of the rotation
+matrices from memory, and copies it into zero page for use by ROTPROJ.
+Again, different rotation matrices may be constructed by performing
+this copy in slightly different ways.  The tines must then be
+sorted, (and the accumulation matrix is restored to zero page),
+and once sorted the tines are drawn in order.  The code below
+just goes through the sorted tine list, and calls :TINE1 through
+:TINE4 based on the value in the list.  The sorting routine will
+be discussed shortly.
+         JSR ST1SORT      ;Sort the tines
+         JSR IFETCH       ;Restore accumulation matrix
+* Draw the object.  In order to handle overlaps correctly,
+* the tines must first be depth-sorted, above.
+DRAWST1
+         LDX #03
+ST1LOOP  STX TEMP
+         LDY T1LIST,X     ;Sorted tine list
+         BEQ :TINE1
+         DEY
+         BNE :C1
+         JMP :TINE2
+:C1      DEY
+         BNE :TINE4
+         JMP :TINE3
+:TINE4
+         >>> SETFILL,3;S1T4F1;SOLID
+         JSR POLYFILL
+         >>> SETFILL,3;S1T4F2;DITHER1
+         JSR POLYFILL
+         >>> SETFILL,3;S1T4F3;DITHER2
+         JSR POLYFILL
+:NEXT    LDX TEMP
+         DEX
+         BPL ST1LOOP
+         RTS
+:TINE1	...draw tine 1 in a similar way
+:TINE2   >>> SETFILL,3;S1T2F1;ZIGS
+	...
+:TINE3   >>> SETFILL,3;S1T3F1;BRICK
+	...
+T1LIST   DS 6             ;Tine sort list
+T1Z      DS 6
+Now we get to the sorting routine.  It depth-sorts the z-coordinates,
+so it actually needs the z-coordinates.  Too bad we rotated and
+projected, and no longer have the rotated z-coordinates!  So we
+have to actually calculate those guys.
+	Now, if you remember how rotations work, you know that
+the z-coordinate is given by the third row of the rotation matrix
+times the point being rotated -- in this case, those points are the
+tips of the individual tines.  Well, we know those points are
+at (1,-1,-1), (1,1,1), (-1,1,-1), and (-1,-1,1).  Actually, they
+are in the same *direction* as those points/vectors, but have
+a different length (i.e. STAR1X etc. defines a point like 50,-50,-50).
+The lengths don't matter though -- whether a star is big or
+small, the tines are still ordered in the same way.  And order is
+all we care about here -- which tines are behind which.
+	So, calculating the third row times points like (1,-1,-1)
+is really easy.  If that row has elements (M6 M7 M8), then the
+row times the vector just gives M6 - M7 - M8.  So all it takes is
+an LDA and two SBC's to calculate the effective z-coordinate.
+Just for convenience I added 128 to the result, to make everything
+positive.
+	After calculating these z-coordinates, they are inserted into
+a little list.  That list is then sorted.  Yeah, an insertion sort
+would have been best here, I think.  Instead, I used an ugly,
+unrolled bubble-like sort.
+	By the way, the cubic-stars are even easier to sort, since
+their tines have coordinates like (1,0,0), (0,1,0) etc.  Calculating
+those z-coordinates requires an LDA, and nothing else!
+*
+* Sort the tines.  All that matters is the z-coord,
+* thus we simply dot the third row of the matrix
+* with the tine endpoint, add 128 for convenience,
+* and figure out where stuff goes in the list.
+*
+ST1SORT                   ;Sort the tines
+         LDX #00
+         LDA MATRIX+6     ;Tine 1: 1,-1,-1
+         SEC
+         SBC MATRIX+7
+         SEC
+         SBC MATRIX+8
+         EOR #$80         ;Add 128
+         STA T1Z          ;z-coord
+         STX T1LIST
+         LDA MATRIX+6     ;Tine 2: 1,1,1
+         CLC
+         ADC MATRIX+7
+         CLC
+         ADC MATRIX+8
+         EOR #$80
+         STA T1Z+1
+         INX
+         STX T1LIST+1
+         LDA MATRIX+7     ;Tine 3: -1,1,-1
+         SEC
+         SBC MATRIX+6
+         SEC
+         SBC MATRIX+8
+         EOR #$80
+         STA T1Z+2
+         INX
+         STX T1LIST+2
+         LDA MATRIX+8     ;Tine 4: -1,-1,1
+         SEC
+         SBC MATRIX+7
+         SEC
+         SBC MATRIX+6
+         EOR #$80
+         STA T1Z+3
+         INX
+         STX T1LIST+3
+         CMP T1Z+2        ;Now bubble-sort the list
+         BCS :C1          ;largest values on top
+         DEX              ;So find the largest value!
+	...urolled code removed for sensitive viewers.
+And that's all it takes!
+	And that, friends, covers the main details of Cool World.
+	Can it possibly be true that this article is finally coming
+to an end?
+</code>
+===== Section 5: 3d library routines and memory map =====
+<code>
+There are seven routines:
+$8A00	CALCMAT		- Calculate a rotation matrix
+$8A03	ACCROTX		- Add a rotation to rotation matrix around x-axis
+$8A06	ACCROTY		- ... y-axis
+$8A09	ACCROTZ		- ... z-axis
+$8A0C	GLOBROT		- 16-bit rotation for centers
+$8A0F	ROTPROJ		- Rotate/Rotate and project objects
+$8A12	POLYFILL	- Draw a pattern-filled polygon
+Following a discussion of the routines there is a memory map and discussion
+of the various locations.
+Library Routines
+----------------
+$8A00 CALCMAT	Calculate a rotation matrix
+	Stuff to set up: Nothing.
+	On entry: .X .Y .A = theta_x theta_y theta_z
+	On exit: Rotation matrix is contained in $B1-$B9
+	Cycle count: 390 cycles, worst case.
+$8A03 ACCROTX	This will "accumulate" a rotation matrix by one rotation
+		around the x-axis by the angle delta=2*pi/128.  Because
+		this is such a small angle the fractional parts must also
+		be remembered, in $4A-$52.  These routines are necessary
+		to do full 3d rotations.
+	On entry: carry clear/set to indicate positive/negative rotations.
+	On exit: Updated matrix in $B1-$B9, $4A-$52
+	Cycle count: Somewhere around 150, I think.
+$8A06 ACCROTY	Similar around y-axis
+$8A09 ACCROTZ	Similar around z-axis
+$8A0C GLOBROT	Perform a global rotation of 16-bit signed coordinates
+		(Rotate centers)
+	Stuff to set up:
+	  MULTLO1 MULTLO2 MULTHI1 MULTHI2 = $F7-$FE
+	    Multiplication tables
+	  C0XLO, C0XHI, C0YLO, C0YHI, C0ZLO, C0ZHI = $63-$6E
+	    Pointers to points to be rotated.  Note also that $63-$6E
+	    has a habit of getting hosed by the other routines.
+	  CXLO, CXHI, CYLO, CYHI, CZLO, CZHI = $A3-$AE
+	    Pointers to where rotated points will be stored.
+	On entry: .Y = number of points to rotate (0..Y-1).
+	On exit: Rotated points are stored in CXLO CXHI etc.
+$8A0F ROTPROJ	Rotate and project object points (vertices).  Points must
+		be in range -96..95, and must be with 128 units of the
+		object center.  Upon rotation, they are added to
+		the object center, projected if C is set, and stored
+		in the point list.
+		The _length_ of a vertex vector (sqrt(x^2 + y^2 + z^2)) must
+		be less than 128.
+	Stuff to set up:
+	  Rotation matrix = $B1-$B9 (Call CALCMAT)
+	  ROTMATH = $B0
+	  P0X P0Y P0Z = $69-$6E
+	    Pointers to points to be rotated.  As with GLOBROT, these
+	    pointers will get clobbered by other routines.  Points are
+-bit signed numbers in range -96..95, and must have
+	    length less than 128.
+	  PLISTXLO PLISTXHI ... = $BD-$C4
+	    Where to store the rotated+projected points.
+	  CXLO CXHI ... = $A3-$AE
+	    List of object centers.  Center will be added to rotated point
+	    before projection.
+	  XOFFSET, YOFFSET = $53, $54
+	    Location of origin on the screen.  Added to projected points
+	    before storing in PLIST.  160,100 gives center of screen.
+	On entry: .X = Object center index (center is at CXLO,X CXHI,X etc.)
+		  .Y = Number of points to rotate (0..Y-1).
+		  .C = set for rotate and project, clear for just rotate
+	On exit: Rotated, possibly projected points in PLIST.
+$8A12 POLYFILL	Draw pattern-filled polygon into bitmap.
+	Stuff to set up:
+	  MULTLO1, MULTLO2, MULTHI1, MULTHI2 = $F7-$FE
+	  BITMAP = $FF
+	  FILLPAT = $BB-$BC
+	  PLISTXLO, PLISTXHI, PLISTYLO, PLISTYHI = $BD-$C4
+	  Point queue = $0200.  Entries are _indices_ into the PLISTs,
+	    must be ordered _counter clockwise_, and must _close_ on
+	    itself (example: 4 1 2 4).
+	On entry: .X = Number of points in point queue (LDX #3 in above
+	  example)
+	On exit: Gorgeous looking polygon in bitmap at BITMAP.
+Memory map
+----------
+Zero page:
+$4A-$52 ACCREM		Fractional part of accumulating matrix
+$53	XOFFSET		Location of origin on screen (e.g. 160,100)
+$54	YOFFSET
+$55-$72	Intermediate variables.  These locations are regularly hosed
+	by the routines.  Feel free to use them, but keep in mind that
+	you will lose them!
+$A3-$AE	C0XLO, C0XHI, C0YLO, ...
+	Pointers to rotated object centers, i.e. where the objects are
+	relative to you.  Centers are 16-bit signed (2's complement).
+$AF-$B0	ROTMATH
+	Pointer to the multiplication table at $C200-$C3FF.
+$B1-$B9	Rotation matrix.
+$BB-$BC	FILLPAT
+	Pointer to fill pattern (eight bytes, 8x8 character)
+$BD-$C0	PLISTXLO, PLISTXHI
+$C1-$C4 PLISTYLO, PLISTYHI
+	Pointers to rotated object points, e.g. used by polygon renderer.
+	Numbers are 16-bit 2's complement.
+$F7-$FE	MULTLO1, MULTLO2, MULTHI1, MULTHI2
+	Pointers to math tables at $C400-$C9FF
+	MULTLO1 -> $C500
+	MULTLO2 -> $C400
+	MULTHI1 -> $C800
+	MULTHI2	-> $C700
+	(Only the high bytes of the pointers are actually relevant).
+$FF	BITMAP
+	High byte of bitmap base address.
+Library:
+$0200-$027F	Point queue.
+$8600-$9FFF	Routines, tables, etc.
+$8A00		Jump table into routines
+Tables:
+$8400-$85FF	ROTMATH, pointed to by $AF-$B0.
+$C000-$C2FF	MULTLO, pointed to by $F7-$F8 and $F9-$FA
+$C300-$C5FF	MULTHI, pointed to by $FB-$FC and $FD-$FE
+$C600-$C6FF	CDEL, table of f(x)=x*cos(delta)
+$C700-$C7FF	CDELREM  remainder
+$C800-$C8FF	SDEL	x*sin(delta)
+$C900-$C9FF	SDELREM
+$CA00-$CAFF	PROJTAB, projection table, f(x)=d/x, integer part
+$CB00-$CBFF	PROJREM, projection table, 256*remainder
+$CC00-$CCFF	LOG, logarithm table
+$CD00-$CDFF	EXP, exponential table
+$CE00-$CEFF	NEGEXP, exponential table
+$CF00-$CFFF	TRIG, table of 32*sin(2*pi*x/128) -- 160 bytes.
+ROTMATH is the Special 8-bit signed multiplication table for ROTPROJ.
+MULTLO and MULTHI are the usual fast-multiply tables.  CDEL and SDEL
+are used by ACCROTX, to accumulate matrices.  PROJTAB is used
+by ROTPROJ, to project the 16-bit coordinates.  LOG EXP and NEGEXP
+are used by the POLYFILL divide routine, to calculate line slopes.
+Finally, TRIG is used by CALCMAT to calculate rotation matrices.
+The tables from $C600-$CFFF are fixed and must not be moved.  The
+tables ROTMATH, MULTLO, and MULTHI on the other hand are only
+accessed indirectly, and thus may be relocated.  Thus there is
+room for a color map and eight sprite definitions in any VIC bank.
+</code>
+===== Section 6: Conclusions =====
+<code>
+	Wow.  You really made it this far.  I am totally impressed.
+What stamina.
+	Well I hope you have been able to gain something from this
+article.  It is my sincere hope that this will stimulate the
+development of some nice 3d programs.  After a long, multi-year
+break from 3d graphics I really enjoyed figuring out all this
+stuff again, and finally doing the job right.  I also enjoyed
+re-re-figuring it out, to write this article, since over half a
+year has passed since I wrote the 3d library!  I hope you have
+enjoyed it too, and can use this knowledge (or improve on it!)
+to do some really nifty stuff.
+	SLJ 4/3/98
+</code>
+===== Section 7: Binaries =====
+<code>
+	There are three files below.  The first is the 3d library.
+The second is the tables from $C000-$CFFF.  The third is the ROTMATH
+table ($8400-$85FF above).
+::::::::: lib3d.o :::::::::
+begin 644 lib3d.o
+M`(8```````````$!`0$!`0$!`@("`@("`@(#`P,#`P,#`P0$!`0$!`0$!04%
+M!04%!04&!@8&!@8&!@<'!P<'!P<'"`@("`@("`@)"0D)"0D)"0H*"@H*"@H*
+M"PL+"PL+"PL,#`P,#`P,#`T-#0T-#0T-#@X.#@X.#@X/#P\/#P\/#Q`0$!`0
+M$!`0$1$1$1$1$1$2$A(2$A(2$A,3$Q,3$Q,3%!04%!04%!05%145%145%186
+M%A86%A86%Q<7%Q<7%Q<8&!@8&!@8&!D9&1D9&1D9&AH:&AH:&AH;&QL;&QL;
+M&QP<'!P<'!P<'1T='1T='1T>'AX>'AX>'A\?'Q\?'Q\?`(#`X/#X_/X`@,#@
+M\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`
+M@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X
+M_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`
+MX/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^
+M`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P
+M^/S^`(#`X/#X_/X`@,#@\/C\_G\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_
+M'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!
+M`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/
+M!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_
+M/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#
+M`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?
+M#P<#`0"@`)%MH`B1;:`0D6V@&)%MH""1;:`HD6V@,)%MH#B1;:!`D6V@2)%M
+MH%"1;:!8D6V@8)%MH&B1;:!PD6V@>)%MH("1;:"(D6V@D)%MH)B1;:"@D6V@
+MJ)%MH+"1;:"XD6V@P)%MH,B1;:#0D6V@V)%MH."1;:#HD6V@\)%MH/B1;>9N
+MH`"1;:`(D6V@$)%MH!B1;:`@D6V@*)%MH#"1;:`XD6U@3`")`$"`P`!`@,``
+M0(#``$"`P`!`@,``0(#````!`@,%!@<("@L,#0\0$1(4%187&1H;'!X`"!`8
+M("@P.$!(4%A@:'!X@(B0F*"HL+C`R-#8X.CP^``($!@@*#`X3#*:3"J;3%:;
+M3(*;3`V<3*N=3&:*``0(#!`4&!P@)"@L,#0X/$!$2$Q05%A<8&1H;'!T>'R`
+MAHJ.DI::GJ("!@H.$A8:'B(F*BXR-CH^0D9*3E)66EYB9FIN<G9Z?H2(C)"4
+MF)R@AEB&6:D`A56%5H5:O``"L<'%5;'#,`[E5I`*AEJQP855L<.%5LK0Y*9:
+MT`%@AEN\``*QO85DA6>QOX5EA6BE5496:D96:D96:H5KJJ55*0>%:AVFB85M
+MO;^)&&7_A6Z%;Z9:QED0"^9>I5G)_["]8*98O``"A%6QP<HP]+P``H16./'!
+M\-V%7(5>A7*&6B#WF##<I%6QO85DI%8X\;V%<*15L;^%9:16\;\0`TSKBX5Q
+M(*Z9(`"9(("+I%:QO:15./&]A7"D5K&_I%7QOQ`#3(*,A7&QO85GL;^%:""N
+MF2!9F856I60XY6>%<*5EY6B%<:5P..5@L`/&<3CE8H5PI7'I`#`%!7#P`6"D
+M:\`9L`6E5DQ?CJ@83)V5\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`
+ML.FI(.9N?0"&A6RHO0"'A5>Y%HJ-I(FF6\99$`FE6<G^L"=@H@#D6+#ZO``"
+MA%6QP>@XO``"A%;QP?#=A5V%7X5RAEL@]Y@PVF!H:&"D5K&]I%4X\;V%<*16
+ML;^D5?&_$`-,^8J%<;&]A62QOX5E(*Z9("F9(("+I%:QO:15./&]A7"D5K&_
+MI%7QOQ`#3.J,A7&QO85GL;^%:""NF2!9F856I60XY6>%<*5EY6B%<:5P&&5@
+MD`+F<3CE8H5PI7'I`#`0!7#0"Z5A..5CI6#E8I`!8*1KP!FP!:563$60I588
+M3+*6I%6QO85GI%8X\;V%<*15L;^%:*16\;\0`TSYBH5Q(*Z9((&9A5:E9#CE
+M9X5PI67E:(5QI7`XY6"P`L9Q&&5BA7"E<6D`,!`%<-`+I6,XY6&E8N5@D`%@
+MI&O`&;`%I59,^Y.E5AA,SYBD5;&]A6>D5CCQO85PI%6QOX5HI%;QOQ`#3.J,
+MA7$@KID@@9F%5J5D..5GA7"E9>5HA7&E<!AE8)`#YG$896*E<6D`,`%@I&O`
+M&;`%I59,&Y*E5AA,PI>F6L99$`OF7J59R?^P2V"F6+P``H15L<'*,/2\``*$
+M5CCQP?#=A5R%7H5RAEH@]Y@PW*15L;V%9*16./&]A7"D5;&_A66D5O&_$`-,
+M2H^%<2"NF2``F4RTC6#&7O"BI68XY6&%9J5DY6"%9*JP`L9EI648\"+)`I`6
+M"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-
+MI(G&7]!=IEO&61`)I5G)_K"A8*(`Y%BP^KP``H15L<'H.+P``H16\<'PW85=
+MA5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_$`-,OY.%<;&]A6>QOX5H
+M(*Z9(%F93%^.I6D896.%::5G96*JA6>0`^9H&*5H\$W)`I!#"K`:IFPP%J1J
+ML;N%5;S8B5%M)5=%59%MI54@HXE,[8ZD;#!DH@"]`(A%5X57I&JQNX55IFR\
+MV(E1;2571561;4SMCN!`L+RI('T`AL5LD-#PU+P`B(15IFR%;*1JL;N%5KS8
+MB5%M)5=%5I%MI&R^/HJI8)T`B:56(*.)46TE5456D6VID9T`B:5OA6[&;<9J
+M,`-,FXVI!X5JI&OP%8B$:QFFB85MN;^)&&7_A6Z%;TR;C6"F6L99$`OF7J59
+MR?^P4V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16L;VD
+M53CQO85PI%:QOZ15\;\0`TQSC85QL;V%9+&_A64@KID@*9E,CH_&7O"?I688
+M96&%9J5D96"JA620`^9E&*5E\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0
+M&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-I(G&7]!IIEO&61`)I5G)_K!18*(`
+MY%BP^KP``H15L<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%
+M<*16L;^D5?&_$`-,WY&%<;&]A6>QOX5H(*Z9(%F93$60O``"L;VJL;\83$60
+MI6D896.%::5G96*JA6>0`^9H&*5H\$W)`I!#"K`:IFPP%J1JL;N%5;S8B5%M
+M)5=%59%MI54@HXE,TY"D;#!DH@"]`(A%5X57I&JQNX55IFR\V(E1;2571561
+M;4S3D.!`L+RI('T`AL5LD-#PU+P`B(15IFR%;*1JL;N%5KS8B5%M)5=%5I%M
+MI&R^/HJI8)T`B:56(*.)46TE5456D6VID9T`B:5OA6[&;<9J,`-,=8^I!X5J
+MI&OP%8B$:QFFB85MN;^)&&7_A6Z%;TQUCV"F6L99$`OF7J59R?^P4V"F6+P`
+M`H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16L;VD53CQO85PI%:Q
+MOZ15\;\0`TP4DX5QL;V%9+&_A64@KID@*9E,=)'&7O"?I68896&%9J5D96"J
+MA620`^9E&*5E\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N
+M?0"&A6RHO0"'A5>Y%HJ-I(G&7]!9IEO&61`)I5G)_K!-8*(`Y%BP^KP``H15
+ML<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5;&]A6>D5CCQO85PI%6QOX5H
+MI%;QOQ`#3/F/A7$@KID@@9E,&Y*E:3CE8X5II6?E8H5GJK`"QFBE:!CP3<D"
+MD$,*L!JF;#`6I&JQNX55O-B)46TE5T55D6VE52"CB4RIDJ1L,&2B`+T`B$57
+MA5>D:K&[A56F;+S8B5%M)5=%59%M3*F2X$"PO*D@?0"&Q6R0T/#4O`"(A%6F
+M;(5LI&JQNX56O-B)46TE5T56D6VD;+X^BJE@G0")I58@HXE1;25515:1;:F1
+MG0")I6^%;L9MQFHP`TQ;D:D'A6JD:_`5B(1K&::)A6VYOXD89?^%;H5O3%N1
+M8.9?O``"L;VJL;\83%23IEK&61`+YEZE6<G_L.1@IEB\``*$5;'!RC#TO``"
+MA%8X\<'PW85<A5Z%<H9:(/>8,-RD5;&]A62D5CCQO85PI%6QOX5EI%;QOQ`#
+M3#"1A7$@KID@`)E,5)/&7O"CI68XY6&%9J5DY6"%9*JP`L9EI648\"+)`I`6
+M"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-
+MI(G&7]!9IEO&61`)I5G)_K!-8*(`Y%BP^KP``H15L<'H.+P``H16\<'PW85=
+MA5^%<H9;(/>8,-JD5;&]A6>D5CCQO85PI%6QOX5HI%;QOQ`#3!^.A7$@KID@
+M@9E,^Y.E:3CE8X5II6?E8H5GJK`"QFBE:!CP3<D"D$,*L!JF;#`6I&JQNX55
+MO-B)46TE5T55D6VE52"CB4R)E*1L,&2B`+T`B$57A5>D:K&[A56F;+S8B5%M
+M)5=%59%M3(F4X$"PO*D@?0"&Q6R0T/#4O`"(A%6F;(5LI&JQNX56O-B)46TE
+M5T56D6VD;+X^BJE@G0")I58@HXE1;25515:1;:F1G0")I6^%;L9MQFHP`TP[
+MDZD'A6JD:_`5B(1K&::)A6VYOXD89?^%;H5O3#N38,9>T%JF6L99$`OF7J59
+MR?^P2V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*15L;V%
+M9*16./&]A7"D5;&_A66D5O&_$`-,^Y6%<2"NF2``F4PGE6"E9CCE885FI63E
+M8(5DJK`"QF6E91C&7]!=IEO&61`)I5G)_K#:8*(`Y%BP^KP``H15L<'H.+P`
+M`H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_$`-,DYB%
+M<;&]A6>QOX5H(*Z9(%F93)V5I6D896.%::5G96*JA6>0`^9H&*5HQFHP`TRT
+ME*D'A6K&:Z1K&::)A6VYOXD89?^%;H5OP!CP`TRTE$R;C<9>T%VF6L99$`OF
+M7J59R?^P3V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16
+ML;VD53CQO85PI%:QOZ15\;\0`TSJE(5QL;V%9+&_A64@KID@*9E,.Y:E9AAE
+M885FI61E8*J%9)`#YF48I67&7]!>IEO&61`)I5G)_K!18*(`Y%BP^KP``H15
+ML<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_
+M$`-,AI>%<;&]A6>QOX5H(*Z9(%F93+*68*5I&&5CA6FE9V5BJH5GD`/F:!BE
+M:,9J,`-,Q96I!X5JQFND:QFFB85MN;^)&&7_A6Z%;\`8\`-,Q95,=8_&7M!=
+MIEK&61`+YEZE6<G_L+!@IEB\``*$5;'!RC#TO``"A%8X\<'PW85<A5Z%<H9:
+M(/>8,-RD5K&]I%4X\;V%<*16L;^D5?&_$`-,(9B%<;&]A62QOX5E(*Z9("F9
+M3%"7I68896&%9J5D96"JA620`^9E&*5EQE_06:9;QED0":59R?ZP36"B`.18
+ML/J\``*$5;'!Z#B\``*$5O'!\-V%785?A7*&6R#WF##:I%6QO85GI%8X\;V%
+M<*15L;^%:*16\;\0`TQQEH5Q(*Z9((&93,*7I6DXY6.%::5GY6*%9ZJP`L9H
+MI6@8QFHP`TS:EJD'A6K&:Z1K&::)A6VYOXD89?^%;H5OP!CP`TS:EDQ;D6#&
+M7M!9IEK&61`+YEZE6<G_L.U@IEB\``*$5;'!RC#TO``"A%8X\<'PW85<A5Z%
+M<H9:(/>8,-RD5;&]A62D5CCQO85PI%6QOX5EI%;QOQ`#3!"7A7$@KID@`)E,
+M79BE9CCE885FI63E8(5DJK`"QF6E91C&7]!9IEO&61`)I5G)_K!-8*(`Y%BP
+M^KP``H15L<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5;&]A6>D5CCQO85P
+MI%6QOX5HI%;QOQ`#3%V5A7$@KID@@9E,SYBE:3CE8X5II6?E8H5GJK`"QFBE
+M:!C&:C`#3.N7J0>%:L9KI&L9IHF%;;F_B1AE_X5NA6_`&/`#3.N73#N3I%6Q
+MPZ16\<-@AF#)_]`)AE:I`&J%89`&A6&*2H56J8"%9J5D..56A62JI67I`(5E
+M&&"&8,G_T`^I`&J%8:F`A6:F9*5ED!J%8:F`A6:*2H56I62J..56A62E9;`#
+MQF6H&&"&8LG_T`RI`&J%8ZF`A6F*D`F%8ZF`A6F*2AAE9X5GJJ5H:0"%:!A@
+MAF+)_]`/J0!JA6.I@(5IIF>E:)`7A6.I@(5IBJ9G2AAE9X5GI6B0!.9HJ!A@
+MI7%*I7!JJJ5R2O!GJ+T`S#CY`,R05*J]`,VJA?>%^TG_:0"%^87]I'*Q]SCQ
+M^855L?OQ_856I7#E5855I7'E5J55D![%<I`'Z.5RQ7*P^895JO`+O0#,./D`
+MS*J]`,ZF56#*97*0^TS]F:(`I7"D<DS]F:5Q2J5P:JJI_V!+2$%!04XAA62&
+M8H1C&&5B*7^JI6(XY60I?ZB](,\8>2#/A;6Y`,\X_0#/A;2*&&5C*7^JF#CE
+M8RE_J+T`SQAY`,_)@&J%L;D@SSC](,_)@&J%LJ5B&&5C*7^JI6(XY6,I?ZBY
+M`,\X_0#/A;.](,\8>2#/A;F*..5D*7^JF!AE9"E_J+T`SQAY`,_)@&J%81AE
+ML86XI6$XY;&%L;T@SSCY(,_)@&J%81AELH6WI;(XY6&%LJ5C&&5D*7^JI6,X
+MY60I?ZB](,\8>2#/..6QA;&Y(,\X_2#/..6XA;B]`,\X^0#/&&6RA;*]`,\8
+M>0#/&&6WA;>E8BE_JKT`SPJ%MF!F:*("AE6U385@M5"%8;2WM;2J(*Z;IE6E
+M8I5-I6.5M*5DE5"E996WRA#98&9HH@*&5;50A6"U2H5AM+&UMZH@KINF5:5B
+ME5"E8Y6WI6252J5EE;'*$-E@9FBB`H95M4J%8+5-A6&TM+6QJB"NFZ95I6*5
+M2J5CE;&E9)5-I665M,H0V6"E:!`0AF*$9*9@I&&&881@IF2D8KT`QQAY`,F%
+M8KT`QGD`R(5CI6(896"%8I`"YF.Y`,<X_0#)A62Y`,;]`,B%9:5D&&5AA620
+M`N9EIF@0#J9BA6*&9*5CIF6%989C8(B$5;%CA5>Q9858L6>%6K%IA5NQ:X5=
+ML6V%7J6QIK*DLR!WG*15D:6*D:.EM*:UI+8@=YRD59&IBI&GI;>FN*2Y('><
+MI%61K8J1JYCP`TP-G&!#4D%#2TE.1R!43T%35"P@1U)/34U)5"&%689<A%^H
+M\%:D6(7WA?M)_QAI`87YA?VQ]SCQ^85AL?OQ_85BI%>Q]SCQ^85@L?OQ_1AE
+M885AI6)I`*3W$`V%8J5A..57A6&E8N58I%@0`SCE]P9@)F$J!F`F82JD881O
+MA7"E7/!BI%N%]X7[2?\8:0&%^87]L?<X\?F%8;'[\?V%8J1:L?<X\?F%8+'[
+M\?T896&%8:5B:0"D]Q`-A6*E83CE6H5AI6+E6Z1;$`,XY?<&8"9A*@9@)F$J
+MI&&JF!AE;X5OBF5PA7"E7_!BI%Z%]X7[2?\8:0&%^87]L?<X\?F%8;'[\?V%
+M8J1=L?<X\?F%8+'[\?T896&%8:5B:0"D]Q`-A6*E83CE785AI6+E7J1>$`,X
+MY?<&8"9A*@9@)F$JI&&%8I@896^JI6)E<&"F;Z5P8*6PA6&&5H159F@0&J16
+ML:.%5[&EA5BQIX5:L:F%6[&KA5VQK85>I%7PU8B$5;%IT`B%8H5DA6;P)J2Q
+MA:])_QAI`85@L:\X\6"%8J2TL:\X\6"%9*2WL:\X\6"%9J15L6OP+Z2RA:])
+M_QAI`85@L:\X\6`896*%8J2UL:\X\6`8962%9*2XL:\X\6`896:%9J15L6WP
+M**2SA:])_QAI`85@L:\X\6`896*%8J2VL:\X\6`8962%9*2YL:\X\6"@`!AE
+M9A`!B*9H,`^D59%=I6216Z5BD5E,T9T895V%9IAE7H5GH@"@`*5D$`&(&&5:
+MA62895N%91`!RH9<H@"@`*5B$`&(&&57A6*895B%8Q`!RH99I6?P$DIF9D99
+M9F-F8D9<9F5F9*K0[J9FO0#*T"2D4X1OI%2$<:1EA7"%<DQ4GZ1BA&^D8X1P
+MI&2$<:1EA'),/I_)`?#II&*%]X7[2?\8:0&%^87]L?<X\?F%;['[\?VD8QAQ
+M]SCQ^85PI&2Q]SCQ^85QL?OQ_:1E&''W./'YA7*E4QAE;X5OD`/F<!BE5&5Q
+MA7&0`N9RO0#+\'B%]X7[2?\8:0&%^87]L?<X\?FJL?OQ_<"`D`+E]ZB*97&%
+M<9AE<H5RI&2Q]SCQ^;'[\?T897&D59'!I7)I`)'#I&.Q]SCQ^:JQ^_']P("0
+M`N7WJ(IE;X5OF&5PA7"D8K'W./'YL?OQ_1AE;Z15D;VE<&D`D;],T9VD5:5O
+MD;VE<)&_I7&1P:5RD<-,T9U)($A!5$4@0E)/0T-/3$DN("!!3D0@6454+"!)
+G3B!!($-%4E1!24X@4T5.4T4L($D@04T@0E)/0T-/3$DN("U424-+
+`
+end
+::::::::: tables.c000 :::::::::
+begin 644 tables.c000
+M`,``@`&"!(8)C!"4&9XDJC&X0,A1VF3N>020'*DVQ%+A<`"0(;)$UFG\D"2Y
+M3N1Z$:A`V'$*I#[9=!"L2>:$(L%@`*!!XH0FR6P0M%G^I$KQF$#HD3KDCCGD
+MD#SIED3RH5``L&$2Q'8IW)!$^:YD&M&(0/BQ:B3>F500S(E&!,*!0`#`@4($
+MQHE,$-297B3JL7A`"-&:9"[YQ)!<*?;$DF$P`-"A<D06Z;R09#D.Y+J1:$`8
+M\<JD?EDT$.S)IH1B02``X,&BA&9)+!#TV;ZDBG%80"@1^N3.N:20?&E61#(A
+M$`#PX=+$MJF<D(1Y;F1:44A`.#$J)!X9%!`,"08$`@$````!`@0&"0P0%!D>
+M)"HQ.$!(45ID;GF$D)RIML32X?``$"$R1%9I?)"DN<[D^A$H0%AQBJ2^V?00
+M+$EFA*+!X``@06*$ILGL$#19?J3*\1A`:)&ZY`XY9)"\Z19$<J'0`#!ADL3V
+M*5R0Q/DN9)K1"$!XL>HD7IG4$$R)Q@1"@<``0('"!$:)S!!4F=XD:K'X0(C1
+M&F2N^420W"EVQ!)AL`!0H?)$END\D.0YCN0ZD>A`F/%*I/Y9M!!LR2:$XD&@
+M`&#!(H3F2:P0=-D^I`IQV$"H$7KD3KDDD/QIUD2R(9``<.%2Q#:I')`$>>YD
+MVE'(0+@QJB2>&900C`F&!((!@`"``8($A@F,$)09GB2J,;A`R%':9.YY!)`<
+MJ3;$4N%P`)`ALD36:?R0)+E.Y'H1J$#8<0JD/MET$*Q)YH0BP6``H$'BA";)
+M;!"T6?ZD2O&80.B1.N2..>20/.F61/*A4`"P81+$=BG<D$3YKF0:T8A`^+%J
+M)-Z95!#,B48$PH%``,"!0@3&B4P0U)E>).JQ>$`(T9ID+OG$D%PI]L2283``
+MT*%R1!;IO)!D.0[DNI%H0!CQRJ1^6300[,FFA&)!(`#@P:*$9DDL$/39OJ2*
+M<5A`*!'ZY,ZYI)!\:59$,B$0`/#ATL2VJ9R0A'EN9%I12$`X,2HD'AD4$`P)
+M!@0"`0!`/S\^/CT]/#P[.SHZ.3DX.#<W-C8U-34T-#,S,C(Q,3$P,"\O+BXM
+M+2TL+"LK*RHJ*2DI*"@G)R<F)B4E)20D)",C(B(B(2$A("`?'Q\>'AX='1T<
+M'!P;&QL:&AH9&1D9&!@8%Q<7%A86%145%104%!,3$Q,2$A(2$1$1$1`0$!`/
+M#P\/#@X.#@T-#0T,#`P,#`L+"PL*"@H*"@D)"0D)"0@("`@(!P<'!P<'!@8&
+M!@8&!04%!04%!00$!`0$!`0$`P,#`P,#`P,"`@("`@("`@("`0$!`0$!`0$!
+M`0$!`0$`````````````````````````````````````````````````````
+M```````````````````````````````!`0$!`0$!`0$!`0$!`0("`@("`@("
+M`@(#`P,#`P,#`P0$!`0$!`0$!04%!04%!08&!@8&!@<'!P<'!P@("`@("0D)
+M"0D)"@H*"@H+"PL+#`P,#`P-#0T-#@X.#@\/#P\0$!`0$1$1$1(2$A(3$Q,3
+M%!04%145%186%A<7%Q@8&!D9&1D:&AH;&QL<'!P='1T>'AX?'Q\@("$A(2(B
+M(B,C)"0D)24E)B8G)R<H*"DI*2HJ*RLK+"PM+2TN+B\O,#`Q,3$R,C,S-#0U
+M-34V-C<W.#@Y.3HZ.SL\/#T]/CX_/T!`04%"0D-#1$1%149&1T=(2$E)2DI+
+M3$Q-34Y.3T]045%24E-35%155E975UA965I:6UQ<75U>7U]@8&%B8F-D9&5E
+M9F=G:&EI:FIK;&QM;FYO<'!Q<G)S='1U=G9W>'EY>GM[?'U]?G]_@(&"@H.$
+MA(6&AX>(B8J*BXR-C8Z/D)"1DI.3E)66EI>8F9F:FYR=G9Z?H*"AHJ.DI*6F
+MIZBIJ:JKK*VMKJ^PL;*RL[2UMK>WN+FZN[R]O;Z_P,'"P\3$Q<;'R,G*R\O,
+MS<[/T-'2T]34U=;7V-G:V]S=WM_@X>'BX^3EYN?HZ>KK[.WN[_#Q\O/T]?;W
+M^/GZ^_S]_O\```$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C
+M)"4F)R@I*BLL+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]0
+M45)35%565UA96EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]
+M?H"!@H.$A8:'B(F*BXR-CH^0D9*3E)66EYB9FIN<G9Z?H*&BHZ2EIJ>HJ:JK
+MK*VNK["QLK.TM;:WN+FZN[R]OK_`P<+#Q,7&Q\C)RLO,S<[/T-'2T]35UM?8
+MV=K;W-W>W^#AXN/DY>;GZ.GJZ^SM[N_P\?+S]/7V]_CY^OO\_?[_`/____[^
+M_OW]_?S\_/O[^_OZ^OKY^?GX^/CW]_?W]O;V]?7U]/3T\_/S\_+R\O'Q\?#P
+M\._O[^_N[N[M[>WL[.SKZ^OKZNKJZ>GIZ.CHY^?GY^;FYN7EY>3DY./CX^/B
+MXN+AX>'@X.#?W]_?WM[>W=W=W-S<V]O;VMK:VMG9V=@G)R8F)B4E)24D)"0C
+M(R,B(B(A(2$@("`@'Q\?'AX>'1T='!P<'!L;&QH:&AD9&1@8&!@7%Q<6%A85
+M%144%!04$Q,3$A(2$1$1$!`0$`\/#PX.#@T-#0P,#`P+"PL*"@H)"0D("`@(
+M!P<'!@8&!04%!`0$!`,#`P("`@$!`0``````````````````````````````
+M``$!`0$!`0$!`0$!`0$!`0$!`0$!`@("`@("`@("`@("`@("`@("`@("`P,#
+M`P,#`P,#`P,#`P,#`P,#`P,$!`0$!`0$!`0$!`0$!`0$!`0$!`4%!04%!04%
+M!04%!04%!04%!04%!08&!@8&^?GY^?GY^OKZ^OKZ^OKZ^OKZ^OKZ^OKZ^OKZ
+M^_O[^_O[^_O[^_O[^_O[^_O[^_O\_/S\_/S\_/S\_/S\_/S\_/S\_/W]_?W]
+M_?W]_?W]_?W]_?W]_?W]_?[^_O[^_O[^_O[^_O[^_O[^_O[^____________
+M______________\`#!DE,CY+5V1Q?8J6HZ^\R-7B[OL'%"`M.D937VQXA9&>
+MJ[?$T-WI]@,/'"@U04Y:9W2`C9FFLK_+V.7Q_@H7(S`]259B;WN(E*&NNL?3
+MX.SY!A(?*SA$45UJ=X.0G*FUPL_;Z/0!#1HF,T!,665R?HN7I+&]RM;C[_P)
+M%2(N.[C$T=WJ]@,0'"DU0DY;:'2!C9JFL[_,V>7R_@L7)#`]2E9C;WR(E:*N
+MN\?4X.WY!A,?+#A%45YK=X20G:FVPL_<Z/4!#AHG-$!-669R?XN8I;&^RM?C
+M\/P)%B(O.TA486YZAY.@K+G%TM_K^`01'2HW0U!<:76"CINHM,'-VN;S`-)I
+M1C0J(QX:%Q43$1`/#@T,"PL*"@D)"`@(!P<'!P8&!@8&!04%!04%!00$!`0$
+M!`0$!`0#`P,#`P,#`P,#`P,#`P,#`P,"`@("`@("`@("`@("`@("`@("`@("
+M`@("`@("`@("`@("`@$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!
+M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!
+M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0``````````````````````
+M`````````````````````````````````````(````"`````0%4`%X`G```@
+M6JL-@`",(<!F%,>`/@#&D%TM`-6MAV)`'P#BQJN1>&!),QX*]N31P*^?CX!Q
+M8U5(.R\B%PL`]>O@ULW#NK&IH)B0B(!X<6IC7%5/2$(\-C`J)1\:%`\*!0#[
+M]O+MZ>3@W-C3S\O(Q,"\N;6RKJNGI*&>FI>4D8Z+B(:#@'U[>'5S<&YK:69D
+M8E]=6UE65%)03DQ*2$9$0D`^/#HY-S4S,3`N+"LI)R8D(R$?'AP;&1@7%102
+M$1`.#0L*"0@&!00"`0#__OS[^OGX]_7T\_+Q\._N[>SKZNGHY^;EY./BX>#?
+MWMW<V]O:V=C7UM74U-,``!\R/TI265]E:6YR=GE\?X*%AXF,CI"2E)67F9J<
+MGI^@HJ.DIJ>HJ:JLK:ZOL+&RL[2TM;:WN+FZNKN\O;V^O\#`P<+"P\3$Q<;&
+MQ\?(R<G*RLO+S,S-SL[/S]#0T='2TM+3T]34U=76UM?7U]C8V=G9VMK;V]O<
+MW-W=W=[>WM_?W^#@X>'AXN+BX^/CY.3DY>7EY>;FYN?GY^CHZ.CIZ>GJZNKJ
+MZ^OK[.SL[.WM[>WN[N[N[^_O[_#P\/#Q\?'Q\O+R\O/S\_/T]/3T]/7U]?7V
+M]O;V]O?W]_?W^/CX^/GY^?GY^OKZ^OK[^_O[^_S\_/S\_/W]_?W]_O[^_O[_
+M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$"`@("`@("`@("`@("
+M`@("`@("`P,#`P,#`P,#`P,#`P0$!`0$!`0$!`0$!04%!04%!04&!@8&!@8&
+M!P<'!P<'"`@("`@("0D)"0H*"@H*"PL+"PP,#`P-#0T.#@X/#P\0$!`1$1$2
+M$A,3%!04%146%A<7&!@9&AH;&QP='1X>'R`A(2(C)"0E)B<H*2DJ*RPM+B\P
+M,3,T-38W.#H[/#X_0$)#149(24M-3E!25%9765M=7V)D9FAJ;6]R='=Y?'^"
+MA(>*C9"4EYJ>H:6HK+"TN+S`Q,C-T=;;W^3I[O3Y_OX!`0$!`0$!`0$!`0$!
+M`0$!`0$!`@("`@("`@("`@("`@("`@("`@("`@(#`P,#`P,#`P,#`P,#`P,#
+M!`0$!`0$!`0$!`0$!04%!04%!04%!@8&!@8&!@<'!P<'!P<("`@("`@)"0D)
+M"0H*"@H*"PL+"PP,#`P-#0T.#@X/#P\0$!`1$1$2$A(3$Q04%145%A87%Q@8
+M&1H:&QL<'1T>'A\@("$B(R,D)28G*"@I*BLL+2XO,#$R,S0V-S@Y.CP]/D!!
+M0T1&1TE*3$U/45-55EA:7%Y@8F5G:6MN<'-U>'I]@(.%B(N.DI68FY^BIJFM
+ML;6YO<'%RL[3U]SAYNOP]?H``@,%!@@)"PP.#Q`2$Q05%Q@9&AL;'!T>'A\?
+M'R`@("`@("`?'Q\>'AT<&QL:&1@7%103$A`/#@P+"0@&!0,"`/[]^_KX]_7T
+M\O'P[NWLZ^GHY^;EY>3CXN+AX>'@X.#@X.#@X>'AXN+CY.7EYN?HZ>OL[>[P
+J\?+T]??X^OO]_@`"`P4&"`D+#`X/$!(3%!47&!D:&QL<'1X>'Q\?("`@
+`
+end
+::::::::: table.rotmath :::::::::
+begin 644 table.rotmath
+M`(0````````````````!`0$!`0$!`0("`@("`@,#`P,$!`0$!04%!08&!@<'
+M!P@("`D)"0H*"PL+#`P-#0X.#P\0$!$1$A(3$Q04%146%Q<8&!D:&AL<'!T>
+M'A\@("$B(R,D)28F)R@I*2HK+"TN+B\P,3(S-#4U-C<X.3H[/#T^/T!!0D-$
+M149'2$E*2TU.3U!14E-45E=865I;75Y?8&)C)",C(B$@(!\>'AT<'!L:&AD8
+M&!<7%A45%!03$Q(2$1$0$`\/#@X-#0P,"PL+"@H)"0D("`@'!P<&!@8%!04%
+M!`0$!`,#`P,"`@("`@(!`0$!`0$!`0``````````````````````````````
+M`0$!`0$!`0$"`@("`@(#`P,#!`0$!`4%!04&!@8'!P<("`@)"0D*"@L+"PP,
+M#0T.#@\/$!`1$1(2$Q,4%!45%A<7&!@9&AH;'!P='AX?("`A(B,C9&-B8%]>
+M75M:65A75E134E%03TY-2TI)2$=&141#0D%`/SX]/#LZ.3@W-C4U-#,R,3`O
+M+BXM+"LJ*2DH)R8F)20C(R(A("`?'AX='!P;&AH9&!@7%Q85%104$Q,2$A$1
+M$!`/#PX.#0T,#`L+"PH*"0D)"`@(!P<'!@8&!04%!00$!`0#`P,#`@("`@("
+`0$!`0$!`0$```````````````@(
+`
+end
+........
+....
+..
+.                                    -fin-
+See you next issue!
+</code>