User Tools

Site Tools


magazines:chacking21

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

magazines:chacking21 [2015-04-17 04:34] (current)
Line 1: Line 1:
 +<code>
 +                   ########
 +             ##################
 +         ######            ######
 +      #####
 +    #####  ####  ####      ##      #####   ####  ####  ####  ####  ####   #####
 +  #####    ##    ##      ####    ##   ##   ##  ###     ##    ####  ##   ##   ##
 + #####    ########     ##  ##   ##        #####       ##    ## ## ##   ##
 +#####    ##    ##    ########  ##   ##   ##  ###     ##    ##  ####   ##   ##
 +#####  ####  ####  ####  ####  #####   ####  ####  ####  ####  ####   ######
 +#####                                                                    ##
 + ######            ######           Issue #21
 +   ##################              Feb 5, 2002
 +       ########
 +     Special Focus on Minigames
  
 +...............................................................................
 +
 + "Do not meddle in the affairs of wizards, for they are subtle and
 + quick to anger." -- Ancient Elvish saying
 +
 +...............................................................................
 +
 +BSOUT
 +
 +(Why yes, I _am_ re-reading Lord Of The Rings, how did you know?  Sure, the
 +movie is pretty good, not the same as the book, but probably the best you
 +could do, although why does every Hollywood evil creature essentially come
 +from Night of the Living Dead?  But hey, the elves actually looked like elves.
 +Black Hawk Down is really good too, and... oh, sorry!)
 +
 +Hoo-ah!  Welcome to another issue of C=Hacking!
 +
 +There's lots of nifty stuff to get to so this will be brief.  First, the
 +"Hacking Exchange" is now up at the Official Unofficial C=Hacking Homepage:
 +
 + http://www.ffd2.com/fridge/chacking/
 +
 +It's just a simple message board, with the idea that C=Hacking types can
 +make comments, ask questions, and otherwise talk to one another.  Check
 +it out!
 +
 +Second, if you have any projects you're working on... please contact me,
 +and consider writing them up for C=Hacking!
 +
 +And finally, something I think you'll enjoy:
 +
 +
 +http://groups.google.com/groups?hl=en&th=87c2a2ced7e32ce1&rnum=42
 +
 +From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu)
 +Subject: C= Hacker's Net-Mag
 +Newsgroups: comp.sys.cbm
 +Date: 1992-02-08 14:52:20 PST
 +
 +
 +  Due to the recent influx of "Tech-Know" (People who actually understand /
 +hack te C=64 into doing stuff that was previously unknown) I am going to
 +tentativly start a C= Hackers Net-Mag (Hacking in the 90's since of the word,
 +not the 80's).
 +  So - please send me your netmail address if you're interested in receiving
 +it.
 +  Or if you are interested in contatcting me concerning an article etc that
 +you'd like to see distributed please also contact me.
 +
 +For the first issue I'm tentatively planning on:
 +  - Line Drawing on 8563 VDC (640 x 200 hi-res graphics)
 +  - Beginning ML column
 +  - Raster Article
 +  . . .
 +
 +- Craig Taylor
 +duck@pembvax1.pembroke.edu
 +
 +P.S. - Not sure when the first is gonna go out but hopefully soon.
 +
 +
 +From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu)
 + Subject: Issue 1 - C= Hacking Available
 + Newsgroups: comp.sys.cbm
 + Date: 1992-02-27 17:22:44 PST
 +
 +
 +Issue 1 of C= Hackers is now available via NETMAIL and is a compilation of
 +several articles on the tehnical side of the Commodore 64 and 128.  For those
 +of you who missed the first posting, please reply via email and ask to be put
 +on the list.  The first issue had programming in ml, documented and
 +undocumented 6502 opcodes, and a line-drawing package in machine language for
 +the C=128 hi-res screen.
 +  Issue 2 will be coming out in a month or so.  Many thanks for the bandwith.
 +
 +- Craig Taylor
 +duck@pembvax1.pembroke.edu
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +::::::::::::::::::::::::::::::::::: Contents ::::::::::::::::::::::::::::::::::
 +
 +BSOUT
 + o Voluminous ruminations from your unfettered editor.
 +
 +Jiffies
 + o News, rumours, and stuff.
 +
 +
 +Side Hacking
 +
 + o "Pulse Width Modulation, continued" by various.
 +   Tying up some loose ends from last issue's digi article.
 +
 + o "Introducing Full-Screen IFLI mode with a SuperCPU", by Todd
 +   Elliot <eyethian@msn.com> (Hey, what's with all this MSN crapola? :)
 +   Using a SuperCPU, it is possible to use the first three columns
 +   of an (I)FLI picture, and Todd shows how.
 +
 +
 +Main Articles
 +
 + o "VIC-20 Kernel ROM Disassembly Project, part IV", by Richard Cini
 +   <rcini@msn.com>
 +
 +   And now it's time to start on that most frightening of creations:
 +   the tape drive code!
 +
 +
 + o "The Art of the Minigame" -- an article in six parts:
 +
 +   Introduction, by the editor
 +   Part 1: Codebreaker, by David Holz <whiteflame52@yahoo.com>
 +   Part 2: TinyPlay, by S. Judd <sjudd@ffd2.com>
 +   Part 3: MagerTris, by Per Olofsson <magervalp@cling.gu.se>
 +   Part 4: Compressing Tiny Programs, by S. Judd <sjudd@ffd2.com>
 +   Part 5: TinyRinth, by Mark Seelye <mseelye@yahoo.com>
 +   Part 6: Tetrattack!, by Stephen Judd <sjudd@ffd2.com>
 +
 +   
 +.................................. Credits ...................................
 +
 +Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd
 +C=Hacking logo by.......................... Mark Lawrence
 +
 +Special thanks to the cbm-hackers for many otherwise unacknowledged
 +contributions.
 +
 +Legal disclaimer:
 + 1) If you screw it up it's your own fault!  
 + 2) If you use someone's stuff without permission you're a dork!
 +
 +For information on the mailing list, ftp and web sites, send some email
 +to chacking-info@jbrain.com.
 +
 +
 +................................... Jiffies ..................................
 +
 +$01 Jochen Adler has made a program that reads the second side of
 +    a 1541 disk in a 1571 - without turning the disk over. It reads the
 +    blocks from end to beginning. Because of the mechanical bump however,
 +    it can only read tracks 5 to 35. If anybody wants this program please
 +    e-mail Jochen (NLQ@gmx.de)
 +
 +$02 Soci/Singular has been working on a commented C128 ROM listing.  Check
 +    out this great effort at http://singularcrew.hu/c128rom/
 +
 +$03 64net/2 has been updated:
 +
 +    For those that do not know: 64net/2 is yet another PC-to-C64/128 User<->LPT
 +    parallel cable software. It supports d64/d71/d81/t64/lnx/dhd disk images,
 +    raw files and own Internet partition. It is possible to enter disk images
 +    like any other directory. Client programs are provided for C64 and C128.
 +    The next goal will be to patch ROM instead of loading client.
 +
 +    A small BASIC example program is included that is able to send e-mails
 +    through 64net/2 host or any other machine if its IP is known. How many
 +    e-mail agents are there for C=? How many of them are written in plain
 +    BASIC 2.0? :)
 +
 +    http://sourceforge.net/projects/c64net/ contains the latest version, and
 +
 +    http://venus.wmid.amu.edu.pl/~ytm/64net2win.tar.gz contains a Windows
 +    binary.
 +
 +$04 CC65 is up to version 2.7.0, with many new improvements.  Check it out
 +    at http://www.cc65.org/
 +
 +$05 Moreover, Ullrich has set up his C64 as a web server at
 +
 + http://c64.cc65.org/
 +
 +    The web server runs on a stock 64, using a Swiftlink for communications,
 +    and uses the uIP TCP/IP stack written (with cc65) by Adam Dunkels
 +
 + http://dunkels.com/adam/uip
 +
 +    Pretty cool, eh?
 +
 +$06 A new IDE interface has become available:
 +
 +    Elysium is proud to announce a new software and hardware solution for
 +    your Commodore mass data storage needs.
 +
 +    The CIA-IDE is yet another approach to connect an IDE hard drive to 
 +    C64/128.  It differs from previous similar projects in these areas:
 +
 +    - it is the simpliest one (only two chips required (or just one in case of
 +      C128))
 +    - it is free (documentation and software),
 +    - the software is available in source code form under GNU GPL,
 +    - there _exists_ a ready to use GEOS 2.0 driver.
 +
 +    Documentation, source codes, and binaries are at:
 +
 + ftp://ftp.elysium.pl/groups/Elysium/Projects/ciaide/
 +
 +$07 Marko Makela has developed a tape drive emulator with an RS-232 port,
 +    allowing transfers between any computer with a tape port and any
 +    computer with an RS-232 interface (e.g. a PC or Swiftlink).  The hardware
 +    and software is at
 +
 + http://www.funet.fi/pub/cbm/crossplatform/transfer/C2N232/
 +
 +$08 GoDot -- the C64 image processing program -- is now public domain.  Arndt
 +    will still be working on it, but it's now available at
 +
 + http://members.aol.com/howtogodot/godnews.htm
 +
 +$09 The C64 is now listed in the Guiness Book of World Records; a scan of
 +    the page (from Robert Bernardo, posted by Frank Michlick) is at
 +
 + www.cupid.de/upload/famous.jpg
 +
 +    (love that line about "16K sound").
 +
 +    Also, I highly recommend taking Robert's advice and checking out the
 +    infinitely cool "Logo-Matic" on the main www.cupid.de site!
 +
 +$0A JOS just keeps cruising along, with lots of changes.  Among the biggest
 +    changes, of course, is the announcement that JOS will be merged with
 +    Clips, with the new system to be called "Wings" (with some of the letters
 +    capitalized and some not; I never remember that stuff).  For the latest
 +    JOS news, check out
 +
 + http://www.jolz64.cjb.net/
 +
 +$0B Frank Kontros has made a commented disassembly of the C64 ROM, with the
 +    BASIC ROM on the way:
 +
 + http://c64.uz.ua/sources/C64_Kernal_Disassembly.zip
 +
 +
 +$0C And finally, Aleksi Eeben, author of two minigames, has now written a
 +    VIC-20 game:
 +
 + http://www.student.oulu.fi/~aeeben/download/dragonwing.zip
 +
 +    more screenshots:
 +
 +        http://www.student.oulu.fi/~aeeben/screen1.png - screen4.png
 +
 +    Aleksi's homepage is at http://www.cncd.fi/aeeben
 +
 +    Neat!
 +
 +................................ Side Hacking ................................
 +
 +
 +Pulse Width Modulation, continued
 +--------------------------------- from various
 +
 +The digi article in issue #20 of C=Hacking left a few loose ends, and
 +generated some followups.
 +
 +First, Otto Jarvinen (sounddemon) emailed to say that the SID detection
 +routine occasionally reported incorrect results for him, and suggested that
 +a workaround was to do the detect several times.  YMMV!
 +
 +Second, a day or two after issue #20 was released, Levente discovered a
 +brilliant way to play 6-bit PWM digis on a stock machine:
 +
 +--
 +I couldn't resist, and tried something out (see attachment). It works!!! :-)
 +
 +In fact, when I wrote the last letter I didn't know that I found something
 +useable, just had some ideas - I felt that I'm at the right place. When I read
 +C=H 20 this morning and read your comment about the Test bit (from the PRG), I
 +knew that it must work. All I had to do is then to put this idea into code.
 +
 +The whole idea is about starting the pulse by software, and then having the
 +SID turn it back to 0 after a time.
 +
 +Is it possible? ...The keys are the Test bit (the SID wave counter can be
 +reseted anytime), the pulse width register, the wave counter and the SIDs way
 +of generating pulse wave. (Ie. the pulse wave is high, as long as the wave
 +counter is less than the value in the pulse width register).
 +
 +
 +Check this algorithm:
 +
 +- Init: volume at max, voice 1 sustain level max, start attack. Freq is
 +selected well (=$4000), so the wave counter is incremented by 4 every
 +processor clock cycles.
 +
 +Loop:
 +- load next sample value, and put it to the pulse width low register ($d402;
 +ensure that $d403 is 0).
 +- Set test bit, and clear test bit (counter reset).
 +- Increase sample pointer, some delay, then loop. The delay must be 64 clock
 +cycles + the time while the Test bit is kept set (4 cycles if using STA $d404
 +: STX $d404 immediately with pre-loaded values).
 +
 +What will happen? The 8-bit sample value is put directly to the pulse width
 +register (MSBs of the pulse width register are cleared!...). The wave counter
 +is started (release test bit), and it increases 4 by every CPU cycles (=
 +counts 256 in 64 cycles). After some time, the counter will reach the value in
 +the pulse width register. This happens in exactly after (8-bit sample value /
 +4) cycles, because of the above. In this cycle (or the next?...) the SID turns
 +its pulse output to 0. Voilá!
 +
 +One must just make sure that the loop length in cycles matches the above
 +conditions, and then it runs like hell... Since it does exactly the same on
 +the SID as the other (bit-banging) way, it just does it with some hardware
 +help, there's also no problem with the 4khz maximum barrier (since the
 +oscillator is reset every loop).
 +
 +With little enhancement, it's possible to write an about 7.5 bits player for a
 +stock C64 by this method. This is what you find in the attachment... The idea
 +is using all the 3 channels simultaneously. A slightly increased sample value
 +is written to the three pulse width registers, so the oscillators will finish
 +the duty cycle one processor cycle later, when there's a carry between
 +bits(0,1) to the MSBs.
 +
 +The replay freq is the CPU clk / 68 (~15khz). 64 cycles (variable duty cycle)
 ++ 4 cycles (constant duty cycle because of the reset time - no problems with
 +that, it doesn't change (just gives a small constant DC...)).
 +
 +By similar methods, it should be possible to write a sample player with higher
 +PWM freq (with less resolution of course, but eliminating this still audible
 +whistling).
 +
 +(I tried using the filter to reduce it, but it sounded so bad that I left it
 +out. It clicked like hell. The FETs got saturated.)
 +
 +[Richard Atkinson suggested turning down the sustain volumes to avoid this]
 +
 +See the attachment, and the binary. I think the sample sounds pretty good :-).
 +(The cut is from 'Greece 2000' by Three drives on a vinyl).
 +
 +(Another idea that popped up in my mind: since the TED sound generator can
 +also be reset, I could probably translate this idea to the Plus/4 :-O ).
 +
 +Best regards,
 +
 +Levente
 +
 +--
 +
 +The binary is available at http://www.ffd2.com/fridge/chacking/ towards the
 +bottom of the page.
 +
 +Third, I received a very interesting email from an Apple-II guy, which I'd
 +like to pass on:
 +
 +--
 +
 +Hi!
 +
 +I found your page as I was searching for something else 6502-related,
 +and was very interested.  Although I have always been aware of the
 +C64, I have never really been a user--I have used Apple II's since 1980.
 +
 +I was particularly interested in the article on playing "digis" on the
 +C64.  I became interested in playing digitized sounds on the Apple II
 +in 1993, after hearing a 3-bit, 11.025 KHz PWM player.  At 3 bits, you
 +can imagine how noisy speech samples were, but the overall effect
 +for a 1 MHz machine with a 1-bit speaker "toggle" was amazing.  It
 +made me wonder how far this PWM technique could be pushed on a
 +stock, 1 MHz Apple II (not the somewhat faster, 65816-based IIgs).
 +
 +The short answer is, much farther than I expected!  Robin and Stephen
 +accurately describe the theoretical PWM limit as 6 bit samples at
 +about 16 KHz for a stock 1 MHz machine, but, as they point out,
 +that is not practically realizable for a number of reasons, unless the
 +play loop is completely unrolled!
 +
 +Furthermore, in the Apple II world, sampled sounds have acquired a
 +few standardized sampling rates--mostly as a result of Mac influence,
 +which was in turn influenced by CD's.  The most common rate in the
 +Apple II world is 11.025 KHz, or one-fourth of the audio CD sampling
 +rate.  This is commonly considered to be "AM radio quality", with a
 +Nyquist bandwidth of about 5.5 KHz and a practical bandwidth of
 +4+ KHz, given practical anti-aliasing filters (at the sampling end, not
 +the playback end).
 +
 +A frequency of 11.025 KHz is, though high, still painfully audible to
 +people whose ears are not zonked--a piercing "squeal" running
 +through every sound.  So even though it is possible to write a
 +practical 6-bit 11.025 KHz PWM player (usually called a SoftDAC
 +in the Apple II world), the resulting listening experience is disappointing.
 +
 +So I went to work on a way to do 2x oversampling, and built a 5-bit
 +22.050 KHz PWM player.  It was sad to lose a bit, but the absence
 +of any audible "carrier" more than compensated for it!
 +
 +If you have access to an 8-bit Apple II (preferably with lower case,
 +like a //e), and also preferably with a way of attaching an external
 +speaker or headphones in place of the miserable 2.75" internal
 +speaker, then you can easily give it a try and judge for yourself.
 +
 +I'm pretty proud of the novel design of the code, which I would
 +characterize as "vectored" unrolled loops, one for every two
 +pulse duty cycles, which I wrote a BASIC program to write
 +for me--much less painful for counting cycles!
 +
 +The package is available on the web at:
 +
 +http://members.aol.com/MJMahon/index.html
 +
 +and is called <A HREF="http://members.aol.com/MJMahon/sound22.shk">Sound Editor v2.2</A>, since I had to "dress up" the player
 +into something fun to play with.  ;-)  An earlier version of Sound Editor
 +was published on SoftDisk in 1994, IIRC, but this one is a little more
 +evolved.  It also introduced 2:1 ADPCM compression of 8-bit sampled
 +sounds, to save disk space.  It is a lossy compression, but not very
 +noticeably.  The editor package also includes those routines, in 6502
 +assembly code.
 +
 +All of this should be trivially adaptable to the stock, 1 MHz C64, with
 +very good results.  By using the filters, you could probably filter out
 +the 11.025 KHz carrier and return to 6-bit accuracy!
 +
 +I should note that in the Apple world, sampled sounds are usually
 +represented as "excess-128" codes, which means that the sign bit
 +is inverted.  This actually simplifies things, since the sample value
 +is within a few shifts of being the pulse width in cycles.
 +
 +Let me know what you think!
 +
 +-michael
 +
 +--
 +
 +(Always great to hear from Atari and Apple ][ folks!)
 +
 +And finally, I have a little mathematical analysis of PWM and how it compares
 +to a "straight" digi.  Basically, I found some of the PWM explanations a
 +little unconvincing in issue #20 (even though I wrote them!).  For example,
 +the idea of "average voltage" seems a little funny, since every two samples
 +has an "average voltage", as does every four, etc. but that set of average
 +voltages would give a different sounding signal than the original (or
 +more dramatically, there is an average voltage over a full second of digi
 +playback, but that's not what you hear!).  So I wanted to know how a
 +PWM signal _really_ compares to a straight digi playback.
 +
 +Another issue is changing the amplitude of a PWM digi, i.e. using two
 +pulse waveforms, with one 1/16 the value of the other, to get higher
 +resolution.  If you recall the discussion of digis, the resolution of a PWM
 +digi depends on the number of pulse widths available, not the amplitude.
 +Adding two PWM waveforms together does not change the number of pulse widths
 +available, so I wanted to figure out what changing the amplitude _really_
 +does to a PWM digi, and if it can really be exploited.
 +
 +And finally, I wanted to know about the carrier wave (that is so piercing
 +at lower playback frequencies) -- and once again, how it compares with a
 +standard digi (which, after all, is stair-stepping the voltages at the
 +playback rate).
 +
 +Since the rest of this article is some Fourier analysis that 99% of people
 +will have zero interest in, I'll put the conclusions here.  The first is:
 +PWM digis and standard digis are essentially identical except at higher
 +frequencies (except for a phase shift, which doesn't make any difference to
 +your ear).  The second is: changing the amplitude of a PWM changes the 
 +resolution.  More specifically, the amplitude of the pulse multiplies the
 +digi sample value.  If two pulses can be synced close enough, it should
 +indeed be possible to use two pulses to get a higher resolution.  Moreover,
 +by modulating the amplitude of a single PWM digi, using the $d418 volume
 +register -- that is, using PWM _and_ $d418 -- it should be possible to get a
 +higher dynamic range, something that should be a little more achievable using
 +SID (but maybe not that useful, so I didn't try it out).  And finally, a
 +standard digi has zero amplitude at the carrier frequency.
 +
 +In other words, after a lot of effort I was able to demonstrate what everyone
 +already knows.
 +
 +The analysis doesn't change anything from the previous articles (except
 +possibly the idea for changing the PWM amplitude to get more dynamic range).
 +
 +And now, some Fourier analysis.  A standard digi just sets the voltage to
 +the sample value s_j, for a length of time dt (dt = 1/sample rate).  The
 +Fourier transform of a single sample s_j (occuring at time t_j) is
 +
 +
 + s_j [e^(-iw dt) - 1] * [e^(-iw t_j) / -iw]
 +  
 +
 +where w = angular frequency.  Since the above is a little hard to read, I'll
 +say it in words.  The first term is the sample value s_j, which scales
 +amplitudes at all frequencies.  The second term is due to the finite length
 +of the pulse (evaluating the Fourier integral at the boundaries), and
 +basically changes the phase of the transform.  The third term is like
 +sin(w)/w -- a sinusoid with decreasing amplitude as frequency increases.
 +So: the transform goes like sin(w)/w times the sample value, with some phase
 +effects thrown in (we'll get back to these in a moment).
 +
 +A PWM digi sets the duty cycle of a pulse to the sample value s_j, giving
 +a Fourier transform of
 +
 + [e^(-iw s_j dt) - 1] * [e^(-iw t_j) / -iw]
 +
 +Compare this with the earlier expression, and you'll see that the sample
 +value s_j has moved up in to the exponent of the "phase term" but that
 +they're otherwise the same.
 +
 +The first thing to do is to show that both expressions, PWM and standard,
 +reduce to the same thing -- that is, that a PWM and a standard digi sound
 +the same!  The expressions both decrease as 1/frequency, due to the
 +sin(w)/w term.  This means that at large frequencies the values become
 +negligible.  (How large?  For example, if the sample frequency is just 1KHz,
 +then sin(w)/w is .001 times smaller near w=1KHz (i.e. the sample frequency,
 +which is twice the Nyquist limit) than it is near w=0).
 +
 +So now consider the phase terms for small w.  The Taylor expansion for e^x is
 +
 + 1 + x + x^2/2 + ...
 +
 +We can therefore expand the "phase terms" as
 +
 + regular: e^(-iw dt) - 1 = (1 - iw*dt + w^2 dt^2/2 + ...) - 1
 + = -iw*dt + O(w^2 dt^2)
 +
 + pwm: e^(-iw s_j dt) - 1 = -iw*s_j*dt + O(w^2 dt^2)
 +
 +where O(w^2 dt^2) is considered very small since w and dt are both small.
 +Substituting the above into the original expressions gives
 +
 + s_j*iw*dt [e^(-iw t_j) / iw]
 +
 +in both cases.  That is, we have shown that for "small" frequencies -- more
 +specifically, for frequencies where (w^2*dt^2) is much smaller than (w*dt),
 +which is where w*dt<1, which is frequencies less than the sample frequency,
 +which is all frequencies of interest! -- PWM and standard digis are the same.
 +
 +The explanation lies in the phase terms.  Those "phase terms"
 +
 + [e^(iw dt) - 1]  (regular)
 +
 +and
 +
 + [e^(iw s_j dt) - 1]  (PWM)
 +
 +do more than just change the phase.  When they multiply the sin(w)/w signal,
 +they take the sin(w)/w signal, change the phase, and then subtract the
 +sin(w)/w signal again.  It's this difference of signals that makes things
 +work out at the frequencies we care about.  PWM and standard digis are _not_
 +the same, but the main differences are at higher frequencies, where the
 +amplitudes are in general much smaller.
 +
 +But... but... what about the PWM carrier frequency?  If we take a constant
 +digi, say with sample values = 1/2, the standard digi gives a constant
 +voltage, whereas a PWM digi gives a square wave at the sample frequency.
 +The answer comes from the "phase terms" above.  The sample frequency is
 +
 + w = 2*pi/dt.
 +
 +Substituting this into the phase terms gives
 +
 + [e^(i*2*pi) - 1] (regular)
 +
 +and
 +
 + [e^(i s_j 2*pi) - 1] (PWM)
 +
 +The regular expression is exactly zero -- there is _nothing_ at the
 +sample frequency of a regular digi.  But that's not the case for the PWM
 +term, because of the s_j up in the exponent.  PWM digis have a _finite_
 +amplitude at the carrier frequency.  Note that because of the sin(w)/w
 +term it gets smaller as the sample frequency increases -- but it isn't zero.
 +
 +Finally, the phase term expansions give some insight into what happens
 +when both the pulse width _and_ height are varied.  If the pulse width
 +is s_j, and the height is set to h_j, then the Fourier transform becomes
 +
 + h_j*s_j *iw*dt [e^(-iw t_j) / iw]
 +
 +That is, the amplitude multiples the width.  For the case of adding two
 +PWM waves together, then, the amplitude really does effectively scale the
 +sample value, and it should be possible to add one PWM value at 1/16 the
 +amplitude of another to get an effective 8-bit value.
 +
 +What about _varying_ the amplitude of a single PWM sequence?  For a 6-bit PWM
 +digi, say, the sample values s_j can go from 0 to 63.  If this is then
 +multiplied by h_j=2 say, then the values become 0 2 4 ... 126 -- a 7-bit
 +number where the lowest bit is always 0.  What use is that?  Well, we still
 +have the h_j=1 values of 0..63, which do include the lowest bit.  So we
 +can effectively change the dynamic range from 0..63 to 0..126 using just two
 +amplitude values.
 +
 +As a practical matter, then, it might be possible to use all 15 $d018 values
 +available to get a big dynamic range, and hence a better sounding digi,
 +using fewer CPU cycles.  Well, ok, we're only _sort of_ changing the dynamic
 +range, so I pretty much doubt the usefulness of it.  But maybe someone out
 +there would like to give it a shot.
 +
 +All right, let's hope this closes the book on pulse width modulation for
 +digi playback!
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +..............................................................................
 +
 +Introducing Full-Screen FLI mode for the SuperCPU
 +Copyright (C) 2002 By Todd S. Elliott
 +
 +The 'FLI Bug', where the first three columns of a FLI screen are essentially
 +unusable, can be squashed with the help of a SuperCPU.  I won't go into great
 +detail on IFLI, as it has been well-documented elsewhere, but I'll begin with
 +a short summary to get us all up to speed.  I refer you to Albert 'Pasi'
 +Ojala's excellent coverage of the FLI mode in C=Hacking #4. Pasi also
 +proofread this article.
 +
 +     A Three-Minute Summary of the FLI mode
 +
 +The VIC-II chip asserts a badline when it needs to access the databus and
 +fetch character data or videomatrix data. It was discovered that the VIC-II
 +chip can be manipulated by its vertical scroll register at $d011 (SCROLY) to
 +induce a badline at any given rasterline. By having a badline at every visible
 +rasterline, the program can manipulate $d018 (VMCSB) to point at the right
 +videomatrix to achieve the maximum flexibility of colors given to a
 +multi-color screen.
 +
 +Unfortunately, when a program forces a badline via SCROLY, the BA (Bus
 +Available) line in the computer goes high, and for three cycles the 6510/8510
 +processor has to finish its write operations or halt its read operations
 +before the BA line is released to the VIC-II chip. The maximum number of
 +successive write operations is three, hence the 3-cycle delay. It is in those
 +three cycles that the VIC-II does not fetch video matrix data to fill in the
 +first three columns and causes the 'FLI Bug'.
 +
 +I wish to stress that in those first three cycles, when the BA line is high,
 +the 6510/8510 processor is still active and can complete write operations. It
 +isn't fully shut down. After the badline retrigger at STA SCROLY, the code
 +following it is fetched on the databus and is ready to be executed by the
 +6510/8510 processor. When BA is high, the VIC-II will reference the value
 +on the databus as videomatrix data and display it in the first three columns
 +of the screen. The actual instructions that follow the STA SCROLY in the FLI
 +loop constitutes the video matrix data for the first three columns of the
 +screen.
 +
 +     Enter the SuperCPU!
 +
 + Normally, a VIC-II chip access is only possible every 4 cycles. The SuperCPU
 +can access the VIC-II chip in 1 cycle (1MHz) intervals, making cycle to cycle
 +changes possible within the VIC-II chip. More importantly, the SuperCPU
 +tristates the 6510/8510 processor inside the host Commodore computer (which
 +is a fancy way of saying that you can disconnect the processor from the
 +system without physically removing it).
 +
 +When a forced badline retrigger occurs with a STA SCROLY in a FLI loop under
 +the SuperCPU, the BA signal inside the host Commodore computer goes high. But,
 +the SuperCPU runs asynchronously and really doesn't have to pay attention to
 +the host Commodore as it runs code after the STA SCROLY. In fact, the SuperCPU
 +will execute code even if the VIC-II badline is in full swing inside the host
 +Commodore computer.
 +
 +I knew that the instruction opcodes left on the databus after the STA SCROLY
 +made up the video matrix data for the VIC-II chip for those first three
 +columns of the screen. But I wondered how this was possible in a SuperCPU
 +configuration because there would be no instruction opcodes left hanging on
 +the databus inside the host Commodore computer. After some discussions with
 +Per Olofsson ("MagerValp"), he suggested that writes/reads to the i/o area
 +will force a value to be put on the databus.
 +
 +This is where the magic begins, when the FLI loop forces the SuperCPU to write
 +to the i/o area of the host Commodore after the forced badline retrigger at
 +STA SCROLY. The SuperCPU will note that the BA signal is still high, so it can
 +still access the databus and stash values there via DMA. This BA high signal
 +will last for 3 cycles, enough for the SuperCPU to stash three values onto the
 +databus.
 +
 +The 6510/8510 is still tristated by the SuperCPU, and there's nothing on the
 +databus after the forced badline retrigger at STA SCROLY. Normally, the
 +6510/8510 CPU shares the databus with the VIC-II for each machine cycle. With
 +the 6510/8510 CPU out of the equation, the SuperCPU can stash a value onto
 +this shared bus on the CPU half of this machine cycle and the VIC-II chip will
 +see it in its other half of the machine cycle.
 +
 +However, the databus is only eight bits wide. The VIC-II chip fetches video
 +matrix data and color ram data 12 bits at a time. The SuperCPU can force
 +values onto the databus during the first three cycles after the forced badline
 +retrigger, but on each cycle the last four bits belonging to Color RAM would
 +not be fed to the VIC-II chip. Only pixel values of %10 and %01 can be
 +individually selected in multicolor FLI mode, while %11 pixel values cannot be
 +individually set for those first three columns of the screen. The high
 +resolution FLI mode does not suffer from this problem because it does not use
 +color RAM for color attribute information.
 +
 +     Full-Screen FLI in practice
 +
 +Let's get down to the nitty gritty. The Write I/O Approach requires three
 +200-byte tables, corresponding to each column. Each value on those tables
 +correspond to each visible rasterline. For example, the first byte of each
 +table corresponds to rasterline 50, the second byte of each table corresponds
 +to rasterline 51, etc. The first table contains values needed for the first
 +column of the screen, the second table contains values needed for the second
 +column of the screen, and the third table values for the third column.
 +
 +In the FLI display loop prior to the STA SCROLY command, the current
 +rasterline is used as an index to all three tables. The values are then
 +fetched from the tables and inserted into the code that follows the STA SCROLY
 +command using self-modifying code techniques. When the STA SCROLY happens, the
 +code that immediately follows it starts writing the values onto the databus,
 +all three in a row to complete the first three columns of the screen.
 +
 +There is a disadvantage with this approach. It requires that three 200-byte
 +tables be specially constructed and stored somewhere in memory that is not
 +mirrored by the SuperCPU. A routine would have to read in a FLI graphics file,
 +extract information from the first three columns and store it into their
 +respective 200-byte tables.
 +
 +Pasi Ojala came up with a graph depicting the SuperCPU interacting with the
 +VIC-II in action, showing what happens after the forced DMA retrigger at STA
 +SCROLY. The 'LDA #$xx' would have been modified earlier in the FLI routine
 +(before the STA SCROLY) using self-modifying code. Here is the relevant source
 +code which takes up 4 machine cycles inside the host Commodore computer.
 +
 +sta scroly : abcd, d = write Y to SCROLY on 1MHz bus CPU half - Mach. Cycle #1
 +lda #$00 : ef
 +sta $d022 : ghij, j = write 1 to $D022 on 1MHz bus CPU half - Mach. Cycle #2
 +lda #$00 : kl
 +sta $d022 : mnop, p = write 2 to $D022 on 1MHz bus CPU half - Mach. Cycle #3
 +lda #$00 : qr
 +sta $d022 : stuv, v = write 3 to $D022 on 1MHz bus CPU half - Mach. Cycle #4
 +
 +There are the two shared halves consisting of a machine cycle inside the host
 +Commodore bus, and by stashing values onto the databus, this value is carried
 +over to the VIC-II half and is read as videomatrix data during the first three
 +columns of the FLI screen.
 +
 +________ = VIC-II half of the 1MHz cycle
 +. = SCPU synchronizes to 1MHz bus
 +Each char position is equivalent to a 1 (20MHz) cycle.
 +
 +Mach. Cycle #1      Mach. Cycle #2      Mach. Cycle #3      Mach. Cycle #4
 ++-------------------+-------------------+-------------------+-------------------
 +__________YYYYYYYYYY___DMA____1111111111___col0___2222222222___col1___3333333333___col2___ 1MHz
 +abc.......ddddddddddefghi.....jjjjjjjjjjklmno.....ppppppppppqrstu.....vvvvvvvvvv SCPU
 +
 +Values on the databus which is carried over onto the VIC-II half of the databus:
 +DMA: DMA condition detected by VIC-II
 +col0: colors for column 0 read, gets the value 1 put into the bus by SCPU
 +col1: colors for column 1 read, gets the value 2 put into the bus by SCPU
 +col2: colors for column 2 read, gets the value 3 put into the bus by SCPU
 +
 +     An alternative approach bites the dust
 +
 +The SuperCPU can also fetch values onto the databus by reading from the I/O
 +region. If a coder were so inclined to use a 'Read I/O Approach', where is a
 +program going to find 600 free bytes in the i/o region at $d000-$dfff? The
 +idea is to force the SuperCPU to do a read on the databus via DMA and this
 +can't be done with mirrored locations similar to the ones used in those VIC
 +optimization modes. When a SuperCPU reads a value from mirrored memory, it
 +does so from its local RAM and not the RAM that is inside the host Commodore
 +computer. However, if the SuperCPU reads from the I/O block at $d000-$dfff, it
 +will read a value from inside the host Commodore computer using DMA.
 +
 +Unfortunately, this approach did not work when the BA line went high inside
 +the host Commodore computer and is unworkable for a full-screen FLI mode. The
 +SuperCPU stops for reads if BA is high, just like its 1MHz 6510/8510
 +counterpart.
 +
 +     Other Considerations.
 +
 +There were some interesting observations while debugging the full-screen FLI
 +routines. The full-screen FLI routines were originally inspired by Robin
 +Harbron's IRQ-based IFLI routines.  Because they are driven by an IRQ, the
 +CPU is still available for normal computational tasks.
 +
 +When all three videomatrix values are written to after the STA SCROLY in the
 +line-interruptible FLI routine, the IRQ must then exit quickly with the
 +restoration of the registers.  It's a good idea to avoid writing to any
 +mirrored location or read/write to any I/O region ($D000-$DFFF), since the
 +SuperCPU will have to wait for VIC to finish with the data bus.
 +
 +Using a raster IRQ will naturally lead to trouble, since cycle-exact timing
 +is needed, so a CIA timer is used.  The timer may be set to synchronize a
 +PAL or NTSC machine. Then in the FLI routine the timer can be checked and
 +indexed into a table of preset timing values so that the forced badline
 +retrigger at STA SCROLY will always happen at the right time on the screen,
 +no matter what the SuperCPU is doing when the VIC-II interrupted it with an
 +raster IRQ. Thanks goes to Ninja/The-Dreams (aka Wolfram Sang) for tips on
 +how to create a stable line-interruptible FLI routine using timers.
 +
 +    Source code
 +
 +Without further ado, here is the source code. This source code was used in the
 +Santa Claus FLI Demo for Wheels OS. This code will run in either Commodore 64
 +or 128 computers and in either PAL or NTSC systems. It did take a lot of
 +tweaking at Points #1, #2, #3, #4, & #5 as I tried to perfect the routines as
 +closely as possible. The full source code for the Santa Claus FLI demo can be
 +supplied via email upon request. It is in Concept+ (geoProgrammer) format.
 +
 +; Wedges the full-screen FLI interrupt handler in Wheels systems.
 +; Thanks goes to Robin Harbron for the idea of a line-interruptible FLI routine.
 +InstallFLI: ; Installs the FLI routine
 + jsr ClearMouseMode ; Turn off the mouse.
 + sei
 + lda CPU_DATA ; Save 6510/8510 Location #$01.
 + sta r6510
 + lda screenMode ; Check computer.
 + bne 2$ ; Take branch in 128 mode.
 + lda #IO_IN
 +.byte $2c
 +2$: lda #%00110111 ; for 128 mode only
 + sta CPU_DATA
 + lda vmcsb ; Save original video bank info for Wheels.
 + sta oVMBmp
 + lda scroly ; save screen Y axis
 + sta yaxis
 + MoveW $fffe, oldVector ; Saves the old Wheels IRQ vector.
 + lda #<fli ; Sets up fli raster.
 + sta $fffe ; At the IRQ interrupt vector.
 + lda #>fli
 + sta $ffff
 +; Point #1
 + lda #$31 ; Trigger the IRQ request
 + sta raster ; At rasterline 49.
 + lda scroly
 + and #$7f ; Clear bit 7 of raster register.
 + sta scroly
 + lda #1
 + sta vicirq ; Ack raster ints.
 + cli
 + rts
 +
 +RemoveFLI: ; Removes the FLI routine
 + sei
 + MoveW oldVector, $fffe ; Restores the old Wheels IRQ vector.
 + lda #$fb ; Trigger the IRQ request
 + sta raster ; At rasterline 251.
 + lda yaxis ; restore screen Y axis
 + and #$7f ; Clear bit 7 of raster register.
 + sta scroly
 + lda #1
 + sta vicirq ; Ack raster ints.
 + lda oVMBmp ; Restore original video bank info for Wheels.
 + sta vmcsb
 + lda r6510
 + sta CPU_DATA ; Restores 6510 Port #$01
 + cli
 + jmp StartMouseMode ; Start the mouse on.
 +fli: ; The actual FLI interrupt routine lies here.
 + pha
 +.byte $da ;phx
 +.byte $5a ;phy
 + php ; Save processor flags.
 +; Point #2
 + ldx #$03 ; #$0f for PAL SuperCPU systems.
 +3$ dex
 + bpl 3$
 + lda raster
 + tax
 + ldy colOneClrs,x ; Get colors for the first three columns.
 + sty mark4+1
 + ldy colTwoClrs,x
 + sty mark5+1
 + ldy colThreeClrs,x
 + sty mark6+1
 +; ldy backgndTable,x ; Get background color for scanline.
 +; stx $d021
 + inx
 +; Point #3
 + cpx #$f9 ; Have we reached scanline 249?
 + bne 1$
 +; Point #1
 + ldx #$31 ; Restart the IRQ at rasterline 49.
 +1$: stx raster ; By this time, the raster interrupt register is
 + ; incremented by one, and will re-trigger the
 + ; same fli routine.
 + ; This way, it is fully line-interruptible
 + ; and frees up SuperCPU time.
 + ldy #$01
 + sty vicirq ; Ack raster ints.
 + and #$07 ; Mask out lower three bits.
 + tax
 + ldy tabd018,x ; Use preset values for vmcsb.
 + lda d011tab,x ; Use preset values for scroly.
 + sty vmcsb ; Select video matrix.
 + sta scroly ; Forces the badline.
 +mark4: lda #$00 ; Stores a video matrix value onto the first column.
 + sta $d022
 +mark5: lda #$00 ; Stores a video matrix value onto the second column.
 + sta $d022
 +mark6: lda #$00 ; Stores a video matrix value onto the third column.
 + sta $d022
 + plp ; Restore processor flags.
 +.byte $7a ;ply
 +.byte $fa ;plx
 + pla ; Do NOT use any memory accesses to the host
 + rti ; Commodore databus in this part because it will
 + ; be blocked by the VIC-II badline.
 +
 +; Point #4
 +tabd018: ; Preset video matrix values.
 +.byte $78,$08,$18,$28,$38,$48,$58,$68; NTSC systems
 +;.byte $08,$18,$28,$38,$48,$58,$68,$78 - PAL systems
 +d011tab: ; Preset VIC DMA retrigger values.
 +.byte $38,$39,$3a,$3b,$3c,$3d,$3e,$3f
 +
 +ChkAbortKey: ; Checks the RUN/STOP keypress.
 + LoadB $dc00, #$7f ; check for the STOP key
 +3$: asl $dc01 ; Check for the RUN/STOP keypress.
 + ; This also synchronizes the line-interruptible FLI routine.
 + bcs 3$ ; Branch if it isn't pressed.
 + rts
 +prep3Cols: ; Prepares the first three columns of the FLI screen
 + ; Ideally, a FLI file would be loaded in and this
 + ; routine would then be called to set up the three
 + ; 200-byte tables corresponding to each column,
 + ; covering the first three columns of the screen.
 + lda #$40
 + sta mark1+2 ; Prepare the marks.
 + sta mark2+2
 + sta mark3+2
 + ldy #$00
 + sty mark1+1
 + iny
 + sty mark2+1
 + iny
 + sty mark3+1
 + php
 + sei
 + lda screenMode
 + beq 1$ ; take branch in 64 mode.
 + lda $ff00
 + pha ; save 128 configuration.
 + lda #%01111110 ; select RAM at $4000
 + sta $ff00
 +1$: ldy #$00
 + ldx #$07 ; Use self-modifying code to create three
 +; Point #5
 +mark1: lda $4000 ; 200-byte tables for each column of the
 + sta colOneClrs+49,y ; FLI screen and each value is indexed by
 +mark2: lda $4001 ; the scanline in the FLI routine.
 + sta colTwoClrs+49,y
 +mark3: lda $4002
 + sta colThreeClrs+49,y ; use +48 for the column offset in PAL
 + clc ; systems.
 + lda mark1+2
 + adc #$04
 + sta mark1+2
 + sta mark2+2
 + sta mark3+2
 + iny
 + dex
 + bpl mark1
 + sec
 + lda mark1+2
 + sbc #$20
 + sta mark1+2
 + clc
 + lda mark1+1
 + adc #$28
 + sta mark1+1
 + tax
 + inx
 + stx mark2+1
 + inx
 + stx mark3+1
 + lda mark1+2
 + adc #$00
 + sta mark1+2
 + sta mark2+2
 + sta mark3+2
 + cpy #200
 + bne mark1-2
 + lda screenMode
 + beq 2$ ; take branch in 64 mode.
 + pla
 + sta $ff00 ; restore 128 configuration.
 +2$: plp
 + rts
 +
 +.ramsect $1000
 +; All column colors are referenced by scanline.
 +colOneClrs: ; Column one colors of the FLI screen.
 +.block 256
 +colTwoClrs: ; Column two colors of the FLI screen.
 +.block 256
 +colThreeClrs: ; Column three colors of the FLI screen.
 +.block 256
 +
 +Hopefully the full-screen FLI possibilities that the SuperCPU can now unlock
 +will bring forth cool software for our SuperCPU's and tons of 'eye candy'.
 +
 +Enjoy.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 +
 + Main Articles
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 +
 +VIC KERNAL Disassembly Project - Part V
 +Richard Cini
 +February, 2002
 +
 +Introduction
 +============
 +
 + In Part 4 of this series, we discovered that of the 39 ROM routines
 +available to be called by user code, 26 of them related to device I/O. The
 +path to using a device from machine code was to first "open" it. So, we
 +looked at the routines required to open and begin using a logical device.
 +
 + When we concluded the discussion on the OPEN command, we left out
 +seven routines dealing with the tape device. This is some of the most
 +complex code I've ever seen. Unlike the IEEE serial peripherals, the tape
 +deck is a "dumb" device, meaning that the VIC Kernal has to do the work of
 +moving the data to/from the tape. With the serial peripherals, the Kernal
 +just sends a character to the device that then acts on it independently of
 +the VIC.
 +
 + After opening the device, there are nine non-tape routines to deal
 +with getting data into and out of the VIC, including talking and untalking,
 +listening and unlistening, moving the characters in and out, and some
 +secondary address stuff. 
 +
 + In this installment, we'll conclude our discussion of the OPEN
 +procedure by looking at the tape routines that are called from OPEN. Later
 +articles will discuss the actual movement of bits to and from the cassette
 +and the remaining IEEE serial routines, as well as the loading and saving of
 +files.
 +
 +Tape Routines
 +=============
 +
 + In the last installment, we examined the beginnings of opening a
 +device for use by a program. The reader will find that the routines are
 +convoluted and hard to follow, with jumps and branches into apparently
 +unrelated subroutines. Overall it appears to be ugly but highly functional
 +code.
 +
 + Here's a "calling tree" outline to the routines called by OPEN:
 +
 +IOPEN----
 + |-FIND (analyzed in Part IV of this series)
 + |-SENDSA (ultimately handles IEEE serial stuff)
 + |-SEROPN (handles RS232 stuff)
 + |
 + |(all tape-related from here down)
 + |-GETBFA (get address of tape buffer)
 + |-PLAYMS (prompt user to press PLAY on tape deck)
 + |   |-TPSTAT (tape key status)
 + |   |-TPSTOP (check for STOP keyboard key during tape ops)
 + |
 + |-SRCHMS (print "Searching" or "Searching for..." messages)
 + |-LOSCPH (locate a tape header with filename matching one in
 + | OPEN)
 + |-LOCTPH (locate first/next header; no filename specified)
 + |   |-SETBST (sets tape buffer start/end pointers)
 + |   |-PLAYMS (see above)
 + |   |-TPCODE (main tape code-moves bits in and out on IRQ)
 + | |-SBIDLE (serial buss idle check)
 + | |-STOIRQ1 (put key tape vectors into table)
 + | |-NCHAR (sets bit counter for char input operations)
 + | |-TPSTOP (see above)
 + | |-IUDTIM (update jiffy clock; previous installment)
 + |
 + |-RECDMS (prompt user to press PLAY & RECORD on tape deck)
 + |-WRTPHD (write a tape header to tape)
 + |   |-SETSBT (see above)
 + |   |-TPWRIT1 (precedes TPCODE by 12 bytes)
 +
 +
 + When dealing with the tape, it helps to understand that Commodore
 +built the tape protocol with an eye towards readability under adverse
 +conditions, including tape quality and motor speed. This made Commodore
 +tapes probably one of the most reliable data systems when compared to TI,
 +Apple, and Tandy, among others. Data headers and data blocks are repeated on
 +the tape and a comparison is made between the two reads to ensure integrity.
 +Additionally, the recording method is self-clocking so the effects of
 +varying tape speed are minimized.
 +
 + One of the first routines used in opening the tape device is
 +determining the address of the tape buffer and making sure that it's in the
 +$02xx page (or higher) of the system RAM. A test at IOPEN_S5 bails out with
 +an "illegal device" error if the tape buffer isn't just so.
 +
 +F84D ;=============================================================
 +F84D ; GETBFA - Get start of tape buffer
 +F84D ;    Returns buffer address in .X (LSB) and .Y (MSB)
 +F84D ;
 +F84D GETBFA
 +F84D A6 B2 LDX TAPE1
 +F84F A4 B3 LDY TAPE1+1
 +F851 C0 02 CPY #$02 ;is buffer at $02xx?
 +F853 60 RTS
 +
 + PLAYMS is called by IOPEN at IOPEN_S6. A test there determines if
 +we're trying to read/load or write/save from/to a tape. Read mode results in
 +the "Press Play..." message, the "Searching for..." message and two routines
 +that search for the appropriate tape header.
 +
 +F894 ;===============================================================
 +
 +F894 ; PLAYMS - Wait for tape key on read
 +F894 ;
 +F894 PLAYMS
 +F894 20 AB F8    JSR TPSTAT ;get tape key status
 +F897 F0 1C        BEQ TPSTEX ;$F8B5 pressed? yes, exit.
 +F899             
 +F899 A0 1B        LDY #KIM_PLAY ;offset for "Press Play..." message
 +
 +F89B PLAYMS1
 +F89B 20 E6 F1    JSR MSG ;print it
 +             
 +F89E WTPLAY
 +F89E 20 4B F9    JSR TPSTOP ; check for STOP key
 +F8A1 20 AB F8    JSR TPSTAT ;get key status
 +F8A4 D0 F8        BNE WTPLAY ;$F89E wait for PLAY switch
 +             
 +F8A6 A0 6A        LDY #KIM_OK ;offset for "OK" message
 +F8A8 4C E6 F1    JMP MSG ;print it and return to caller
 +
 + This simple routine prints the "Searching for..." message only if in
 +direct mode, and if appropriate, the filename that is being searched for. If
 +no filename is present, the message is changed to "Searching..." (without
 +the "for") before it's output to the screen.
 +
 +F647 ;==============================================================
 +F647 ; SRCHMS - Print "Searching for [filename]"
 +F647 ;
 +F647 SRCHMS
 +F647 A5 9D        LDA CMDMOD ;direct mode?
 +F649 10 1E        BPL SRCHEX ;$F669 no (programmed mode), exit
 +F64B             
 +F64B A0 0C        LDY #KIM_SRCH ;"Searching" message
 +F64D 20 E6 F1    JSR MSG ;output message
 +F650 A5 B7        LDA FNMLEN ;get filename length
 +F652 F0 15        BEQ SRCHEX ;$F669 no filename, skip "for"
 +F654 A0 17        LDY #$17 ;point to "FOR" in "Searching For"
 +F656 20 E6 F1    JSR MSG ;print it 
 +F659 ;
 +F659 ; FLNMMS - Print filename
 +F659 ;
 +F659 FLNMMS
 +F659 A4 B7        LDY FNMLEN ;get filename length
 +F65B F0 0C        BEQ SRCHEX ;$F669 no filename, exit
 +F65D A0 00        LDY #$00
 +F65F FLNMLP ; loop to print filename
 +F65F B1 BB        LDA (FNPTR),Y ;get character
 +F661 20 D2 FF    JSR CHROUT ; and print it
 +F664 C8          INY
 +F665 C4 B7        CPY FNMLEN
 +F667 D0 F6        BNE FLNMLP ;$F65F loop
 +F669 SRCHEX
 +F669 60          RTS ;exit
 +
 +
 + If the user is attempting to open a tape file with a specific
 +filename, the IOPEN code makes a call to LOCSPH in IOPEN_S6 to find the file
 +header associated with the filename. If the specific header is not found,
 +the routine emits the "File not Found" error message. LOCSPH is a loop
 +wrapper around the LOCTPH routine that searches for the "next" file header
 +regardless of name. The secondary address parameter of the OPEN command
 +defines whether the action is to (0) read a tape file and relocate it in
 +memory, (1) read a tape file without relocation (a machine program), or (2)
 +write a tape file and put both EOF an EOT markers after it. Once a header is
 +found, the filename from the header is compared to the filename in the OPEN
 +command to see if there's a match.
 +
 +F867 ;============================================================
 +F867 ; LOCSPH- Find specific tape header
 +F867 ;
 +F867 LOCSPH
 +F867 20 AF F7    JSR LOCTPH ;search for next header
 +F86A B0 1D        BCS LCSPEXC+1 ;$F889 returned EOT? Go to ready
 +             
 +F86C A0 05        LDY #$05 ;filename offset in header
 +F86E 84 9F        STY TPTR2 ;save offset 
 +F870 A0 00        LDY #$00 ;loop counter
 +F872 84 9E        STY TPTR1 ;save it
 +             
 +F874 LCSPHLP
 +F874 C4 B7        CPY FNMLEN ;compare filename length
 +F876 F0 10        BEQ LCSPEXC ;$F888 counter 0, exit
 +F878             
 +F878 B1 BB        LDA (FNPTR),Y ;get filename letter
 +F87A A4 9F        LDY TPTR2 ; offset to name in header
 +F87C D1 B2        CMP (TAPE1),Y ;compare to tape header
 +F87E D0 E7        BNE LOCSPH ;f867 different, get next header
 +
 +F880 E6 9E        INC TPTR1 ;increment counters
 +F882 E6 9F        INC TPTR2
 +F884 A4 9E        LDY TPTR1
 +F886 D0 EC        BNE LCSPHLP ;$F874 compare next character
 +F888 LCSPEXC
 +F888 18          CLC ; exit success
 +F889 60          RTS
 +
 +
 + Tape searches are performed linearly, so the LOCTPH routine is used
 +to search for the next file header on the tape starting from the current
 +tape position. This routine is also called by LOAD through IOPEN if no
 +filename is provided as a parameter to the LOAD (or OPEN) call.
 +Additionally, it's called in a loop by the LOCSPH routine when searching for
 +a specific file.
 +
 + LOCTPH reads a tape block to the tape buffer using TPREAD and
 +examines a few important fields in the file header. The fields examined
 +indicate the file type and the filename. If the file header indicates that
 +the file is anything other than a program file or a data file, the routine
 +looks for another file header until it reaches the end of the tape. If BASIC
 +is in "direct mode", LOCTPH prints the "Found" message in addition to the
 +filename.
 +
 +
 +F7AF ;============================================================
 +F7AF ; LOCTPH - Read any tape header
 +F7AF ;  Header type: 1=BASIC program, 2=data block, 3=machine program,
 +F7AF ; 4=data header, 5=end-of-tape marker
 +F7AF ;
 +F7AF LOCTPH
 +F7AF A5 93        LDA IOFLG2
 +F7B1 48          PHA ;save load/verify flag
 +F7B2 20 C0 F8    JSR TPREAD ;read tape block to buffer
 +F7B5 68          PLA
 +F7B6 85 93        STA IOFLG2 ;restore flag
 +F7B8 B0 2C        BCS LOCTPEX ;F7E6 error, end search
 +
 +F7BA A0 00        LDY #$00 ;index reg
 +F7BC B1 B2        LDA (TAPE1),Y ;get header type
 +F7BE C9 05        CMP #$05 ;EOT? 
 +F7C0 F0 24        BEQ LOCTPEX ;$F7E6 yes, exit
 +
 +F7C2 C9 01        CMP #$01 ;BASIC program?
 +F7C4 F0 08        BEQ LOCTP1 ;$F7CE yes, branch
 +
 +F7C6 C9 03        CMP #$03 ;ML program?
 +F7C8 F0 04        BEQ LOCTP1 ;$F7CE yes, branch
 +
 +F7CA C9 04        CMP #$04 ;data header?
 +F7CC D0 E1        BNE LOCTPH ;must be data block-skip it
 +
 +F7CE LOCTP1 ;program or data header comes here
 +F7CE AA          TAX
 +F7CF 24 9D        BIT CMDMOD ;direct mode?
 +F7D1 10 11        BPL LOCTPEX-2 ;$F7E4 no, continue
 +
 +F7D3 A0 63        LDY #KIM_FOUN ;setup for "Found" msg
 +F7D5 20 E6 F1    JSR MSG ;print it
 +F7D8 A0 05        LDY #$05 ;offset to file name
 +
 +F7DA LOCLOOP ; loop to print filename
 +F7DA B1 B2        LDA (TAPE1),Y ;print loop
 +F7DC 20 D2 FF    JSR CHROUT
 +F7DF C8          INY
 +F7E0 C0 15        CPY #$15 ;21 characters max
 +F7E2 D0 F6        BNE LOCLOOP ;$F7DA loop
 +F7E4 18          CLC
 +F7E5 88          DEY
 +
 +F7E6 LOCTPEX
 +F7E6 60          RTS
 +
 +
 + The TPREAD routine is responsible for the setup required to read or
 +verify a block from the tape. It prompts the user to press the PLAY button
 +on the tape deck, disables system interrupts and sets some important
 +parameters before execution falls through to the code responsible for
 +starting the tape IRQ routines. In many Commodore machines, tape operations
 +are performed under the operation of a routine installed as a temporary IRQ
 +handler -- sort of a cheap multitasking so that the system doesn't appear to
 +be hung while tape operations are occurring. Execution ultimately comes to
 +code at $F8EF (TPCODE) which is responsible for installing and starting the
 +tape IRQ routine.
 +
 + All of this and we haven't yet reached the bits on the tape :-)
 +
 +F8C0 ;==========================================================
 +F8C0 ; TPREAD - Read tape block
 +F8C0 ;
 +F8C0 TPREAD
 +F8C0 A9 00        LDA #$00
 +F8C2 85 90        STA CSTAT ;clear status variable...
 +F8C4 85 93        STA IOFLG2 ;and read/verif flag
 +
 +F8C6 TPREAD1
 +F8C6 20 54 F8    JSR SETBST ;set tape buffer pointers
 +
 +F8C9 ;
 +F8C9 ; load program
 +F8C9 ;
 +F8C9 TPREAD2
 +F8C9 20 94 F8    JSR PLAYMS ;wait for Play key
 +F8CC B0 1F        BCS TPCODE-2 ;$F8ED (in TPWRIT1)
 +F8CE             
 +F8CE 78          SEI ;disable interrupts
 +F8CF A9 00        LDA #$00 ;clear work memory for IRQ routines
 +F8D1 85 AA        STA RIDATA
 +F8D3 85 B4        STA BITTS
 +F8D5 85 B0        STA TPCON1
 +F8D7 85 9E        STA TPTR1
 +F8D9 85 9F        STA TPTR2
 +F8DB 85 9C        STA BYTINF
 +F8DD A9 82        LDA #$82 ;Timer H constant
 +F8DF A2 0E        LDX #$0E ;number for IRQ vector
 +F8E1 D0 11        BNE TPCODE1 ;$F8F4 (TPCODE1 in TPWRIT below)
 + ; falls through to TPWRIT below)
 +
 +
 + SRCHMS also calls this small routine to determine if a key is
 +pressed on the tape deck. TPSTAT looks at PA6 on VIA1 to determine the key
 +state and sets up the Z flag for a compare to be performed in SRCHMS. 
 +
 +F8AB ;============================================================
 +F8AB ; TPSTAT - Check tape key status
 +F8AB ;
 +F8AB TPSTAT
 +F8AB A9 40        LDA #%01000000 ;$40
 +F8AD 2C 1F 91    BIT D1ORAH ;switch sense
 +F8B0 D0 03        BNE TPSTEX ;$F8B5 not pressed, exit
 +F8B2 2C 1F 91    BIT D1ORAH ;button pressed.  Setup for another
 +F8B5              ;compare later. Z=1 if pressed
 +F8B5 TPSTEX
 +F8B5 18          CLC
 +F8B6 60          RTS ;return clear
 +
 +
 + One of the tests performed in IOPEN (also at IOPEN_S6) is to
 +determine if the tape operation is a read or a write. If we're in the write
 +mode, the code jumps to IOPEN2. At IOPEN2, the Kernal prompts for the user
 +to press play and record on the tape deck and then writes a file header by
 +calling WRTPHD with a control ID of $04 (a data header). Then, IOPEN writes
 +a control byte ID of $02 (block is a data block) to the tape buffer and
 +returns.
 +
 + RECDMS is called by IOPEN to determine if a key is pressed on the
 +tape deck and if not, sets the message flag to the "Press Play & Record"
 +message and prints the message by calling into the PLAYMS routine. PLAYMS
 +prints the message and then checks for the key press.
 +
 +F8B7 ;===========================================================
 +F8B7 ; RECDMS - Wait for tape key on write
 +F8B7  ;
 +F8B7 RECDMS
 +F8B7 20 AB F8    JSR TPSTAT ;get button status
 +F8BA F0 F9        BEQ TPSTEX ;$F8B5 pressed? Yes, continue
 +
 +F8BC A0 2E        LDY #KIM_RECP ;no, remind to "Press Play & Record"
 +F8BE D0 DB        BNE PLAYMS1 ;exit through $F89B 
 +
 +
 + IOPEN calls WRTPHD at IOPEN2 with $04 as the control byte (following
 +block is a data header) to be written into the header. WRTPHD then writes
 +some critical information into zero-page locations in advance of filling the
 +tape buffer with the same information. At the end of the routine, the data
 +is written to the tape in WRTPH1. 
 +
 +F7E7 ;==========================================================
 +F7E7 ; WRTPHD - Write tape header
 +F7E7 ;  On entry, .A is the header type: 2=data blk; 4=data hdr
 +F7E7 ;
 +F7E7 WRTPHD
 +F7E7 85 9E        STA TPTR1 ;save header type
 +F7E9 20 4D F8    JSR GETBFA ;get tape buffer address
 +F7EC 90 5E        BCC WRTPEX ;$F84C exit if not right
 +
 +F7EE A5 C2        LDA STAL+1 ; save some program info
 +F7F0 48          PHA ;save...start H
 +F7F1 A5 C1        LDA STAL
 +F7F3 48          PHA ;...start L
 +F7F4 A5 AF        LDA EAL+1
 +F7F6 48          PHA ;...end H
 +F7F7 A5 AE        LDA EAL
 +F7F9 48          PHA ;...end L
 +F7FA A0 BF        LDY #$BF ;buffer length-1 (191)
 +F7FC A9 20        LDA #$20 ; {space}
 +
 +F7FE WRTPLP1 ; write program data to tape buffer
 +
 +F7FE 91 B2        STA (TAPE1),Y ;clear buffer
 +F800 88          DEY
 +F801 D0 FB        BNE WRTPLP1 ;$F7FE
 +
 +F803 A5 9E        LDA TPTR1 ;get header type
 +F805 91 B2        STA (TAPE1),Y ;write it
 +F807 C8          INY
 +F808 A5 C1        LDA STAL ;get start L
 +F80A 91 B2        STA (TAPE1),Y ;write it
 +F80C C8          INY
 +F80D A5 C2        LDA STAL+1 ;get start H
 +F80F 91 B2        STA (TAPE1),Y ;write it
 +F811 C8          INY
 +F812 A5 AE        LDA EAL ;get end L
 +F814 91 B2        STA (TAPE1),Y ;write it
 +F816 C8          INY
 +F817 A5 AF        LDA EAL+1 ;get end H
 +F819 91 B2        STA (TAPE1),Y ;write it
 +F81B C8          INY
 +F81C 84 9F        STY TPTR2 ;save buffer offset
 +F81E A0 00        LDY #$00 ;filename loop counter
 +F820 84 9E        STY TPTR1 ;save loop counter
 +
 +F822 WRTPLP2 ; write filename to buffer
 +F822 A4 9E        LDY TPTR1 ;get loop counter
 +F824 C4 B7        CPY FNMLEN ;compare filename length
 +F826 F0 0C        BEQ WRTPH1 ;$F834 done
 +
 +F828 B1 BB        LDA (FNPTR),Y ;get filename char
 +F82A A4 9F        LDY TPTR2 ;get tape buffer pointer
 +F82C 91 B2        STA (TAPE1),Y ;write char to buffer
 +F82E E6 9E        INC TPTR1 ;increment loop counters
 +F830 E6 9F        INC TPTR2
 +F832 D0 EE        BNE WRTPLP2 ;$F822 loop
 +
 +F834 WRTPH1 ; flush buffer to tape
 +F834 20 54 F8    JSR SETBST ;set start and end address pointers
 +F837 A9 69        LDA #$69 ;checksum block ID
 +F839 85 AB        STA RIPRTY ;save parity character
 +F83B 20 EA F8    JSR TPWRIT1 ;$F8EA write block
 +F83E A8          TAY ;restore start and end addresses
 +F83F 68          PLA
 +F840 85 AE        STA EAL
 +F842 68          PLA
 +F843 85 AF        STA EAL+1
 +F845 68          PLA
 +F846 85 C1        STA STAL
 +F848 68          PLA
 +F849 85 C2        STA STAL+1
 +F84B 98          TYA
 +F84C             
 +F84C WRTPEX
 +F84C 60          RTS ;exit
 +
 +
 + SETBST is a helper routine called by LOCTPH and WRTPHD to setup the
 +tape buffer before a tape operation takes place. It sets the starting
 +address of the buffer to the first address of the assigned buffer range and
 +sets the end of the buffer to start + 192 bytes.
 +
 +F854 ;==========================================================
 +F854 ; SETBST - Set buffer start/end pointers
 +F854 ;
 +F854 SETBST
 +F854 20 4D F8    JSR GETBFA ;get buffer address
 +F857 8A          TXA
 +F858 85 C1        STA STAL ;start=start of buffer
 +F85A 18          CLC
 +F85B 69 C0        ADC #$C0 ;end=start+192
 +F85D 85 AE        STA EAL
 +F85F 98          TYA
 +F860 85 C2        STA STAL+1
 +F862 69 00        ADC #$00
 +F864 85 AF        STA EAL+1
 +F866 60          RTS
 +
 +
 + TPWRIT performs the nuts and bolts of moving cassette data in and
 +out of the VIC. The VIC moves tape data by using a series of interrupt
 +routines that are swapped into the IRQ vector as needed. The benefit here is
 +that the tape IRQ code is then executed 60 times per second, along with
 +normal processing, until the operation is complete, resulting in a cheap
 +form of multitasking.
 +
 +TPWRIT performs some setup chores before changing the IRQ vector, including
 +clearing interrupts, ensuring that the IEEE serial bus is idle, saving the
 +old vector, assigning the new vector and setting-up the variables used by
 +the tape IRQ routine to actually move bits. Finally, interrupts are enabled
 +at $F92E which starts the whole process. When the tape IRQ routine is
 +finished it restores the IRQ vector to the standard $EABF which is detected
 +by TPWRIT at TPCDLP2 (an I/O loop). When completed, TPWRIT exits through the
 +TPSTOP routine. This loop also updates the jiffy clock.
 +
 +F8E3 ;==========================================================
 +F8E3 ; TPWRIT - Initiate tape write
 +F8E3 ;
 +F8E3 TPWRIT
 +F8E3 20 54 F8    JSR SETBST ;get buffer pointers
 +F8E6 A9 14        LDA #$14 ;checksum
 +F8E8 85 AB        STA RIPRTY ;save it
 +
 +F8EA ;
 +F8EA ; write buffer to tape
 +F8EA ;
 +F8EA TPWRIT1
 +F8EA 20 B7 F8    JSR RECDMS ;wait for Record+Play keys
 +F8ED B0 68        BCS TPSTPX1 ;$F957 exit
 +F8EF ;
 +F8EF ; TPCODE - Common tape code
 +F8EF ;
 +F8EF TPCODE
 +F8EF 78          SEI ;disable interrupts
 +F8F0 A9 A0        LDA #%10100000 ;$A0 Timer H constant
 +F8F2 A2 08        LDX #%00001000 ;$08 offset for tape IRQ vector
 +
 +F8F4 TPCODE1
 +F8F4 A0 7F        LDY #%01111111 ;$7F disable interrupts
 +F8F6 8C 2E 91    STY D2IER ;save to interrupt enable reg
 +F8F9 8D 2E 91    STA D2IER ;  on VIAs  
 +F8FC 20 60 F1    JSR SBIDLE ;wait for timeout
 +F8FF AD 14 03    LDA IRQVP ;save current IRQ Vector
 +F902 8D 9F 02    STA TAPIRQ
 +F905 AD 15 03    LDA IRQVP+1
 +F908 8D A0 02    STA TAPIRQ+1
 +F90B 20 FB FC    JSR STOIRQ1 ;$FCFB .X=8 set tape IRQ vectors
 +F90E A9 02        LDA #$02 ;read # of blocks
 +F910 85 BE        STA FSBLK
 +F912 20 DB FB    JSR NCHAR ;set bit counter for serial shifts
 +F915 AD 1C 91    LDA D1PCR
 +F918 29 FD        AND #%11111101 ;$FD CA2 manual low
 +F91A 09 0C        ORA #%00001100 ;$0C force bits 2,3 high
 +F91C 8D 1C 91    STA D1PCR ;switch on tape drive
 +F91F 85 C0        STA CAS1 ;flag as on
 +F921 A2 FF        LDX #$FF ;delay loop for high (outer)
 +F923 A0 FF        LDY #$FF ;inner loop
 +
 +F925 TPCDLP1
 +F925 88          DEY
 +F926 D0 FD        BNE TPCDLP1 ;$F925
 +
 +F928 CA          DEX
 +F929 D0 F8        BNE TPCDLP1-2 ;$F923 outside loop
 +F92B 8D 29 91    STA D2TM2H
 +F92E 58          CLI ;allow tape interrupts
 +F92F ;
 +F92F ; wait for I/O-end
 +F92F ;
 +F92F TPCDLP2
 +F92F AD A0 02    LDA TAPIRQ+1 ;check IRQ direction
 +F932 CD 15 03    CMP IRQVP+1 ;standard vector?
 +F935 18          CLC
 +F936 F0 1F        BEQ TPSTPX1 ;$F957 yes, ready
 +
 +F938 20 4B F9    JSR TPSTOP ;no, check STOP key
 +F93B AD 2D 91    LDA D2IFR ;timer 1 IF clear?
 +F93E 29 40        AND #%01000000 ;$40
 +F940 F0 ED        BEQ TPCDLP2 ;$F92F continue
 +
 +F942 AD 14 91    LDA D1TM1L ; get timer 1 loword
 +F945 20 34 F7    JSR IUDTIM ;update RTC
 +F948 4C 2F F9    JMP TPCDLP2 ;$F92F loop
 +
 +
 + TPSTOP is another helper routine. It scans for the keyboard STOP
 +key, and if detected, restores the standard IRQ vector and returns to the
 +caller's caller (the caller to TPWRIT).
 +
 +F94B ;===========================================================
 +F94B ; TPSTOP - Check for tape stop
 +F94B ;
 +F94B TPSTOP
 +F94B 20 E1 FF    JSR STOP ;scan STOP key
 +F94E 18          CLC
 +F94F D0 0B        BNE TPSTPX ;$F95C not pressed, return
 +
 +F951 20 CF FC    JSR RESIRQ ;pressed, turn drive off and restore IRQ
 +F954 38          SEC ;set error flag 
 +F955 68          PLA ;pop return address
 +F956 68          PLA
 +
 +F957 TPSTPX1
 +F957 A9 00        LDA #$00 ;flag normal IRQ vector
 +F959 8D A0 02    STA TAPIRQ+1
 +
 +F95C TPSTPX
 +F95C 60          RTS ; return to caller's caller
 +
 +
 + The SBIDLE routine is used in TPWRIT to detect if the RS-232 serial
 +bus is active and if so, will wait until it's idle before returning to
 +continue the tape code.
 +
 +F160 ;===========================================================
 +F160 ; SBIDLE - Set timer for serial bus timeout
 +F160 ;
 +F160 SBIDLE
 +F160 48          PHA ;save .A
 +F161 AD 1E 91    LDA D1IER ;get IER
 +F164 F0 0C        BEQ SBIDLEX ;$F172 no interrupts pending, exit
 +
 +F166 SBIDLLP
 +F166 AD 1E 91    LDA D1IER ;get IER
 +F169 29 60        AND #%01100000 ;$60 T1 & T2
 +F16B D0 F9        BNE SBIDLLP ;$F166
 +F16D             
 +F16D A9 10        LDA #%00010000 ;$10 kill CB1 RS232
 +F16F 8D 1E 91    STA D1IER
 +
 +F172 SBIDLEX
 +F172 68          PLA
 +F173 60          RTS
 +
 +
 + RATS3 is at the tail end of the RAMTAS routine - the subroutine that
 +precedes the tape vectors.
 +
 +FCF6 ;===========================================================
 +FCF6 ; STOIRQ - Set IRQ vector
 +FCF6 ; usually called with .x=$08 or $0e. $08 points to the first
 +FCF6 ; entry in TAPVEC while $0e points to the last entry
 +FCF6 STOIRQ
 +FCF6 20 CF FC    JSR RESIRQ ;restore std IRQs
 +FCF9 F0 97        BEQ TPEOI ;$FC92
 +
 +FCFB STOIRQ1
 +FCFB BD E9 FD    LDA RATS3,X ;$FDE9,X set vectors from table
 +FCFE 8D 14 03    STA IRQVP
 +FD01 BD EA FD    LDA RATS3+1,X ;$FDEA,X
 +FD04 8D 15 03    STA IRQVP+1
 +FD07 60          RTS
 +
 + These are the actual routines responsible for writing bits to the
 +tape. Various calling routines place these vectors into the IRQ vector that
 +then gets executed 60 times per second. In order, these routines are for
 +writing a leader block to the tape, the routine to write data to the tape,
 +the standard IRQ vector, and the routine to read bits from the tape.
 +
 +FDF1 ;===========================================================
 +FDF1 ; TAPEVC - Tape IRQ vectors
 +FDF1 ;
 +FDF1 TAPEVC
 +FDF1 ; .dw $FCA8, $FC0B, $EABF, $F98E
 +FDF1 A8FC0BFCBFEA .dw WRLDR2, TWRD7, IRQVEC, RDTPBT
 +FDF7 8EF9
 +
 +
 + The NCHAR routine resets the internal bit counters to their initial
 +state before a calling routine begins to shift the bits over a serial or
 +tape channel. It sets the bit length to 8 and resets intermediate variables
 +to 0.
 +
 +FBDB ;===========================================================
 +FBDB ; NCHAR - Set bit counter for new character (serial output)
 +FBDB ;
 +FBDB NCHAR
 +FBDB A9 08        LDA #$08
 +FBDD 85 A3        STA SBITCF
 +FBDF A9 00        LDA #$00
 +FBE1 85 A4        STA CYCLE
 +FBE3 85 A8        STA BITCI
 +FBE5 85 9B        STA TPRTY
 +FBE7 85 A9        STA RINONE
 +FBE9 60          RTS
 +
 +
 + That's all of the room we have for this part of the series. All of
 +this code and we still haven't moved the bits off the tape. So, next time
 +we'll look at the actual routines responsible for moving bits on and off of
 +the tape and we'll begin to look at the IEEE serial routines.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 +
 +     -----------------------
 +     The Art Of The Minigame
 +     -----------------------
 +
 +Introduction
 +------------
 +
 +In summer 2001, an 8-bit minigame contest was held.  Voter turnout may have
 +been low, but author turnout was high, with a total of thirty entries for
 +the C64, Spectrum, Amstrad, and Atari 8-bit.  The entries and the results
 +are available at http://demo.raww.net/minigame/, and what follows are a
 +series of articles by several minigame authors.  The articles are for
 +enjoyment, to stimulate thought, and, hopefully, to motivate people for
 +next year's contest.  (Everyone had a great time, btw.)
 +
 +Minigames -- writing tiny programs in general -- present a set of very
 +unique challenges.  Whereas many of us are used to optimizing programs for
 +cycle-efficiency, optimizing programs for byte-efficiency turns out to be
 +very different -- challenging, at times aggrivating, but very rewarding.  I
 +think you will find the challenges and solutions ahead to be very interesting.
 +
 +The game authors were pioneers, exploring pretty new territory, and I
 +salute all that entered the contest (especially for making such a nice C64
 +showing).  A special mention should be made of MagerValp ("Skinny Puppy" in
 +Swedish, in case you've been wondering), who motivated a number of lazy
 +people (e.g. myself) to get involved with the contest.
 +
 +Before diving in to the articles ahead, which contain lots of tricks to
 +save memory, I thought it might get us in the mood to go over some general
 +ideas and concerns for saving bytes on a C64.
 +
 +It should be obvious that some balance has to be found between cycle-
 +efficiency and memory-efficiency.  Routines should be reasonably fast
 +in most cases, but the major portion of that balance is -- has to be --
 +memory efficiency.
 +
 +Obviously, everything that can be put into zero page should be put there,
 +since zero page instructions are two bytes instead of three.  Equally obvious
 +is that every list-type fetch should use the .X register, since there is
 +no zero-page lda xx,y instruction.
 +
 +Moreover, memory is initialized to various values when the computer is
 +reset.  Many zero-page locations have specific values -- by careful code
 +design, these can be taken advantage of to avoid any initialization code.
 +For example, many times the code needs a zero-page pointer with the low
 +byte zero (i.e. sta (point),y).  If that variable is chosen as a location
 +which is already zero, the code doesn't have to waste four bytes on
 +lda #00 sta point.  One-time counters can also be used in this way.
 +Finally, certain locations can be manipulated to have certain values;
 +for example, the Tetrattack load address forces the end address+1 to
 +be $10f0, because $10 and $f0 are the z-coordinates of two object vertices.
 +
 +There is also a fairly substantial kernal to take advantage of -- routines
 +to clear the screen, perform memory manipulations, and so on.  Knowing what
 +is available, and what the routines do, is handy.
 +
 +Self-modifying code comes in handy when large portions of code can be
 +used for similar things with just a few changes.  For example, the line
 +routine in Tetrattack uses a single routine for lines in both the x- and
 +y-directions, by just interchanging a few INX/INY instructions.
 +
 +And finally, there are things which Mark Seelye terms "injections", which
 +seems like a good term to me.  The idea is to reuse known register values
 +whenever possible.  That is, instead of having LDA #00 STA zp in a subroutine,
 +you might put an "STX zp" in an earlier subroutine, where you know the value
 +of .X is zero.  In other words, the instruction to clear the zp location is
 +"injected" into a different subroutine to save a couple of bytes.  Every
 +program below uses this trick in one form or another.
 +
 +You'll see lots of neat tricks in the articles ahead, but the basic design
 +framework is: put stuff in zero-page, take advantage of default zero-page
 +values, take advantage of the kernal, always know the register values, and
 +reuse as much code as possible.
 +
 +With that, let's check out the programs.
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 1===================================
 +
 +Postmortem:  Codebreaker
 +
 +by White Flame (aka David Holz)
 +http://white-flame.com
 +
 +The most difficult part of making an entry for this compo was deciding what
 +game to do that could fit in 512 bytes.  The ubiquitous snake clones and
 +scroll/dodge games abound; finding something different is truly a challenge.
 +I wanted to do a Mastermind game a looong time ago, with a full-on "We're
 +getting attacked, solve to code to save the world!" type of setting with lots
 +of animation, but usually found myself in application and library/utility
 +programming instead.  But luckily this idea popped back into my head as
 +something different that would fit in the 2-page limit.
 +
 +The concept is pretty simple (as most 512-byte games are).  There's a 4-color
 +secret key, you get 10 guesses at it.  Two numbers reflect the score for the
 +current guess: a black number shows how many slots have the right color, and a
 +white number shows how many correct colors are in the guess, but in the wrong
 +slot.  Experienced players should be able to deduce the correct answer in 4-6
 +guesses.
 +
 +
 +Random number generation
 +
 +There are 3 obvious ways to get random numbers easily in the 64: SID
 +oscillator with noise waveform, BASIC prng, or the raster counter.  I decided
 +to go for the raster solution.  Obviously, you need to wait a certain amount
 +of time between reads of the raster location, or they'd be the same or linear.
 +To fix this, every time I wanted a random number, I'd read the raster and call
 +a loop that many times.  When it was done, $d012 should have a "random enough"
 +number in its lower 3 bits (to get a color from 0 to 5).
 +
 +randomDelay:
 + ldx $d012
 +randomLoop:
 + jsr delay
 + dex
 + bne randomLoop;
 +delay:
 + dey
 + bne delay
 + rts
 +
 +
 +Code space saving
 +
 +What's interesting is that it actually took less bytes of code and data to
 +have custom draw loops for the 3 textual messages than setting a pointer and
 +calling a drawText function.
 +
 + ldy #22
 +titleloop:
 + lda titletext,y
 + sta screenLoc-48,y
 + dey
 + bpl titleloop:
 +
 +These functions assume that you're not running on an ancient kernal, so color
 +memory is initialized to the cursor color (white).
 +
 +I also had all variables (except the secret key) use screen memory, color
 +memory, or register .Y instead of manipulating off-screen variables and then
 +updating the screen to match:
 +
 +- For entering a guess, color memory is INC'd and DEC'd, with .Y holding the
 +  current offset from the fixed screen location.  JSR $e8ea is used to scroll
 +  the screen, and everything is drawn to screen line 23.
 +
 +- To find out if you've already made 10 guesses, a memory location near the
 +  top of the screen was polled to see if it was still a space.  As guesses are
 +  made, the screen was scrolled, and if the title line ever scrolled into that
 +  memory location, it's game over.
 +
 +- The current guess score '0 0' is drawn to the screen first, then those
 +  screen values are INC'd or DEC'd as matches are found.
 +
 +- Winning constitutes checking to see if the screen has a '4' in the black
 +  score location.
 +
 +
 +Score calculation
 +
 +This was a bit of a pain.  At first I trying to calculate the white score by
 +the scoring rules (every matching color that was in the wrong slot), INCing
 +the white score on every hit, then looping for exact matches, INCing the black
 +score on every hit.  This code ended up being way too big, so I got the idea
 +to simplify the white loop: INC the white score on every color match, whether
 +in the right spot or not, then on every black match, DEC the white score and
 +INC the black score.  This gave the same result, but with much simpler code.
 +
 +
 +Conclusion
 +
 +All in all, it was an interesting challenge for me, and had I not gotten stuck
 +in Phoenix for an extra week, I'd have finished my 2k entry.  The code is
 +pretty straightforward, and at 415 bytes total, I'm quite happy to claim
 +having the smallest entry, even if I didn't win (why Pacman didn't beat out a
 +low-res scroll/dodge/single-shot game is a mystery to me).  Being a hobbyist
 +C64 programmer, and my career history being more on the research end of
 +things, having a deadline is a nice change in terms of getting a project
 +completed. :)
 +
 +
 +Publisher:  Gasman/Raww Arse
 +Number of full-time developers:  1 programmer
 +Number of contractors:  0
 +Estimated budget:  0 USD (0 EUR)
 +Development time:  About 5 hours
 +Release date:  August 22, 2001
 +Platforms:  Commodore 64 and compatibles
 +Development hardware:  pee sea (1.4GHz T-bird, 512MB RAM, 100GB HD, Win98SE)
 +3rd party tools:  xa, Vice, Notepad
 +In-house tools:  none
 +Project size: 290 lines of 6502 assembly code, containing 1 line of tokenized
 +BASIC
 +
 +--- cb.asm ---
 +.word $0801 ;Starting address for loader
 +
 +* = $0801
 +
 +.word nextLine  ;Line link
 +.word $0 ;Line number
 +.byte 158 ;SYS
 +.byte 50,48,54,54 ;"2066"
 +.byte 0
 +nextLine .byte 0,0 ;end of basic
 +
 +screenGameOver = $0400+(4*40)+16
 +screenLoc = $0400+(23*40)+16
 +colorLoc = $d800+(23*40)+16
 +screen = $fd
 +color = $fb
 +peg = 81
 +cursor = 87
 +crsr_up = 145
 +crsr_down = 17
 +crsr_left = 157
 +crsr_right = 29
 +
 +WaitStart:
 +jsr $ffe4
 +beq WaitStart
 +
 +Start:
 +
 +;Change to white
 +lda #$05
 +jsr $ffd2
 +
 +;Clear screen
 +lda #147
 +jsr $ffd2
 +
 +;Set bg colors
 +lda #$00
 +sta $d020
 +sta colorLoc
 +sta colorLoc+1
 +sta colorLoc+2
 +sta colorLoc+3
 +sta $ff ;for cursor
 +lda #11 ;dark gray
 +sta $d021
 +
 +;Init zp pointers
 +lda #<screenLoc
 +sta screen
 +sta color
 +lda #>screenLoc
 +sta screen+1
 +lda #>colorLoc
 +sta color+1
 +
 +;Draw text
 +ldy #22
 +titleloop:
 + lda titletext,y
 + sta screenLoc-48,y
 + dey
 + bpl titleloop:
 +
 +;Randomize values
 +
 +iny
 +newGuess
 +sty $02
 +guessAgain
 + jsr randomDelay
 + lda $d012
 + and #$07
 + cmp #$06
 + bpl guessAgain
 + ldy $02
 + checkLoop:  ;See if the color's already in the key
 + dey
 + bmi goodGuess
 +  cmp secret,y
 +  beq guessAgain
 + bne checkLoop
 +
 + goodGuess:  ;Save the color and put a peg on the screen to show progress
 + ldy $02
 + sta secret,y
 + lda #peg
 + sta (screen),y
 + iny
 + cpy #$04
 + bne newGuess
 +
 +Game_Loop:
 +
 +;Get cursor pos
 +ldy $ff
 +
 +;Draw cursor
 +lda #cursor
 +sta (screen),y
 +
 +Key_Loop:
 +
 +;Wait for key
 +getin: jsr $ffe4
 +beq getin
 +ldy $ff
 +
 +;If up, bump color up
 +cmp #crsr_up
 +bne notUp
 + lda (color),y
 + and #$07
 + adc #$00 ;carry is always set after the cmp
 + cmp #$06
 + bne colorNotHigh
 +  lda #$00
 + colorNotHigh:
 + sta (color),y
 +notUp:
 +
 +;If down, bump color down
 +cmp #crsr_down
 +bne notDown
 + lda (color),y
 + and #$07
 + bne colorNotLow
 +  lda #$06
 + colorNotLow:
 + sbc #$01 ;carry is always set after the cmp
 + sta (color),y
 +notDown:
 +
 +;If left,
 +cmp #crsr_left
 +bne notLeft
 + ; Erase cursor
 + lda #peg
 + sta (screen),y
 + ; Bump cursor left
 + dey
 + bpl cursorOKLeft
 +  ldy #$03
 + cursorOKLeft:
 + ; Draw cursor
 + lda #cursor
 + sta (screen),y
 +notLeft:
 +
 +;If right,
 +cmp #crsr_right
 +bne notRight
 + ; Erase cursor
 + lda #peg
 + sta (screen),y
 + iny
 + cpy #$04
 + bne cursorOKRight
 +  ldy #$00
 + cursorOKRight:
 + lda #cursor
 + sta (screen),y
 +notRight:
 +
 +;Store cursor pos
 +sty $ff
 +
 +;If not enter, goto Key_Loop
 +cmp #13
 +bne Key_Loop
 +
 +;------------------------
 +
 +;Erase cursor
 +lda #peg
 +sta (screen),y
 +
 +;Check score
 +
 +;Draw "0 0"
 +lda #'0
 +sta screenLoc+5
 +sta screenLoc+7
 +lda #0 ;black
 +sta colorLoc+5
 +
 +;Calculate white score (any matches between the guess & key)
 +ldy #3
 +yLoop:
 + ldx #3
 + xLoop:
 +  lda (color),y
 +  and #$07
 +  cmp secret,x
 +  bne noWhiteMatch
 +   inc screenLoc+7
 +  noWhiteMatch
 +  dex
 +  bpl xLoop
 + dey
 + bpl yLoop
 +
 +;Calculate black score (exact matches between the guess & key)
 +ldy #3
 +blackLoop:
 + lda (color),y
 + and #$07
 + cmp secret,y
 + bne noBlackMatch
 +  inc screenLoc+5
 +  dec screenLoc+7
 + noBlackMatch:
 + dey
 + bpl blackLoop
 +
 +;Check win (black score = "4")
 +lda #'4
 +cmp screenLoc+5
 +beq Win
 +
 +;Check for Game Over (screen's getting full)
 +lda screenGameOver
 +cmp #32
 +bne Game_Over
 +
 +;Scroll screen up 2 rows
 +jsr $e8ea
 +jsr $e8ea
 +
 +;Copy last hand
 +ldy #3
 +copyLoop:
 + lda colorLoc-80,y
 + sta (color),y
 + lda #peg
 + sta (screen),y
 + dey
 + bpl copyLoop:
 +
 +jmp Game_Loop
 +
 +Game_Over:
 +jsr $e8ea
 +ldy #8  ;Show "YOU SUCK!"
 +loseloop:
 + lda losetext,y
 + sta (screen),y
 + dey
 + bpl loseloop:
 +
 +jsr $e8ea
 +ldy #3  ;Reveal the secret key
 +losecopy:
 + lda secret,y
 + sta (color),y
 + lda #peg
 + sta (screen),y
 + dey
 + bpl losecopy
 +jmp WaitStart
 +
 +Win:
 +jsr $e8ea
 +ldy #15  ;Show Zero Wing reference
 +winloop:
 + lda wintext,y
 + sta screenLoc-4,y
 + dey
 + bpl winloop
 +rts
 +
 +randomDelay:
 +ldx $d012  ;Loop n times, n = current raster location
 +randomLoop:
 + jsr delay
 + dex
 + bne randomLoop;
 +
 +delay:
 +dey
 +bne delay
 +rts
 +
 +wintext
 +.byte 1,32,23,9,14,14,5,18,32,9,19,32,25,15,21,33
 +losetext
 +.byte 25,15,21,32,19,21,3,11,33
 +titletext .byte 58,58,58,32,3,54,52,32,3,15,4,5,2,18,5,1,11,5,18,32,58,58,58
 +secret = *
 +--- end cb.asm ---
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 2===================================
 +
 +Tinyplay by SLJ
 +--------
 +
 +Like many people, I was a serious Ultima guy growing up.  Not only did
 +I love the games (and still do), but I absolutely loved the music (and still
 +do).  In fact, I think the music from III and IV is some of the finest,
 +most musical composing ever done on the C64, and the best example of how
 +appropriate, atmospheric music can add volumes (so to speak) to a game.
 +
 +It also turns out that I had been thinking about some new ideas for a new
 +music system for a while.  So when Robin said he was writing an Ultima-like
 +game -- well!  I'm certainly no Kenneth W. Arnold, but a set of tiny tunes
 +sure sounded like a neat and fun thing to do, especially in the Ultima style!
 +Robin removed a lot of neat features and text to add in the music, so I feel
 +pretty honored to have been included.
 +
 +The player and tunes were written on pretty short notice; I think we started
 +a week or so before the deadline.  Moreover, I had to go out of town the
 +weekend of the deadline, getting back that Sunday, where many of the
 +optimizations took place in frantic coding sessions!  So it shouldn't be
 +surprising that the code is not as optimal as it could be -- if you get
 +around to looking at the source code, you'll see an awful lot of weird,
 +duplicate, and otherwise embarrasing lines of code which anyone with a clear
 +head would never have used.  But as is, the player and three tunes take up
 +some 428 bytes (I think), which isn't too bad.  (Later on I'll describe
 +a music compression routine which would have worked great, but that I didn't
 +have time to implement.)
 +
 +Broadly speaking, the player is a note-duration based player, using two
 +sawtooth voices for that Ultima sound (originally, before reality kicked in,
 +the thinking was to put some sound fx in the third voice).  It uses a single
 +play routine, using the .X register to tell which voice to play, and can
 +play multiple tunes.  The original version has several effects included like
 +major and minor arpeggios, but memory constraints forced these to be taken
 +out (along with the original tune, alas, alas).
 +
 +The full source code for the player is at the end of this article, and can be
 +divided into four primary parts: a one-time init routine, a routine to select
 +a tune, a routine to play notes, and the music data.  But perhaps the best
 +way to understand the player is to begin by looking at how the data for a
 +tune is stored.
 +
 +
 +Tune encoding
 +-------------
 +
 +Here is the third, "castle" tune, in its entirety:
 +
 +* Tune 3: Castle
 +
 +t2v1     
 +         dfb 97           ;gate
 +         dfb 24+$80-1
 +         dfb 70,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 12+$80-1
 +         dfb 74,74,96,75
 +         dfb 24+$80-1
 +         dfb 77,75,74,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 24+$80-1
 +         dfb 70,96,96,96
 +         dfb 00
 +t2v2     
 +         dfb 97
 +         dfb 24+$80-1
 +         dfb 34,41,46,41
 +         dfb 34,41,46,41
 +         dfb 58,62,53,57
 +         dfb 58,53,46,96
 +         dfb 0
 +
 +end      
 +
 +
 +
 +The first chunk of data (t2v1) is the voice 1 data, and the second chunk the
 +voice 2 data (t2v2).
 +
 +There are 12 notes in an octave, and eight octaves total, for 96 possible
 +notes.  In the player, data bytes 0-95 represent these notes -- the player
 +just uses the data as an index into a frequency table.  This leaves bytes 96
 +and above free for other uses; also, since the lower note values are rarely
 +used, values like 0, 1, etc. are also free in principle.
 +
 +Byte values >= 128 are used to specify the note duration.  When the player
 +encounters these byte values it simply strips the high bit and stores the
 +result.  When a new note is read in, this stored duration is placed into a
 +counter which is decremented on each player call.
 +
 +In the castle tune above, the duration for voice 2 is only set once, at the
 +very beginning -- every note has the same duration.  In the voice 1 data
 +it is set several times, to change the duration when appropriate.
 +
 +For some reason the code decrements the duration counter until it becomes
 +negative, instead of decrementing it until it becomes zero.  I'm sure it was
 +because of something relevant in an earlier version of the player, I just
 +totally don't remember what.  But that is why the durations above are encoded
 +as 24+$80-1 -- the -1 is to make it go negative, instead of to zero.  Looks
 +weird.
 +
 +Byte values >95 and <128 are available for 'special' features.  The player
 +supports one "effect": the gate toggle.  The player can either leave the gate
 +bit on all the time or can turn it off right as a note is about to finish
 +(say, when the duration counter decrements down to 2).  A byte value of 97
 +turns the gate toggle on -- in the castle tune above it is the first byte
 +of data.  In other tunes, a value of 100 is used to turn off gate toggling.
 +(Historically speaking, values 98 and 99 were used for major and minor
 +arpeggios, and 100 was for no fx at all.)
 +
 +The byte value 96 is used as a "rest" (or hold) -- the player resets the
 +duration counter but does not reset the gate bit or the note value.  When
 +there is no gate toggling this provides a way of holding a note for longer
 +than the basic duration.  When there is gate toggling, it starts the release
 +cycle, and lets notes fade away.  In the castle tune above, the first two
 +notes in voice 1 are "70 96" -- a note, then a rest, during which the note
 +fades away.
 +
 +Finally, the value 0 is used to indicate the end of data.  While 0 is
 +technically a note it never gets used as such.  When the player hits a zero
 +it simply resets the data pointer.
 +
 +So with all that in mind, let's take a look at the player.
 +
 +
 +The Player
 +----------
 +
 +Before playing a tune, a one-time routine is called which sets up the
 +frequency tables; when the player reads a new note, it looks up the SID
 +frequency settings in this table, along the lines of:
 +
 + ldy note,x
 + lda freqlo,y
 + sta $d400
 + lda freqhi,y
 + sta $d401
 +
 +Frequencies in different octaves are related by powers of 2.  When you go
 +up an octave, the frequency doubles -- for example, the frequency of C-5
 +is twice the frequency of C-4.  The routine starts with a table of twelve
 +frequency values for the highest octaves, and copies those values into
 +the end of the frequency table (e.g. freqlo+95, freqlo+94, ..., freqlo+84).
 +After copying, it divides each value by 2 and repeats the process -- now
 +the frequencies correspond to the next lower octave -- and this is done until
 +the frequency table is full.  Piece of cake.
 +
 +Once the frequency table is set up, the main play routine is called with
 +the tune number in .A.  If the tune number is different than the current
 +tune, the player simply resets the music data pointers and note durations,
 +and falls through to the main player routine.
 +
 +And the main player routine simply decrements the duration, and when it
 +becomes negative reads in the next data.
 +
 +Because the process -- decrement duration, read next data, store frequencies
 +in SID -- is the same for each voice, a single routine is used for both voices
 +one and two, using .X as the voice index.  That is, instead of
 +
 + dec duration
 +
 +the code can use
 +
 + dec duration,x
 +
 +and instead of
 +
 + sta $d400
 +
 +the code can use something like
 +
 + ldy SidOffset,x
 + sta $d400,x
 +
 +where .X is 0 or 1.
 +
 +And that's about all there is to it -- the code is pretty straightforward.
 +
 +
 +Compressing Music
 +-----------------
 +
 +Most music has the property that it doesn't jump around all over the place,
 +but rather notes progress in relatively small intervals (seconds, thirds,
 +fifths, etc.).  For example, a major chord (as used in a typical arpeggio) is
 +
 + note  note+4  note+7  note
 +
 +So the idea is to use a differential scheme, encoding the note _intervals_
 +instead of the note _values_.  That is, the above could instead be encoded as
 +
 + x  4  3  -7
 +
 +where each value is added to the current note value.  Thus, if intervals are
 +restricted to -8..7 then only four bits are required, cutting the data size
 +in half.  Well, not quite of course, because an escape code is needed to
 +specify notes outside of the -8..7 range, not to mention durations and fx
 +settings.  Luckily, not all interval values are used, so these can be used
 +for escape codes; code 96 -- the rest or "hold" note -- can also use an
 +otherwise unused value.
 +
 +So the decompression code looks something like
 +
 + read next four bits
 + if escape code then read next byte
 + otherwise add to current note value
 +
 +Pretty simple, and leads to quite substantial savings.  Ah, but for a little
 +more time...
 +
 +Another possibility arises if not all note values are used.  I set up the
 +frequency tables to contain all 12 notes in eight octaves.  If not all twelve
 +notes are actually used -- if none of the tunes contain a D# or whatever --
 +then a few bytes can be shaved off of the frequency table.  But this means
 +a lot of renumbering and such... it all depends on just how much you want
 +those extra bytes!
 +
 +And that's about all there is to it.
 +
 +
 +Source code
 +-----------
 +
 +*      
 +* Tiny music player, for the
 +* minigame contest.
 +*
 +* slj 9/23/01
 +* sjudd@ffd2.com
 +*
 +
 +                          ;         org $2800-491
 +         org $2800-425
 +
 +ARPDEL   = 2
 +SR       = $c9
 +GATEDEL  = 2
 +
 +curtune  = $fe
 +curdur   = $fc
 +
 +tunepos  = $b0            ;position in note sequence
 +seqpos   = $b2            ;position in seq list
 +dur      = $b4            ;2 bytes each
 +notefx   = $b6
 +curnote  = $b8
 +newnote  = $ba
 +
 +arpdur   = $bc
 +arpoff   = $bd
 +
 +noteptr  = $fa
 +temp     = $fa
 +freqtab  = $c000
 +
 +
 +
 +
 +*
 +* Code begins here
 +*
 +start    
 +
 +*-------------------------------
 +         do 0
 +         jmp InitFreq
 +         jmp Play
 +         jmp Reset
 +         fin
 +*-------------------------------
 +
 +*
 +* InitFreq -- set up note table
 +*
 +InitFreq 
 +         ldy #192
 +:l2      ldx #24
 +:loop    lda freq-1,x
 +         sta freqtab-1,y
 +         dex
 +         dey
 +         lda freq-1,x
 +         sta freqtab-1,y
 +         lsr freq,x
 +         ror freq-1,x
 +         dey
 +         beq :xit
 +         dex
 +         bne :loop
 +         beq :l2
 +:xit     rts
 +
 +*
 +* Play routine
 +* tune in .A
 +*
 +
 +Reset      ; Force a tune reset
 +         sec
 +         ror curtune
 +                          ; ldx #255
 +                          ; stx curtune
 +Play     
 +         ldx #1
 +         cmp curtune
 +         beq PlayTune
 +         sta curtune
 +
 +         ldy #00
 +         sty tunepos
 +                          ;         sty tunepos+1
 +         sty dur
 +         sty dur+1
 +
 +         lda #15
 +         sta $d418
 +         lda #SR
 +         sta $d406
 +         sta $d406+7
 +
 +
 +*
 +* Main routine
 +*
 +* voice in .X (0, 1)
 +*
 +
 +SetTune  
 +         ldy curtune
 +         txa
 +         beq :v1
 +:v2      lda v2tunepos,y
 +         bne :sta
 +:v1      lda v1tunepos,y
 +:sta     sta tunepos,x
 +
 +
 +PlayTune 
 +
 +Get      
 +         lda #00
 +         sta newnote,x
 +         dec dur,x
 +         bpl DoFx
 +
 +:next    ldy tunepos,x
 +         inc tunepos,x
 +         lda tunes,y
 +         beq SetTune
 +         bpl :c1
 +         and #$7f         ;$80+dur
 +         sta curdur,x
 +         bne :next
 +:c1      cmp #96
 +         bcc :setfreq
 +         beq :hold
 +         sta notefx,x
 +         bne :next
 +:note                     ;sta curnote,x
 +                          ;         sta newnote,x
 +                          ;         jsr setfreq
 +:SetFreq 
 +         asl
 +         tay
 +         lda Freqtab,y
 +         pha
 +         lda Freqtab+1,y
 +         ldy SidOffset,x
 +         sta $d401,y
 +         pla
 +         sta $d400,y
 +:gate    
 +         lda #$21
 +         ldy SidOffset,x
 +         sta $d404,y
 +
 +
 +:hold    lda curdur,x
 +         sta dur,x
 +
 +
 +
 +DoFx     
 +         ldy notefx,x
 +         cpy #97
 +         bne AllDone
 +
 +         ldy dur,x
 +         cpy #GATEDEL
 +         bcs :c1
 +         lda #$20
 +         ldy SidOffset,x
 +         sta $d404,y
 +:c1      
 +
 +AllDone  
 +         dex
 +         bpl PlayTune
 +RTS      rts
 +
 +
 +*
 +* Frequencies
 +*
 +freq     
 +         da 34334
 +         da 36377
 +         da 38539
 +         da 40831
 +         da 43258
 +         da 45831
 +         da 48557
 +         da 51444
 +         da 54502
 +         da 57743
 +         da 61177
 +         da 64815
 +
 +                          ;arptab dfb 0,4,7
 +
 +*
 +* Tune positions (offset into sequence
 +* list)
 +*
 +v1tunepos
 +         dfb t0v1-tunes
 +         dfb t1v1-tunes
 +         dfb t2v1-tunes
 +v2tunepos
 +         dfb t0v2-tunes
 +         dfb t1v2-tunes
 +         dfb t2v2-tunes
 +
 +
 +tunes    
 +            ; dfb 00           ;dummy byte
 +*
 +* Sid offset table
 +*
 +SidOffset
 +         dfb 00
 +         dfb 07
 +
 +
 +t0v1     
 +         dfb 97
 +         dfb 10+$80-1     ;dur=10
 +         dfb 42,47,51
 +         dfb 42,47,51
 +         dfb 42,47,47,47
 +         dfb 51,51,51
 +         dfb 54,52,51
 +         dfb 52,51,49
 +         dfb 51,51,51
 +         dfb 49,49,49
 +         dfb 51,49,47
 +         dfb 49,47,46
 +         dfb 44,44,44
 +         dfb 46,46,46
 +         dfb 47,51,47
 +         dfb 46,42,46
 +         dfb 47,96,96
 +         dfb 96,96
 +         dfb 00
 +
 +t0v2                      ;v2
 +         dfb 97
 +         dfb 10+$80-1
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 35,39,96
 +         dfb 39,42,96
 +         dfb 42,30,96
 +         dfb 30,35,96
 +         dfb 35,34,96
 +         dfb 34,32,96
 +         dfb 32,32,96
 +         dfb 32,32,96
 +         dfb 32,34,96
 +         dfb 34,35,96
 +         dfb 35,30,96
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 00
 +
 +* Tune 2: Combat
 +
 +t1v1     
 +         dfb 100          ;No gate
 +         dfb 11+$80-1     ;dur=11
 +         dfb 51,54,58,59,96
 +         dfb 58,57,54,51,54,58,63,96
 +         dfb 62,61,60,59,56,53
 +         dfb 58,54,51,56,53
 +         dfb 46,50,53,56
 +         dfb 59,58,57,54
 +         dfb 0
 +
 +t1v2                      ;v2
 +         dfb 100
 +         dfb 11+$80-1
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 47,44,41,51,46,42,47,44
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 0
 +
 +* Tune 3: Castle
 +
 +t2v1     
 +         dfb 97           ;gate
 +         dfb 24+$80-1
 +         dfb 70,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 12+$80-1
 +         dfb 74,74,96,75
 +         dfb 24+$80-1
 +         dfb 77,75,74,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 24+$80-1
 +         dfb 70,96,96,96
 +         dfb 00
 +t2v2     
 +         dfb 97
 +         dfb 24+$80-1
 +         dfb 34,41,46,41
 +         dfb 34,41,46,41
 +         dfb 58,62,53,57
 +         dfb 58,53,46,96
 +         dfb 0
 +
 +end      
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 3===================================
 +
 +MagerTris -- by Per Olofsson <magervalp@cling.gu.se>
 +
 +Way back in 1990 or thereabouts I wrote a Tetris clone in basic on the
 +Amiga. When the compo was announced I scratched my head for a bit and
 +figured that I should have a good chance of fitting Tetris in 512
 +bytes. If I couldn't I would at least have a good engine for the 2K
 +compo.
 +
 +The Game
 +
 +This puzzle game was invented in the 80s by the Russian programmer
 +Alexey Pazhitnov. The game has a vertical field where one of seven
 +puzzle pieces appears at the top and falls towards the bottom. The
 +player can rotate the piece 90 degrees or move it left or right. When
 +it reaches the bottom the game field is checked for filled lines that
 +are removed. The player is awarded points for removing lines (with a
 +bonus for clearing more than one in one go) and the game is over when
 +the screen fills up so that new pieces can't enter the game field.
 +
 +Drawing the Screen
 +
 +The screen is filled with inverted space and the game pieces and the
 +border is drawn into the color ram. There aren't any real tricks used
 +here, the only one is that the color of the border is a side effect
 +of the last character printed, ascii 13 (light green).
 +
 +Random Numbers
 +
 +We need a decent source of random numbers to generate a random Tetris
 +piece. The SID chip's noise waveform is probably the best one
 +available on the C64, and fortunately the code to access it is really
 +short. To initialize I used:
 +
 +initrnd subroutine
 + lda #$81 ; enable noise
 + sta $d412 ; voice 3 control register
 + sta $d40f ; $81xx as the frequency
 +
 +and to get an 8-bit random number all you have to do is lda $d41b.
 +
 +The Tetris Pieces
 +
 +Tetris has 7 puzzle pieces that you have to store in a table. As every
 +piece consists of four boxes we need a total of 7*4 = 28 entries. The
 +smallest tetris table is probably the one where each piece is
 +represented by 8 bits, like this:
 +
 + 0010 0000 0011 0110 0100 0010 0110
 + 0111 1111 0110 0011 0111 1110 0110
 +
 +The table is very compact, a mere 7 bytes, but then you need code to
 +rotate every piece and accessing the table is somewhat clunky on the
 +6502. I've seen an x86 version that did this though, with a total size
 +of less then 256 bytes. However, on the 6502 I needed four bytes for
 +each piece for the code to be reasonably compact and I also kept all
 +the rotated pieces in the table, for a total of 112 bytes.
 +Fortunately, as every piece has a center box we only need to store 3
 +which trims the table to 84 bytes. But what do we actually store? As
 +the screen is painted to color ram I used a pointer to the current
 +position. In the table I simply stored the offset in color ram from
 +the center box. With indirect indexed addressing the routine is nice
 +and short.
 +
 +Failure
 +
 +The rest of the program is fairly straight forward. User input is just
 +a cmp loop, there are dec zp timers for most events, and a raster wait
 +to time everything. This is also where I failed to fit everything
 +inside the 512 byte limit, as everything put together took about 600
 +bytes, even after some serious optimizations. There are two very
 +similar routines: the one that draws a piece on the screen and the one
 +that checks for collisions. Both iterate through the four boxes in a
 +piece, but because of the way I used some zeropage pointers I couldn't
 +merge the two routines into one without rewriting most of the code.
 +
 +Success
 +
 +As rewriting everything meant too much work, I just decided to go for
 +plan B: write a 2K entry. With a 600 byte engine there was plenty of
 +space to add features, and at around 700 bytes or so compressíon
 +starts to make sense -- pucrunch breaks even around there somewhere,
 +and the final binary is actually 6500 bytes uncompressed. I could
 +easily fit a title screen complete with custom graphics, a nice tile
 +set for the tetris pieces, and even a hiscore list that saves to
 +disk. The only trick I used here was that the custom character set was
 +EOR:d with the ROM font, and as most characters are the same or very
 +similar it compresses much better that way.
 +
 +And that's the story of the 2K MagerTris.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 4===================================
 +
 +Tuff -- Compressing tiny programs
 +-----------------------------------> Stephen L. Judd <sjudd@ffd2.com>
 +
 +Most compression schemes address the problem of taking a large program or
 +data file and compressing it down.  But have you given much thought to the
 +problem of trying to compress a _small_ program, even a 512-byte program?
 +
 +I spent some time trying to come up with a compression scheme for tiny
 +programs -- saving even 2 or 3 bytes in tetrattack would have been helpful.
 +The effort was basically a failure, as the best I thought I could do was to
 +break even (on paper, for tetrattack).  I never actually implemented the code
 +either -- just worked it out on paper.
 +
 +It is fairly interesting though, and maybe these preliminary efforts will
 +give other people ideas (especially for compressing 2k programs).  And after
 +writing this article, I now think it might have really worked, and saved
 +a handful of bytes -- maybe next year?
 +
 +Tiny Compression Basics
 +-----------------------
 +
 +If you remember your C=Hacking #16, there are two basic approaches to
 +compression: taking a fixed-length input while giving a variable-length
 +output, or taking a variable-length input and producing a fixed-length output.
 +An example of the latter is LZ compression -- looking for repeated
 +strings of length 2, 3, etc. and replacing those with a single byte (or two
 +bytes, or whatever).  An example of the former is a Huffman tree, which
 +replaces each byte with a string of bits, using fewer bits for the most
 +common bytes.
 +
 +To decompress the compressed data, of course, a decompression routine is
 +needed.  For tiny compression, this routine obviously needs to be as tiny
 +as possible.  Most C64 decompression programs copy themselves somewhere
 +before decompressing the data stream to wherever it is supposed to go.
 +Obviously, if we put a few restrictions on the data to be decompressed
 +the decompression routine can be made smaller, and more specific to the
 +task.
 +
 +Finally, it is worth mentioning that a tiny program in general does not
 +use all possible byte values.
 +
 +Now, first consider Huffman encoding.  In a typical Huffman decoding scheme,
 +the decoder reads a bit at a time, traversing the Huffman tree with each
 +bit, until a byte is hit.  This implies a fairly complicated decoder, and
 +worse the tree must be stored.  Finally, the tree can be fairly large, if
 +the number of possible symbols is large.  So Huffman doesn't seem like a
 +good approach.
 +
 +So, consider LZ-style encoding: replacing n-byte sequences with a shorter
 +code.  In LZ, the n-byte code is replaced with a reference to the previously
 +decompressed bytes -- a code which says how far back to go, and how many
 +bytes to copy.  What a drag -- an escape code, the distance backwards, and
 +the number of bytes to copy.  For a byte-aligned decompressor, this implies
 +at least two bytes (since the decompression routine must be small, a byte-
 +aligned decompressor is of greater interest).
 +
 +So far so bad.  But, since a tiny program doesn't use all possible byte
 +values, perhaps a simple substitution method is possible -- that is, replacing
 +various 2-, 3-, 4- etc. byte sequences with one of those bytes.  Alas, the
 +numbers don't work out very well: in the above scheme, let's say an
 +n-byte sequence appears m times, for a total of n*m bytes.  We replace
 +each of those sequences with a single byte, giving a savings of
 +
 + n*m - m
 +
 +bytes.  But we also need to store the original sequence in a table somewhere,
 +plus the byte corresponding to that sequence.  This means another n+1 bytes,
 +so the total savings is
 +
 + n*m - (m+n+1)
 +
 +So, for example, if we have a 2-byte sequence repeated 3 times, the savings
 +is
 +
 + 2*3 - (2+3+1) = 0 bytes!
 +
 +The issue is that replacing a 2-byte sequence with a 1-byte sequence saves
 +one byte, and this must happen at least three times to overcome the three
 +bytes of table storage.  With 4 repetitions, one byte is saved.  And so
 +on.  The net result is that you have to have a _lot_ of sequences repeated
 +a lot of times to get any savings, and somehow this savings needs to be
 +greater than the decompression code.
 +
 +Note that, in general, if there are only a few sequences to replace then
 +some custom code can be shorter, but as the number of sequences increases
 +then generic code, with sequences stored in tables, quickly becomes
 +shorter.
 +
 +The final dagger in the heart of these schemes is the program itself.  I
 +wrote a simple program to find all the repeated strings in tetrattack.
 +There were numerous 2-byte sequences, but rarely repeated more than three
 +times.  There were a few 3-byte sequences, and even one 4-byte sequence.
 +But overall, there just aren't enough sequences, especially long sequences,
 +to make any sequence-shortening scheme worthwhile.
 +
 +There are, however, lots of repeated bytes.  For example, there are an
 +awful lot of STA instructions and zp variables, and usually a lot of
 +zeros.  This suggests taking another look at a Huffman-style scheme to
 +replace those bytes with a smaller number of bits.
 +
 +One major problem identified earlier is storing the Huffman tree -- it's
 +just too big.  So, the first thing to consider is to just compress the
 +most common bytes -- maybe the top five or eight bytes or whatever --
 +to make the tree smaller.  But that still leaves a substantial decompression
 +code to traverse the tree, and a tree with several empty nodes in it.
 +
 +Once again, if there are only a few nodes then custom code might be shorter,
 +but beyond a few nodes it's more practical to store the tree and have some
 +generic traversal code.
 +
 +But consider the following alternative to a Huffman tree: what if we had a
 +tree which looked like
 +
 +       0/\1
 +       /  \
 +    lit 0/\1
 + /  \
 +       b1  0/\1
 +   /  \
 + b2    ...
 +
 +where "lit" is a literal byte and b1, b2, etc. are encoded bytes.  This would
 +encode the most common byte with two bits (10), the next-most common byte
 +with three bits (110), and so on, and encode literal bytes with nine bits
 +(0byte).
 +
 +To see if this is a viable scheme, check out the numbers.  Let's say that
 +we encoded just the top three most common bytes in a program; in an early
 +version of tetratack those bytes were
 +
 + 00 $2F occurances
 + 03 $1A occurances (ZP variable)
 + 8D $18 occurances (STA opcode)
 +
 +In a real implementation, I'd choose zp variables to be common opcodes
 +like $8D, which would double the number of occurances of that byte, but
 +let's just consider the above numbers in a 512-byte program.  Those top
 +three bytes total $5A bytes, almost 1/8 of the total bytes (the top seven
 +bytes take a little less than 1/3 of the total, by contrast).  The number
 +of bits required is
 +
 + 9*($200-$5A) + 2*$2F + 3*$1A + 4*$18 = $FE2 bits = $1FD bytes.
 +
 +so, a whopping 3 byte savings -- just enough to store the three "tree"
 +values.  For the top seven bytes, a similar calculation suggests $01D8 bytes
 +of compressed data, again for code not optimized for compression.  For a
 +more optimized code, with zp variables the same as common opcodes and such,
 +the total bytes could conceivably be on the order of $1C0 bytes -- maybe
 +64 bytes of savings (but again, I never actually tried coding a more
 +optimized version so I don't know for sure).
 +
 +Of course, we haven't talked about decompression yet.  One important
 +feature of this tree is that the "treeness" of it is unimportant -- it's
 +really just a substitution code.  Remember that the data is encoded as
 +
 + 0+8 bits literal byte
 + 10 byte 1
 + 110 byte 2
 + 1110 byte 3
 +
 +and so on.  A decompression algorithm simply reads bits until a 0 is found,
 +and then looks up the relevant value in a table.  The exception is if
 +the first bit read is 0, in which case the code needs to read the next
 +eight bits and output the byte.  There is no tree traversal, in the sense
 +of looking up nodes and moving left/right, to speak of.
 +
 +Reading bits is trickier than reading bytes.  After every eight bits read
 +some sort of memory pointer has to be incremented to the next eight bits.
 +Moreover, when reading codes like 110 or 1110 the number of 1's read has
 +to be counted too, to be used as an index into the lookup table.  And if
 +there's a literal byte then the next eight bits need to be read, regardless
 +of their value.
 +
 +When reading the bitstream, after every eight bits the pointer into that
 +bitstream must be incremented.  To count eight "global" bits at a time, I use
 +a zp variable "count" and just rotate a bit through the different bit
 +positions.  Every time it turns zero, the data stream pointer is incremented.
 +The advantage of this method is that it frees up a register (and winds up
 +saving a byte overall).
 +
 +To count the number of "local" bits read, I use the .Y register.  Initially
 +it is set to zero, and increments as 1s are read.  For a literal byte,
 +I set .Y to -8, and put a little check in the bit reader for negative .Y;
 +that way, it simply increments to .Y=0.
 +
 +When bit=0 is finally found, the code simply does an lda data,y to get the
 +coded data.  Since .Y counts the number of bits read, .Y will be at least 1
 +for any coded data, but .Y=0 for a literal byte as described above.
 +Therefore, for a literal byte, the code can simply rotate bits into the
 +location "data", and use the same lda data,y instruction to fetch the
 +literal byte.
 +
 +With all that in mind, here's the code I finally came up with (on paper!):
 +
 +Tuff+1
 +[data lookup table]
 +
 +Source  [compressed data bitstream]
 +
 +:lit ldy #-8
 +:getbit lsr count ;Increment pointer every eight bits
 + bne :skip
 + ror count
 + inx
 + bne :skip
 + inc :src+2
 +:skip
 +:src asl Source,x ;Get next bit
 + bcs :next
 + tya ;.y=0 means first bit is 0
 + beq :lit ;so read next 8 bits
 + bpl :found ;If .y<0 then literal
 +:next rol Tuff ;Save bit
 + iny
 + bne :getbit ;until .y=0
 +:found
 + lda Tuff,y ;Fetch code/literal byte
 + ldy #00
 + sta (zp),y ;Output data
 + inc zp
 + bne :getbit
 + inc zp+1
 + bpl :getbit
 +[Uncompressed code -- (zp) points here!]
 +
 +for a total of 45 bytes, plus 7 bytes for the "Tuff" byte table -- total
 +of 52 bytes, compared with a possible 64 byte savings (maybe I should have
 +tried to implement this after all!).  Hopefully someone can improve on
 +this further.  (And "Tuff", btw, is a contraction of "TinyHuff").
 +
 +Now, there may be a few raised eyebrows here which I hope I'll lower.  This
 +code makes several assumptions.  It can either be entered in at ":src",
 +or at ":getbit" depending on the value of "count", the global bit counter.
 +I assume count is already initialized (e.g. default kernal/restart value),
 +either to 1 (in which case :getbit is the entry point) or $80, in which case
 +:src is the entry point, with the "Source" stream modified accordingly in
 +each case.
 +
 +I further assume that .axy have their default values of 0 when the routine
 +is called from BASIC.
 +
 +I also assume "zp" is initialized to the correct value already.  One easy
 +way to do this is to use $ae/$af, the end load address+1, but other
 +options are certainly available.
 +
 +Finally, you may have noticed that the thing just keeps outputting data
 +until zp=$8000 -- correct.  Unless you really really care about garbage in
 +memory above the program, there's no reason to waste bytes on an "end of data"
 +check.
 +
 +Another variant
 +---------------
 +
 +I spent a little time looking at a variant of this scheme: using a fixed
 +length bit sequence, for example, assigning four bits to each substitution
 +byte, i.e.
 +
 + 0+ literal byte
 + 1000 byte 1
 + 1001 byte 2
 + 1010 byte 3
 + ...
 + 1111 byte 8
 +
 +The numbers work out pretty well, compression-wise, but at the time I felt
 +the decompression code was more complicated and that it wouldn't break
 +even.  Looking at it now, though, I'm starting to think otherwise -- that
 +loading three bits at a time isn't any more complicated that fetching
 +eight bits so code can be reused, and there isn't any issue about checking
 +for a 1 bit, so the decompression code ought to be even simpler.  So this
 +may actually be the better scheme.
 +
 +Sigh...  I don't know about you, but every time I revisit a problem I wind
 +up feeling much stupider.
 +
 +Final thoughts
 +--------------
 +
 +So, that's it: assign shorter bit codes to the most common bytes, and use
 +a highly optimized decompression routine.  This method is very specialized,
 +obviously, but might, just might, make it possible to save a few bytes on
 +a tiny program that has been optimized for compression -- using zp locations
 +the same value as common opcodes, reusing vars as often as possible, and
 +so forth.
 +
 +The reason this works is that, in general, repeated long strings in a small
 +program just don't happen, whereas repeated bytes are common and make up
 +a substantial fraction of the total.
 +
 +Pelle (magervalp@cling.gu.se) has suggested the possibility using the first
 +eight bytes of the compressed program for the 'Tuffman' table -- to structure
 +the code to make these common bytes, and thereby save eight bytes by not
 +having to store the 'tree' Might save a few more bytes with some programs.
 +
 +Finally, while writing this article, I started to wonder about byte sequences
 +with "holes", for example, <byte skip byte skip> for things like STA zp1
 +STA zp2.  I wonder if there's some way of taking advantage of these repeated
 +"mask" patterns.
 +
 +But I leave that thought for another person, for another day.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 5===================================
 +
 +Tinyrinth
 +by Mark Seelye (aka Burning Horizon)
 +mseelye@yahoo.com
 +http://www,burninghorizon.com
 +
 +
 +Introduction
 +    When the mini-game competition was first announced I was originally not
 +interested because of time considerations -- I spend most of my time at work,
 +or watching the kids.  But once I saw that nearly everyone on EFNet #c-64 was
 +doing one, I thought it would be fun if I could just keep it simple.
 +    
 +    I had no idea what game to do.  I had a few other ideas but I settled on
 +a maze game because I thought it'd be an interesting challenge doing a maze
 +algorithm and have game code in under 512 bytes.  As it turned out, I ended up
 +getting the code down to like 475 bytes, but the version I turned in was 509
 +bytes.  (I ended up shaving an additional 34 bytes just before the due date.)
 +
 +    The other reason for the maze game is I have this weird thing I do where
 +in each language I know I code this little maze generation algorithm that I
 +made a long time ago.  I had used the algorithm in ASP, VB, C, and a few
 +others but for some reason I had never done it in 6502!  Imagine that!  It was
 +going to be a bit of a challenge because I had never done it in so few
 +instructions, and never in ASM.  It would have to lose some weight and fluff -
 +but I knew I could do it.  So obviously there are MANY better ways to generate
 +a maze, but in light of trying to make it small this routine ended up being
 +such a strange mutation of my idea that I'm surprised it worked at all.
 +
 +Setup
 +    Before I do anything I must setup some things.  This is also used when a
 +player dies OR enters a new level.  The only difference between a reset of the
 +game and moving to the next level is the ldx #$00: sta $53.  The $53 is also
 +used to set the color of the maze so it rotates from a level one of white, to
 +a level 16 of BLACK (which is evil because then you can't see the maze except
 +for the small hint of the corners on a turn.)
 +    
 +; Initialization
 +setup = *
 +gameover = *
 +        ;Setup game variables
 +        ldx #$00
 +        stx $53         ; Start at level 1 (to be inc'd)
 +init = *
 +        ;Setup level
 +        inc $53         ; next level
 +        ldx $53
 +        stx $54         ;Counter for drawing Keys (next level)
 +        stx $0286       ;Character Color to use on clear (e544)
 +
 +        ;Set Render Cursor Start Pos / Player Color
 +        lda #$05
 +        sta EBCM1       ;Set ebcm color PLAYER to GREEN ($d022)
 +        sta $45         ; Cursor/Player Position X (0-39)
 +        sta $46         ; Cursor/Player Position Y (0-24)
 +        ;Clear Screen
 +        jsr $E544       ;clear screen set to char color ($0286)
 +
 +        lda #$5b
 +        sta $d011      ;turn on EBCM
 +        lda #$18
 +        sta $d018      ;Activate Cset
 +
 +As you can see that I mention extended color background mode, I'll talk about
 +that later.  You also might notice I use a character set, but that I will talk
 +about a little later as well, first I want to jump right into the maze
 +generation.
 +    
 +
 +Maze Algorithm
 +    The easiest way to describe how the algorithm works is to compare it to a
 +worm.  The worm eats through an area until it gets blocked by one of its
 +former meals.  While eating the worm is counting how many meals he has, once
 +he gets to a certain number he knows he can eat no more.
 +    With that said, I'm sure you're completely confused and probably think
 +(know?) I'm crazy.  So now for the technical explanation.  Starting with a
 +2-dimensional grid of some size, initialized with 0's, set a start point for
 +the formation of the maze within the grid somewhere.  After that has been
 +taken care of the rest follows simple logic:
 +
 +        Have we reached every part of the grid?  
 +            if so: done; 
 +            if not: Can we grow the maze into an adjacent area from the
 +     current position?  
 +                if so: Grow in one of those directions at random, count it,
 +                       continue; 
 +                if not: Find a place in the maze that can grow, continue
 +
 +    I had to change the order of the logic around to get it to better fit into
 +a smaller number of bytes.  So I moved the "find a place to grow from"
 +routine, and the "Have we reached every part of the grid" check after the
 +check if we can grow in any direction from the current position.
 +    
 +    To check in all directions from the current position I used a subroutine.
 +The subroutine has 5 entry points which translate to: cangrow up/right/down/
 +left, and the 5th entry point is actually the first entry point - it checks
 +to see what is in .a and checks the direction that value represents.  To check
 +the direction I use a kernal routine to set some zp so I can easily check the
 +value on the screen to the left, right, below, or above the current position.
 +This is also where the boundaries are set: 40 columns wide and 25 characters
 +high.
 +    
 +popgridloop = *
 +        ldy $45         ;xpos
 +        ldx $46         ;ypos
 +cangrowxy = *           ;Can grow in any direction?
 +        jsr cgup        ;check up
 +        beq _cgxy       ;if 0 then we can grow
 +        inx             ;offset up check
 +        jsr cgright     ;check right
 +        beq _cgxy       ;if 0 then we can grow
 +        dey             ;offset right check
 +        jsr cgdown      ;check down
 +        beq _cgxy       ;if 0 then we can grow
 +        dex             ;offset down check
 +        jsr cgleft      ;check left
 +_cgxy   beq growloop    ;if 0 then we can grow
 +
 +....The subroutine!...  There is a weird "injection" in the bottom of this
 +routine to save a couple bytes.. it is explained in the comments to the code.
 +
 +;Check if a cell can grow a direction
 +;1-up 2-right 3-down 4-left
 +; (y xpos, x ypos, a=dir)  x/y switched for indirect lda($xx),y below
 +; return: a == 0 : true (can move)
 +;         a <> 0 : false (cannot move)
 +cangrow = *
 +    cmp #$01
 +    beq cgup
 +    cmp #$02
 +    beq cgright
 +    cmp #$03
 +    beq cgdown
 +    ;cmp #$04
 +    ;beq cgleft *** not needed falling through
 +cgleft = *
 +    dey         ;set xpos - 1
 +    cpy #$ff    ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgright = *
 +    iny        ;set xpos + 1
 +    cpy #$28   ;check xpos (<40)
 +    beq cgno
 +    bne cgloadcell
 +cgup = *
 +    dex        ;set ypos - 1
 +    cpx #$ff   ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgdown = *
 +    inx        ;set ypos + 1
 +    cpx #$19   ;check ypos (<25)
 +    beq cgno
 +    ;*** fallthrough, bne cgloadcell not needed
 +cgloadcell = *
 +    lda #$1f
 +loadcell = *        ;x = ypos, y = xpos, a = and value
 +    sta $50
 +    jsr $e9f0   ; * sets $d1 $d2
 +    lda ($d1),    ;load byte  (x pos in y reg!)
 +cgno = *
 +    and $50  ;#$1f = use only low 5 bits!
 +    ;rts see below!
 +;  This is mixed with the rts because the first byte would
 +;  be wasted
 +growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto  48 is used for both
 +;  again first byte would be wasted so we overlap with the previous
 +growinto = *  
 +    .byte 2, 4, 8, 1, 2
 +
 +;explanation of above
 +; 0  1  2  4  8
 +;        4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +*From Mapping: ;59888         $E9F0
 +               ;Set Pointer to Screen Address of Start of Line
 +               ;This subroutine puts the address of the first byte of the
 +        ;screen line designated by the .X register into locations
 +        ;209-210 ($D1-$D2).
 +
 +    Well, if the code at "cangrowxy" discovers that it cannot grow in any
 +direction from the current position, it has to find a place to grow from.  To
 +do this it falls into the "findgrow" routine; the findgrow routine is
 +interesting because it is a "stateful" routine, meaning that upon reentry it
 +resumes what it was doing before.  The reason for this is I didn't want it
 +restarting at the top of the maze each time it entered this find routine - I
 +wanted it to continue to search from where it left off.  This routine works by
 +"walking" the current x and y positions through the grid.  To walk it through
 +it keeps track of where it left off and it just keeps setting the x and y
 +pointers to the next place and "returns" to the start of the generation
 +portion of the code.
 +    The other advantage to doing it this way is I can reset this routine to
 +place a "key" at a dead end in the maze.  This reset is done elsewhere in the
 +maze by setting a zp value to 0. When the reset is done the findgrow routine
 +runs a small piece of code to reset itself, but before it does this it places
 +a key if it has not exceeded the max number of keys.
 +
 +    Here is the find grow routine, each line of code is documented to hint at
 +what it is doing:
 +   
 +; *** fall into findgrow
 +findgrow = *
 +        lda $4b         ; Check byte 0 != resume findgrow
 +        bne _fgresume
 +        sta $49         ;Reset Findgrow Xpos
 +        sta $4a         ;Reset Findgrow Ypos
 +        inc $4b         ;Set findgrow flag to resume (<>0)
 +
 +        ;Place keys in corners (injected here for ease of placement, d1/d2 is
 +        ;pointed at a dead end)
 +        iny             ; offset left check
 +        beq _fgresume   ;Do not try when column is 0, it freaks out
 +        lda $54
 +        beq _fgresume   ;if 0 then keys are done
 +        dec $54         ;dec # of keys left to place
 +        inc $51         ;actual num keys left
 +        lda ($d1),    ;load byte
 +        ora #$c0        ;EBCM value for key!
 +        sta ($d1),    ;store new value
 +        ;(end of injection)
 +
 +_fgresume = *
 +_fgx    ldx $4a         ;Findgrow ypos
 +_fgy    ldy $49         ;Findgrow xpos
 +        inc $49         ;Next xpos (next round)
 +        cpy #$28        ; < 40
 +        beq _fgcr       ; next line if >= 40
 +        jsr cgloadcell  ; load cell byte
 +        beq _fgy        ; if 0 then get next xpos/byte
 +        sty $45         ;Set Current xpos
 +        stx $46         ;Set Current ypos
 +        jmp cangrowxy   ;Check if this can grow
 +_fgcr   lda #$00        ;Reset Findgrow xpos
 +        sta $49         ; 0->xpos
 +        inc $4a         ;Next Findgrow ypos
 +        lda $4a         
 +        cmp #$19        ;check ypos (<25)
 +        bne _fgx        ;If we're at x40y25 we are ready to play!
 +        beq gameinit    ;Start game logic
 +    
 +    That code also checks to see if we have completed generating the maze, at
 +which point it enters the game logic.  But before we jump into that I have to
 +explain what happens when the "cangrowxy" routine discovers that it CAN GROW,
 +instead of falling into the "findgrow" routine.
 +    
 +    In the "cangrowxy" routine described a bit above there was the final line
 +of:
 +
 +_cgxy beq growloop ;if 0 then we can grow
 +
 +    This "growloop" routine is the place where the actual "growing" or
 +"eating" occurs.  This routine works by choosing a direction at random, and
 +attempting to grow in that direction. It reuses the same "cgup" etc. routines
 +that the "cangrowxy" uses because although we know we can grow in a direction,
 +we don't know which direction.  In a larger version of this algorithm I
 +usually link all of this stuff together, but this is how it worked out in
 +tinyrinth.
 +    
 +growloop = *        
 +randdir = *
 +        ;jsr getrand; not a func, not reused yet
 +getrand = *
 +        lda #$80
 +        sta $028a ; Key Repeat; (injected here for #$80) just using #$80 for
 +   ;smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +        sta $d40f       ;set v3 random # gen bits
 +        lda $d41b       ; read random number from V3
 +        and #$03        ; Force Random number to 0-3
 +        clc
 +        adc #$01        ; Add 1 to get 1-4
 +        sta $4c         ; store rand direction
 +        ldy $45         ; Current Xpos
 +        ldx $46         ; Current Ypos
 +        jsr cangrow     ; Check if we can grow in that direction
 +        bne randdir     ; if <> 0 then Try again
 +        sta $4b         ; reset findgrow flag (injected here for .a==0)
 +grow = *
 +        ldx $4c         ;Get saved rand direction
 +        lda growinto, ; 1-4 (4, 8, 1, 2) Get bit set for new cell
 +        sta ($d1),    ; write new location
 +        lda growfrom, ; 1-4 (1, 2, 4, 8) Get bit to set for old
 +        sta $4d         ; Save growfrom bit
 +        ldy $45         ; Reload Current xpos
 +        ldx $46         ; Reload Current ypos
 +        jsr cgloadcell  ; Load base cell again
 +        ora $4d         ; Combine with growfrom bit
 +        sta ($d1),    ;Modify old cell to connect to new loc
 + ;Change current position
 +        lda $4c         ; Get saved rand direction
 +        jsr cangrow     ; Get new x y again - (this will only perform next x/y
 + ;adj, returns <>0)
 +        sty $45         ; xpos   set to new location
 +        stx $46         ; ypos
 +        jmp popgridloop ; Return to populate grid loop
 +    
 +    As you can see above I use the random number generator from V3 to get the
 +random direction.  A HUGE bonus here is I can use the value returned to
 +directly call that main entry point in "cangrow" so it will figure out which
 +direction to check by itself.  If the direction it checks is bad it just loops
 +and tries again.  In a larger version of the program I'd make this much more
 +efficient, but efficiency costs bytes.  Many coders would be afraid of using a
 +loop like this but since I pre-checked that I can grow before entering this
 +loop, unless my random number generator never returns that direction, I should
 +be fine!
 +    Once we have a good direction I use the already set $d1 zp to read and
 +update the screen data.  Nice bonus of using the routine that sets it!  The
 +code that updates the data must update the cell you are growing to, but ALSO
 +it must update the cell you are growing from, which is why I have to reload
 +the original data up there.  When it is finished doing the growing it moves
 +the x/y positions ($45, $46) to the place it grew to; and resumes the maze
 +generation from there.
 +    
 +    As I said before, the findgrow routine will jump into the game logic once
 +it knows it is finished, but before we talk about that I should describe
 +another problem that I needed to solve.
 +
 +    In a 2 dimensional maze there are 16 possible pieces that can be formed.
 +Starting with completely closed ending with completely open.  I represent
 +these pieces with numbers 0-15; each bit represents a side of a cell that is
 +open/closed.  So when I finish with the maze generation I have a grid of
 +numbers 0-15. (Actually 1-14; in this game I never end up with 0's or 15's
 +as all pieces are used.)  In Tinyrinth the "grid" I used was the screen memory
 +(40x25), so I ended up with a screen full of characters A-N.
 +    Navigating a maze of characters A-N would be rather difficult so I had to
 +decide whether to use the built in character set and transform what was on the
 +screen or somehow use a character set.  Well, another hitch was I wanted this
 +to be a game and losing the simplicity of 0-15 for detecting which way a
 +player could move would be bad - so I opted for a character set of 16
 +characters.  Hmmm 8 bytes per character time 16 characters.. 128 bytes!
 +Yikes!
 +
 +Character Set Generation
 +    So I was going to need a 128 byte character set, but I didn't want to have
 +to lose 128 bytes.  So what did I do?  I coded about 70 bytes of code to
 +generate the character set.  This too pained me because I was sure I could
 +make it smaller somehow, but once I got it working and a decent percentage
 +smaller than the size of what it produced I stopped worrying about it.
 +    Here is how it works: we know we want to represent the 16 pieces with 0-15
 +so the bits must give some hint to the shape of the piece.  What I did was
 +create a loop to check the current counter value and either set or skip the
 +top, sides, bottom of the piece.
 +        Here is how it works, I have commented each line of code to describe
 +what it is doing.
 +
 +        ; Generate Cset!
 +        lda #$20       ; write hi
 +        sta $48        ; use zp
 +        lda #$00       ; write lo
 +     
 +        ;Initialize Screen, variables (injected here to save bytes - using
 + ;lda #$00)
 +        sta $51           ; Clear actual num keys placed counter (see findgrow)
 +        sta $d020
 +        sta EBCM0;  Set BG Color ($d021)
 +        ;(end of injection)
 +
 +        tax            ; counter = 0
 +_again  sta $47        ; use zp
 +        ldy #$00       ; index
 +        txa            ; counter to a
 +        and #$01       ; check for top
 +        beq _ytop      ; yes top
 +        eor #%01111111 ; 00000001 -> 011111110 -> 10000001
 +_ytop   eor #%11111111 ; 00000000 -> 111111111    
 +_6sides sta ($47),   ; store top/sides to cset
 +        iny            ; next mem location
 +        txa            ; counter to a
 +        and #$02       ; check for right
 +        eor #%00000010 ; flip
 +        lsr            ; 00000010 -> 00000001 || 0->0
 +        sta $49        ; store for right side 
 +        txa            ; counter to a
 +        and #$08       ; check for left side
 +        bne _noleft    ; no left
 +        eor #%10001000 ; 00000000 -> 10001000 -> 10000000
 +_noleft eor #%00001000 ; 00001000 -> 00000000
 +        ora $49        ; merge with right
 +        cpy #$07       ; 7->15->23->...
 +        bne _6sides    ; total of 6 side pieces
 +        txa            ; counter to a
 +        and #$04       ; check for bottom
 +        beq _ybot      ; no bottom
 +        eor #%01111010 ; 00000100 -> 01111110 -> 10000001
 +_ybot   eor #%11111111 ; 00000000 -> 11111111    
 +        sta ($47),   ; store bottom to cset
 +        inx            ; next counter
 +        clc            ; clear carry
 +        lda $47        ; inc zp
 +        adc #$08       ; by 8
 +        bne _again     ; do it again
 +        inc $48
 +        ldy $48
 +        cpy #$28       ;repeat through cset 2000-2800
 +        bne _again
 +        sty EBCM2      ;Set Death color ($d023) (using result of cset gen for
 +        ;color value!)
 +
 +    I still think I could have made that smaller; either way though it was fun
 +making a little routine to generate the cest.  With that all said, we now know
 +how the maze is generated.  But now I needed a way to play it, as suggesting
 +players use dry erase markers on their screens was not a great idea.
 +
 +
 +Game Loop
 +    Ok, once "findgrow" reaches the end of its maximum size it branches to
 +"gameinit" Game init sets up some quick things and gets going on the game
 +play.  The game loop loops until the player either gets all the keys or dies.
 +When the player gets all the keys the level number ($53) is increased which is
 +used to increase the maximum number of keys and the color of the maze!
 +    Here is the game loop, I have commented much of the code and separated it
 +into sections:
 +
 +*** Flashes the keys, sets speed of "minitaur" based on the level
 +
 +; Game Initialization and Game Loop
 +gameinit = *
 +gameloop=*
 +        inc EBCM3       ; Flash Keys ($d024)
 +        inc $58         ; Increase Speed counter #1 (0-255)
 +        bne moveplayer  ; Skip move
 +        inc $57         ; Increase Speed counter #2 ($57|#$f0 - 255)
 +        bne moveplayer  ; Skip Move
 +        
 +        ;set death speed
 +        lda $53         ;Use level for Speed value
 +        cmp #%11111000  ;If more than this use default speed
 +        bmi _dsp        
 +        lda #%11111000  ;Default speed
 +_dsp    ora #%11110000  ;Set high nibble so counter counts up to 255
 +        sta $57         ; Set Speed counter #2
 +
 +*** Moves the minitaur to the next position if it is time. (Checks for player
 +*** hits)
 +
 +;move death
 +movedeath = *
 +        ldy $55         ;Baddy Xpos
 +        ldx $56         ;Baddy Ypos        
 +        jsr cgloadcell  ; load the cell/point the zps (ANDs by #$1f)
 +        sta ($d1),    ;store cleared value
 +_newy   iny             ;increase xpos
 +        cpy #$28        ;less than 40?
 +        bmi _go         ;don't reset
 +        ldx $46         ;ypos of player
 +        stx $56         ;ypos of death
 +        ldy #$00        ;clear xpos counter
 +_go     sty $55         ;Set baddy xpos
 +        lda #$ff        ;Get all bits! (see loadcell)
 +        jsr loadcell    ;load the cell/point the zps
 +        sta $59         ;Save cell value (withh all possible bits)
 +        and #%11000000  ;and by EBCM bits
 +        cmp #%11000000  ;Check for KEY - (so it can skip over)
 +        beq _newy       ;Jump ahead 1 more to skip key position
 +        cmp #%01000000  ;Check for player hit
 +        bne _nodie      ;Player is not dead
 +        jmp gameover    ;Game Over!  
 +_nodie  lda $59         ;Reload stored value
 +        ora #$80        ;EBCM for Death
 +        sta ($d1),    ;store value
 +; *** fall through to Move Player
 +
 +*** Checks the keyboard for input, moves player if possible
 +
 +;Move Player
 +moveplayer=*
 +_ffe4   jsr $ffe4       ;Get keypress
 +        beq gameloop    ;no key - goto gameloop
 +        and #%00000011  ;.a == 0-7 at this point
 +        tax             
 +        lda keytodir, ;Loads from keytodir
 +;Move entity in game
 +; .a=direction 1-up 2-right 3-down 4-left
 +glmove  tax
 +        stx $4e         ; store direction
 +        lda growinto, ; get check bit
 +        sta $4f         ; store check bit
 +        ldy $45         ; current xpos
 +        ldx $46         ; current ypos
 +        jsr cgloadcell  ; load the cell (and with #$1f)
 +        sta ($d1),    ; store the data (clear the EBCM)
 +        lda #$00        ; Bottom "fall out" fix
 +        sta $50         ; clear and of cangrow  Bottom "fall out" fix
 +        lda $4e         ; load direction
 +        jsr cangrow     ; call cangrow to move xpos/ypos
 +        and $4f         ; check bit
 +        bne glmyes         ; if we have a bit then we can move!
 +        ldy $45         ; reload xpos - do not move
 +        ldx $46         ; reload ypos - do not move
 +glmyes  lda #$ff        ; bits to obtain from loadcell
 +        jsr loadcell    ; load the cell/point the zps
 +        pha             ; temp store value for later checks
 +        and #$1f        ; clear other EBCM bits 
 +        ora #$40        ; EBCM ORA Player/Baddy
 +        sta ($d1),    ; store new data
 +        sty $45         ; store xpos of new position
 +        stx $46         ; store ypos of new position
 +
 +*** Checks for key hits, minitaur hits, Check for end-level: 
 +    (jumps to next level if it is the end of the level, loops to the beginning
 +     of the game loop if not.)
 +
 +        ;Hit checks
 +        pla             ; load previous value
 +        and #$c0        ; check for hits "11xxxxxx"
 +        cmp #$c0        ; check for key hit
 +        bne _back       ;_notkey     ; to next check
 +        dec $51         ; dec number of keys left in level
 +        bne _back       ; if 0 then we should go to the next level 
 +        jmp init        ; gen maze again
 +;_notkey cmp #$80        ; check for death hit!
 +;        bne _back 
 +;        jmp gameover       ; game over
 +_back   jmp gameloop
 +        ;more checks here?
 +
 +keytodir=*
 +.byte 2,1,4,3
 +
 +
 +    To make this all work, I reused the "cangrow" routine again!  Because
 +basically it is the exact same logic.  One thing that I found unfortunate is I
 +never coded a way for the minitaurs to have any intelligence, there was an
 +attempt that worked - but it was FAR too many bytes (over by like 20-30).  So
 +I just axed it.  I also at one time made the movement routines all one
 +subroutine that took parameters that would move any "entity" This also took
 +too many bytes to be useful for this version of the game.  Perhaps some day
 +I'll write a kick butt 1k or 2k version with all those features coded in.
 +
 +Various Space Saving Techniques Used
 +
 +    If you look throughout the code, you'll notice I comment a lot on
 +"injections" These are little bits of code that are conveniently put in
 +places to take advantage of the state of a register, or a memory location.
 +    Here is an example: in the "getrand" routine I injected a line to set the
 +key repeat, using the value that I wanted to also put into $d412 for the
 +random number generation.
 +
 +        lda #$80
 +        sta $028a; Ket Repeat; (injected here for #$80) just using #$80 for smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +
 +
 +Another example, here I mixed some static data together to save some bytes and
 +just labeled it so some would be reused.  To save additional bytes I also
 +positioned the label in front of the "rts" (This was because the code
 +accessing the data never used an index of 0.)  This is mixed with the rts
 +because the first byte would be wasted
 +
 + growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto  48 is used for both
 +;  again first byte would be wasted so we over lap with the previous
 +growinto = *  
 +    .byte 2, 4, 8, 1, 2
 +
 +;explanation of above
 +; 0  1  2  4  8
 +;        4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +    I also reused the "cangrow" routine as much as possible, and on top of
 +that I made it more usable by making multiple entry points so it could be used
 +in different way.
 +    
 +    Finally the best thing I did was use kernal routines, like $E544 to clear
 +the screen to the set char color at $0286.  This allowed me to change the
 +color of the maze each level just by changing the value at $0286.
 +    I also used $e9f0, Mapping the c64 says, "Set Pointer to Screen Address of
 +Start of Line. This subroutine puts the address of the first byte of the
 +screen line designated by the .X register into locations 209-210 ($D1-$D2)."
 +This was handy because I could use this to read from and write to the screen.
 +
 +Extended Color Background Mode
 +    I could not use sprites.  This made me sad, but there was just no room for
 +it.  So I just decided to allow the cset to repeat itself through the whole
 +cset area ($2000-$2800) and use ECBM!  This allowed me to just set a bit and
 +make a maze piece a different color, what more when I read this value I could
 +easily tell what the piece represented!  That made collision detection very
 +easy, and also allowed me to have cool flashing keys.
 +
 +Conclusion
 +    I had a LOT of fun coding this and look forward to another competition.  I
 +was surprised and amazed by the entries!  This even inspired me to make a
 +better version of Tinyrinth, of which I have not completed.  I did add to it,
 +I made a version in which the minitaurs move through the maze.  I made another
 +version that would also increase the number of minitaurs in the maze.  I made
 +yet another version that would change the size of the maze, and thought about
 +changing the shape too.  All of these ideas were easy to implement once I had
 +the base game working.
 +    Oh by the way, I know how to spell Minotaur, the Minitaur thing is just a
 +bad joke.  :)
 +
 +
 +Final Code Stats
 +    Total source compiled: 445 lines in 1 file(s)
 +    Symbol table:  29 global and 1 local constants, 0 global variables,
 +    2 global and 15 local labels, 0 source labels. 
 +    Data size: 10 bytes
 +    Code size: 475 bytes in 231 instructions
 +    Memory block affected: $1000 - $11E4 (total size: 485 bytes)
 +
 +Source
 +    Note: This source is not the source I released for the event.  It is a bit
 +smaller, I cannot locate my original source for some reason.  Also, some of it
 +may differ a tiny bit from the code documented above. I had written the
 +article using the wrong copy of the source, I have attempted to go back and
 +update it though.  (Note to self: keep better records!)  :)
 +
 +    Another note:  This source compiles with c64asm by Balint Toth.  :)    
 +
 +-- tinyrinth.asm  --
 +; Tinyrinth
 +;   entry for a 512b game contest
 +; Burning Horizon/FTA
 +; Mark Seelye
 +; mseelye@yahoo.com
 +; ===================================================================
 +
 +* = $1000
 +
 +;name   loc     desc       color  ecbm bits
 +EBCM0 = $d021 ; untouched  black   $00  00
 +EBCM1 = $d022 ; cursor     red     $40  01
 +EBCM2 = $d023 ; touched    green   $80  10
 +EBCM3 = $d024 ; keys       yellow  $c0  11
 +
 +;ZPs used:  (Consolidation Possible if needed)
 +;43/44 - Not Used
 +;45/46 - Current X/Y Position, Maze Generation & Game
 +;47/48 - CSet Location, CSet Generation
 +;49    - Temp Storage, CSet Generation
 +;49/4a - Xpos/Ypos Findgrow, Maze Generation
 +;4b    - Flag, Findgrow, Maze Generation
 +;4c    - Temp Storage, randdir, Maze Generation
 +;4d    - Temp Storage, grow, Maze Generation
 +;4e/4f - Temp Storage, glmove, Game
 +;50    - Temp Storage, loadcell, Maze Generation & Game
 +;51    - NumKeys Left in level (affected by: destroyed & found keys)
 +;52    - not used
 +;53    - Current Level 
 +;54    - # of Keys to try and place, gameinit, Game
 +;55/56 - X/Y Pos of Death
 +;57/58 - speed counter for death
 +
 +;Collect all keys
 +; gen crummb path for visited areas?
 +; once all keys collected next maze is gen'd
 +
 +; Initialization
 +setup = *
 +gameover = *
 +        ;Setup game variables
 +        ldx #$00
 +        stx $53         ; Start at level 1 (to be inc'd)
 +init = *
 +        ;Setup level
 +        inc $53         ; next level
 +        ldx $53
 +        stx $54         ;Counter for drawing Keys (next level)
 +        stx $0286       ;Character Color to use on clear (e544)
 +
 +        ;Set Render Cursor Start Pos / Player Color
 +        lda #$05
 +        sta EBCM1       ;Set ebcm color PLAYER to GREEN ($d022)
 +        sta $45         ; Cursor/Player Position X (0-39)
 +        sta $46         ; Cursor/Player Position Y (0-24)
 +        ;Clear Screen
 +        jsr $E544       ;clear screen set to char color ($0286)
 +
 +        lda #$5b
 +        sta $d011      ;turn on EBCM
 +        lda #$18
 +        sta $d018      ;Activate Cset
 +
 +        ; Generate Cset!
 +        lda #$20       ; write hi
 +        sta $48        ; use zp
 +        lda #$00       ; write lo
 +     
 +        ;Initialize Screen, variables (injected here to save bytes - using lda #$00)
 +        sta $51           ; Clear actual num keys placed counter (see findgrow)
 +        sta $d020
 +        sta EBCM0;  Set BG Color ($d021)
 +        ;(end of injection)
 +
 +        tax            ; counter = 0
 +_again  sta $47        ; use zp
 +        ldy #$00       ; index
 +        txa            ; counter to a
 +        and #$01       ; check for top
 +        beq _ytop      ; yes top
 +        eor #%01111111 ; 00000001 -> 011111110 -> 10000001
 +_ytop   eor #%11111111 ; 00000000 -> 111111111    
 +_6sides sta ($47),   ; store top/sides to cset
 +        iny            ; next mem location
 +        txa            ; counter to a
 +        and #$02       ; check for right
 +        eor #%00000010 ; flip
 +        lsr            ; 00000010 -> 00000001 || 0->0
 +        sta $49        ; store for right side 
 +        txa            ; counter to a
 +        and #$08       ; check for left side
 +        bne _noleft    ; no left
 +        eor #%10001000 ; 00000000 -> 10001000 -> 10000000
 +_noleft eor #%00001000 ; 00001000 -> 00000000
 +        ora $49        ; merge with right
 +        cpy #$07       ; 7->15->23->...
 +        bne _6sides    ; total of 6 side pieces
 +        txa            ; counter to a
 +        and #$04       ; check for bottom
 +        beq _ybot      ; no bottom
 +        eor #%01111010 ; 00000100 -> 01111110 -> 10000001
 +_ybot   eor #%11111111 ; 00000000 -> 11111111    
 +        sta ($47),   ; store bottom to cset
 +        inx            ; next counter
 +        clc            ; clear carry
 +        lda $47        ; inc zp
 +        adc #$08       ; by 8
 +        bne _again     ; do it again
 +        inc $48
 +        ldy $48
 +        cpy #$28       ;repeat through cset 2000-2800
 +        bne _again
 +
 +        sty EBCM2      ;Set Death color ($d023) (using result of cset gen for color value!)
 +        
 +        
 +popgridloop = *
 +        ;can grow from current?
 +        ldy $45         ;xpos
 +        ldx $46         ;ypos
 +        ;Can grow in any direction?
 +cangrowxy = *
 +        jsr cgup        ;check up
 +        beq _cgxy       ;if 0 then we can grow
 +        inx             ;offset up check
 +        jsr cgright     ;check right
 +        beq _cgxy       ;if 0 then we can grow
 +        dey             ;offset right check
 +        jsr cgdown      ;check down
 +        beq _cgxy       ;if 0 then we can grow
 +        dex             ;offset down check
 +        jsr cgleft      ;check left
 +_cgxy   beq growloop    ;if 0 then we can grow
 +; *** fall into findgrow
 +findgrow = *
 +        lda $4b         ; Check byte 0 != resume findgrow
 +        bne _fgresume
 +        sta $49         ;Reset Findgrow Xpos
 +        sta $4a         ;Reset Findgrow Ypos
 +        inc $4b         ;Set findgrow flag to resume (<>0)
 +
 +        ;Place keys in corners (injected here for ease of placement, d1/d2 is pointed at a dead end)
 +        iny             ; offset left check
 +        beq _fgresume   ;Do not try when column is 0, it freaks out
 +        lda $54
 +        beq _fgresume   ;if 0 then keys are done
 +        dec $54         ;dec # of keys left to place
 +        inc $51         ;actual num keys left
 +        lda ($d1),    ;load byte
 +        ora #$c0        ;EBCM value for key!
 +        sta ($d1),    ;store new value
 +        ;(end of injection)
 +
 +_fgresume = *
 +_fgx    ldx $4a         ;Findgrow ypos
 +_fgy    ldy $49         ;Findgrow xpos
 +        inc $49         ;Next xpos (next round)
 +        cpy #$28        ; < 40
 +        beq _fgcr       ; next line if >= 40
 +        jsr cgloadcell  ; load cell byte
 +        beq _fgy        ; if 0 then get next xpos/byte
 +        sty $45         ;Set Current xpos
 +        stx $46         ;Set Current ypos
 +        jmp cangrowxy   ;Check if this can grow
 +_fgcr   lda #$00        ;Reset Findgrow xpos
 +        sta $49         ; 0->xpos
 +        inc $4a         ;Next Findgrow ypos
 +        lda $4a         
 +        cmp #$19        ;check ypos (<25)
 +        bne _fgx        ;If we're at x40y25 we are ready to play!
 +        beq gameinit    ;Start game logic
 +        
 +              
 +growloop = *        
 +randdir = *
 +        ;jsr getrand; not a func, not reused yet
 +getrand = *
 +        lda #$80
 +       sta $028a; Ket Repeat; (injected here for #$80) just using #$80 for smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +        sta $d40f       ;set v3 random # gen bits
 +        lda $d41b       ; read random number from V3
 +        and #$03        ; Force Random number to 0-3
 +        clc
 +        adc #$01        ; Add 1 to get 1-4
 +        sta $4c         ; store rand direction
 +        ldy $45         ; Current Xpos
 +        ldx $46         ; Current Ypos
 +        jsr cangrow     ; Check if we can grow in that direction
 +        bne randdir     ; if <> 0 then Try again
 +       sta $4b          ; reset findgrow flag (injected here for .a==0)
 +grow = *
 +        ldx $4c         ;Get saved rand direction
 +        lda growinto, ; 1-4 (4, 8, 1, 2) Get bit set for new cell
 +        sta ($d1),    ; write new location
 +        lda growfrom, ; 1-4 (1, 2, 4, 8) Get bit to set for old
 +        sta $4d         ; Save growfrom bit
 +        ldy $45         ; Reload Current xpos
 +        ldx $46         ; Reload Current ypos
 +        jsr cgloadcell  ; Load base cell again
 +        ora $4d         ; Combine with growfrom bit
 +        sta ($d1),    ;Modify old cell to connect to new loc
 +        ;Change current position
 +        lda $4c         ; Get saved rand direction
 +        jsr cangrow     ; Get new x y again - (this will only perform next x/y adj, returns <>0)
 +        sty $45         ; xpos   set to new location
 +        stx $46         ; ypos
 +        jmp popgridloop ; Return to populate grid loop
 +
 +
 +; Game Initialization and Game Loop
 +gameinit = *
 +gameloop=*
 +        inc EBCM3       ; Flash Keys ($d024)
 +        inc $58         ; Increase Speed counter #1 (0-255)
 +        bne moveplayer  ; Skip move
 +        inc $57         ; Increase Speed counter #2 ($57|#$f0 - 255)
 +        bne moveplayer  ; Skip Move
 +        
 +        ;set death speed
 +        lda $53         ;Use level for Speed value
 +        cmp #%11111000  ;If more than this use default speed
 +        bmi _dsp        
 +        lda #%11111000  ;Default speed
 +_dsp    ora #%11110000  ;Set high nibble so counter counts up to 255
 +        sta $57         ; Set Speed counter #2
 +
 +
 +;move death
 +movedeath = *
 +        ldy $55         ;Baddy Xpos
 +        ldx $56         ;Baddy Ypos        
 +        jsr cgloadcell  ; load the cell/point the zps (ANDs by #$1f)
 +        sta ($d1),    ;store cleared value
 +_newy   iny             ;increase xpos
 +        cpy #$28        ;less than 40?
 +        bmi _go         ;don't reset
 +        ldx $46         ;ypos of player
 +        stx $56         ;ypos of death
 +        ldy #$00        ;clear xpos counter
 +_go     sty $55         ;Set baddy xpos
 +        lda #$ff        ;Get all bits! (see loadcell)
 +        jsr loadcell    ;load the cell/point the zps
 +        sta $59         ;Save cell value (withh all possible bits)
 +        and #%11000000  ;and by EBCM bits
 +        cmp #%11000000  ;Check for KEY - (so it can skip over)
 +        beq _newy       ;Jump ahead 1 more to skip key position
 +        cmp #%01000000  ;Check for player hit
 +        bne _nodie      ;Player is not dead
 +        jmp gameover    ;Game Over!  
 +_nodie  lda $59         ;Reload stored value
 +        ora #$80        ;EBCM for Death
 +        sta ($d1),    ;store value
 +; *** fall through to Move Player
 +
 +;Move Player
 +moveplayer=*
 +_ffe4   jsr $ffe4       ;Get keypress
 +        beq gameloop    ;no key - goto gameloop
 +        and #%00000011  ;.a == 0-7 at this point
 +        tax             
 +        lda keytodir, ;Loads from keytodir
 +;Move entity in game
 +; .a=direction 1-up 2-right 3-down 4-left
 +glmove  tax
 +        stx $4e         ; store direction
 +        lda growinto, ; get check bit
 +        sta $4f         ; store check bit
 +        ldy $45         ; current xpos
 +        ldx $46         ; current ypos
 +        jsr cgloadcell  ; load the cell (and with #$1f)
 +        sta ($d1),    ; store the data (clear the EBCM)
 +        lda #$00        ; Bottom "fall out" fix
 +        sta $50         ; clear and of cangrow  Bottom "fall out" fix
 +        lda $4e         ; load direction
 +        jsr cangrow     ; call cangrow to move xpos/ypos
 +        and $4f         ; check bit
 +        bne glmyes         ; if we have a bit then we can move!
 +        ldy $45         ; reload xpos - do not move
 +        ldx $46         ; reload ypos - do not move
 +glmyes  lda #$ff        ; bits to obtain from loadcell
 +        jsr loadcell    ; load the cell/point the zps
 +        pha             ; temp store value for later checks
 +        and #$1f        ; clear other EBCM bits 
 +        ora #$40        ; EBCM ORA Player/Baddy
 +        sta ($d1),    ; store new data
 +        sty $45         ; store xpos of new position
 +        stx $46         ; store ypos of new position
 +
 +        ;Hit checks
 +        pla             ; load previous value
 +        and #$c0        ; check for hits "11xxxxxx"
 +        cmp #$c0        ; check for key hit
 +        bne _back       ;_notkey     ; to next check
 +        dec $51         ; dec number of keys left in level
 +        bne _back       ; if 0 then we should go to the next level 
 +        jmp init        ; gen maze again
 +;_notkey cmp #$80        ; check for death hit!
 +;        bne _back 
 +;        jmp gameover       ; game over
 +_back   jmp gameloop
 +        ;more checks here?
 +
 +keytodir=*
 +.byte 2,1,4,3
 +
 +
 +;Check if a cell can grow a direction
 +;1-up 2-right 3-down 4-left
 +; (y xpos, x ypos, a=dir)  x/y switched for indirect lda($xx),y below
 +; return: a == 0 : true (can move)
 +;         a <> 0 : false (can not move)
 +cangrow = *
 +    cmp #$01
 +    beq cgup
 +    cmp #$02
 +    beq cgright
 +    cmp #$03
 +    beq cgdown
 +    ;cmp #$04
 +    ;beq cgleft *** not needed falling through
 +cgleft = *
 +    dey         ;set xpos - 1
 +    cpy #$ff    ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgright = *
 +    iny        ;set xpos + 1
 +    cpy #$28   ;check xpos (<40)
 +    beq cgno
 +    bne cgloadcell
 +cgup = *
 +    dex        ;set ypos - 1
 +    cpx #$ff   ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgdown = *
 +    inx        ;set ypos + 1
 +    cpx #$19   ;check ypos (<25)
 +    beq cgno
 +    ;*** fallthrough, bne cgloadcell not needed
 +cgloadcell = *
 +    lda #$1f
 +loadcell = *        ;x = ypos, y = xpos, a = and value
 +    sta $50
 +    jsr $e9f0   ; sets $d1 $d2
 +;59888         $E9F0
 +;Set Pointer to Screen Address of Start of Line
 +;This subroutine puts the address of the first byte of the screen line
 +;designated by the .X register into locations 209-210 ($D1-$D2).
 +    lda ($d1),    ;load byte  (x pos in y reg!)
 +cgno = *
 +    and $50  ;#$1f = use only low 5 bits!
 +    ;rts see below!
 +;  This is mixed with the rts because the first byte would
 +;  be wasted
 +growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto  48 is used for both
 +;  again first byte would be wasted so we over lap with the previous
 +growinto = *  
 +    .byte 2, 4, 8, 1, 2
 +
 +;explanation of above
 +; 0  1  2  4  8
 +;        4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +
 +
 +;Notes
 +
 +; 1 
 +;8 2
 +; 4 
 +;
 +;
 +;*** * * *** * *
 +;*0* *1* *2  *3    A  B  C
 +;*** *** *** ***
 +;
 +;*** * * *** * *
 +;*4* *5* *6  *7    E  F  G
 +;* * * * * * * *
 +;
 +;*** * * *** * * 
 +; 8*  9*  a      I  J  K
 +;*** *** *** *** 
 +;
 +;*** * * *** * * 
 +; c*  d*  e      M  N  O
 +;* * * * * * * * 
 +
 +-- end: tinyrinth.asm  --
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 6===================================
 +
 +
 +    Tetrattack!
 +
 +         by Stephen L. Judd <sjudd@ffd2.com>
 +
 +Introduction
 +------------
 +
 +The main point of Tetrattack!, of course, was to try and write a 3D engine
 +in 512 bytes.  I had a few other routine ideas but couldn't think of anything
 +remotely fun to do with them, but I finally thought of a neat 3D game idea
 +and decided to go for it.  Alas, as code kept getting bigger that idea had
 +to be discarded, and the eventual result was tetrattack, the simplest/smallest
 +3D game that wasn't (totally) stupid.
 +
 +But it is a 3D engine in 512 bytes!
 +
 +At first blush a 512-byte 3D engine might seem ridiculous but think about it:
 +a basic 3d program doesn't need to store any graphics definitions or things
 +like that -- it draws its own as it goes.  In fact, all that is needed is a
 +line routine, a rotation/projection routine, and some routines to keep track
 +of angles and positions and such.  What could possibly be hard about that,
 +right?
 +
 +Well, it seemed reasonable at the time, anyways.  At exactly 512 bytes, the
 +code really had to sweat out the bytes, and every single byte saved was
 +important; meanwhile, the routines still had to be reasonably fast.  So this
 +article will discuss the line routine, the rotation/projection routine, and
 +the object and game manipulation routines, and the various optimizations used.
 +
 +I must say that writing this program really was fun; I'm sure that everyone
 +else had the same problems I had, trying to shift from a "cycle-optimization"
 +mindset to a "byte-optimization" mindset.  Finding those extra few bytes
 +here and there was a mighty challenge, sometimes taking hours or even days
 +just to save one or two bytes.  That part was frustrating, of course, as
 +is the fact that the 'game' aspect of the program is thoroughly lame.  But
 +it was a neat challenge, and there was never an issue of adding too many
 +features to the program and never finishing it!
 +
 +
 +Quick Summary
 +-------------
 +
 +A full-on 3D program uses matrix multiplications to compute rotations, but
 +3D calculations get a whole lot easier in a Battlezone or driving-type
 +situation, when you only turn left or right.  There are no longer rotation
 +matrices to keep track of -- just a single angle.  This means that rotations
 +are done with a simple INC/DEC, and there are no matrix multiplies to deal
 +with.  Rotations are done with some simple table lookups.
 +
 +The line drawing routine is pretty standard.  Self-modifying code is
 +used to manipulate a single routine to draw different lines, by swapping
 +INX/INY instructions (which turns out to be less efficient, memory-wise --
 +sigh, oh well).  Lines are plotted to a double-buffered 128x128 charmap,
 +and once again setting up the screen took extra bytes.
 +
 +The "game code" is the simplest possible -- tetrahedrons are moved by
 +just incrementing their z-coordinate, and rotated by incrementing their
 +angle.  Their locations are determined by whatever zero page is initialized
 +to -- the program never sets explicit locations.  Instead, I just chose a
 +portion of zero page that had a reasonable range of default values (default
 +zero-page values are used extensively in the code)
 +
 +In addition to default zero-page values the code uses several kernal
 +routines (including a few unusual ones, for example the shift-C= routine
 +for swapping charset buffers), self-modifying code, lots of zero-page
 +instructions, and a whole bunch of tightly optimized code.
 +
 +The code was written over a few weeks -- a good week of planning and writing
 +out possible routines, and a good week or two of coding.
 +
 +A couple of times I froze the program (using AR) and the freeze almost always
 +happened in the same place: the buffer clearing routines.  So the rotation
 +and line drawing routines turned out to be fast enough, even with all the
 +byte savings.
 +
 +The code was developed using the Sirius assembler on a 128D+SCPU -- and btw,
 +the Jammon debugging mode has turned out to be awfully cool, with being able
 +to visually single-step through code while simultaneously viewing memory and
 +registers.  Action Replay was downright clunky afterwards (but could freeze!).
 +
 +Yep, just a plug, with a bit of amazement that it actually all worked.
 +
 +I also tested it with VICE, to make sure it worked OK.  VICE initializes
 +zero page to different values than my 128, unfortunately, so some tweaking
 +had to be done.
 +
 +And that's about it.  Now on to more detail.
 +
 +
 +----------
 +Setting up
 +----------
 +
 +The setup routines initialize VIC, initialize tables used by the line and
 +rotation routines, and set up the screen.  I note that when I started 
 +planning this program I computed the sizes of line and rotation routines
 +without considering the code needed to set up tables and such for those
 +routines.  Oops!
 +
 +The default initial values for .X, .Y, and .A are 0 after a SYS, so the
 +code made a little use of this.
 +
 +Line routine setup
 +------------------
 +
 +The line routine uses tables to look up pixel addresses and bit locations
 +for each x,y coordinate (the line routine stores the x and y coordinates
 +directly in the .X and .Y registers, so looking up pixel addresses amounts
 +to LDA Bitp,X kinds of instructions).  The "bitmap" is a 128x128 charmap,
 +so the calculations are pretty easy.  The bit patterns which correspond to
 +different x-coordinates (10000000 01000000 etc.) are computed using
 +
 + cmp #$80
 + rol
 +
 +to cyclically rotate a single bit.
 +
 +Rotation routine setup
 +----------------------
 +
 +Another little piece of code is piggybacked into this loop, which extends a
 +sine table from 0..pi.  In this program, "angles" can go from 0..63,
 +corresponding to 0..2*pi; the most memory-efficient way to store the sine
 +table is to store the values for 0..pi/2, and then extend the table to
 +pi/2..pi.  That is, angle=16 (angles go from 0..63, remember) corresponds to
 +pi/2, so the job of extending the table amounts to copying table values 15..0
 +to values 17..32 (value 16 is not copied because it is pi/2 -- the values
 +are 'mirrored' through pi/2.  To put it another way: where would it be copied
 +to?).
 +
 +This amounts to starting one index register at 15 and decrementing it, while
 +starting the other register at 17 and incrementing:
 +
 + ldy #15
 + ldx #17
 + lda table,y
 + sta table,x
 + inx
 + dey
 +
 +The problem here is piggybacking into a loop where .Y is initialized to
 +ldy #$7F instead of ldy #15.  The solution is to initialize .X carefully
 +
 + [ldy #$7f]
 + ldx #$90
 +
 +and use the following:
 +
 +        lda sin0,y
 +        sta sin0+17,x
 + inx
 + [dey]
 +
 +The idea is that when .Y=15, .X will equal 0 and start incrementing.  This
 +copies a lot of junk into the table that is never used but doesn't overwrite
 +the existing 0..16 table values, and saves several bytes.
 +
 +The rotation tables are tables of r*sin(theta), one table for every allowed
 +value of theta=0..64 and r=-128..127.  The idea is that every 256-byte page
 +contains r*sin(theta) for one value of theta, so that a table lookup amounts
 +to
 +
 + lda theta
 + ora #>high byte of sin tables
 + sta zp+1
 + ldy r
 + lda (zp),y
 +
 +The table values are computed using a standard shift-and-add multiply routine,
 +with a little extra code to handle negative values of r.  What about
 +negative values of sin(theta), though?
 +
 +Recall that theta=0..63 corresponds to 0..2*pi, but the init code only
 +extends the sine table from 0..pi (i.e. theta=0..32).  The table values
 +for pi..2*pi are computed simply by taking the negative of the values
 +for 0..pi:
 +
 +         sta (p4000),y
 +         eor #$ff
 +         clc
 +         adc #1
 +         STA (p6000),y
 +
 +(The clc adc #1 code was painful, byte-wise, but necessary).  The pointers
 +p4000 and p6000 are initialized to $4000 and $6000 by default -- or at
 +least are supposed to be.  Unfortunately, VICE and my 128D had different
 +ideas about this, so a little extra code was necessary to initialize them.
 +
 +A big reason for putting a table at $6000 is that it ends at $8000, to
 +provide an easy exit condition for the loop:
 +
 +         inc p6000+1
 +         bpl :mult8       ;to $8000
 +
 +Screen setup
 +------------
 +
 +A couple of things need to be done here: clear the screen, set the color RAM
 +to the background color, and put a 16x16 block of characters in the middle
 +of the screen (and do so in as few bytes as possible!).
 +
 +Clearing the screen is a piece of cake:
 +
 +         sta $d021
 +         sta $0286        ;clear color
 +         jsr $e544        ;clr scr
 +
 +The $0286 STA is to clear the color RAM to black, and work with different
 +kernals.  (With both color RAM and background the same, pixels won't
 +appear where they aren't wanted).
 +
 +Storing a block of chars isn't quite so straightforward; rather, the
 +straightforward methods use up a fair number of bytes, especially since both
 +screen and color RAM have to be set up.  Here's the method I finally
 +came up with:
 +
 +:l2a
 +         jsr $e8ea        ;scroll up
 +                          ;.A = ($AC)
 +                          ;.X=00
 +         clc              ;necessary?
 +:l2                       ;set one line
 +         sta $0720+12,x
 +         inc $db20+12,x
 +         inx
 +         adc #16
 +         bcc :l2
 +                          ;         adc #$00
 +         inc $ac
 +
 +         cmp #15
 +         bne :l2a
 +
 +Kernal routine $e8ea is the routine to scroll the entire screen up one line,
 +and disassembly shows that when it exits .A is equal to the contents of $AC
 +and .X is 0.  So the routine stores one line of chars to the middle of the
 +screen, increments the corresponding color RAM (to white), and moves the
 +screen up (chars and color RAM).  By incrementing $AC before moving the
 +screen up ($AC starts at zero normally), .A is re-initialized to its previous
 +value+1, so this prints the next line of chars to the screen (that is, the
 +previous line of chars + 1).  By doing this 16 times a 16x16 block of chars
 +is produced, with color RAM set to white inside the block, and set to black
 +everywhere else.
 +
 +And that brings us to the main program loop.
 +
 +
 +---------
 +Main Loop
 +---------
 +
 +The main loop is pretty straightforward: clear buffer, get input, update
 +positions and angles, render the buffer, and swap buffers.
 +
 +The only thing special in this process is swapping buffers via the kernal
 +routine at $eb59 -- this is the routine that is called when C= and shift
 +are held down.
 +
 +There is also a lame dot that pulses to the screen when you hold the fire
 +button down (how cool some laser lines would have been).  Alas...
 +
 +But that's pretty much it.  So now let's talk a little about some of the
 +optimizations that can be made for a 3D program, followed by a discussion of
 +the line routine and the rotation routine.
 +
 +--------
 +3D Stuff
 +--------
 +
 +For detailed information on how to "do" 3D worlds I suggest reading issue #16
 +of C=Hacking, but the basics of rendering a 3D world are:
 +
 +1) figuring out where an object is, relative to you,
 +2) if the object is visible, computing what it looks like from your
 +   perspective, and 
 +3) rendering the object.
 +
 +To define and operate on an "object", we need to know the vertices of the
 +object, and how to connect them together with lines, i.e. to draw the edges.
 +The simplest object of all is a tetrahedron -- four points, and every point
 +is connected to every other point.  This turns out to be pretty important,
 +as a lot of memory is saved by using only tetrahedrons.  To see why, first
 +consider what the code to draw edges from a list of vertices must look like:
 +
 + rotate points
 +
 + ldx #number_of_edges
 +:loop stx temp
 + ldy edge_vertex1,x ;index into some list of vertices
 + lda edge_vertex2,x
 + tax
 + jsr drawline
 + ldx temp
 + dex
 + bpl :loop
 +
 +Now consider a cube; in my original code idea, I was going to use cubes
 +and tetrahedrons (pyramids).  A cube has eight vertices and 12 edges.
 +Each vertex has an x, y, and z coordinate, and from the code above each edge
 +needs two bytes, one for each vertex of the edge.  48 bytes, just to store a
 +cube -- almost 1/10 of the entire 512-byte code!  (Using code to generate the
 +vertices is possible, but still takes a fair number of bytes.)
 +
 +Compare this to the humble tetrahedron, which only requires twelve bytes:
 +three for each vertex.  What about the edges?  Remember that every point
 +is connected to every other point in a tetrahedron.  This means that we do
 +not have to store the edge connections, but can compute them manually, and
 +draw the whole thing using the following code:
 +
 + [count = 3]
 +                          ; Plot
 +:ploop   ldy count
 +:l2      ldx count
 +         dey
 +         tya
 +         pha
 +         jsr DrawLine
 +         pla
 +         tay
 +         bne :l2
 +         dec count
 +         bne :ploop
 +
 +This code simply draws lines between all vertices: 3-2, 3-1, 3-0, and so on.
 +It uses pretty much the same number of bytes as the earlier code, but
 +doesn't require any storage of the edge connections (12 bytes to store
 +the connections).
 +
 +Next consider the vertices of a tetrahedron.  It turns out that by carefully
 +selecting the vertices, default zero-page values can be used, something
 +that isn't possible with e.g. a cube.  Consider the following tetrahedron
 +vertices:
 +
 +Px = 0, -4, 16, -4
 +Py = 16, 0,  8,  0
 +Pz = 0, 16,  0, -16
 +
 +(just the vertices of a more or less regular tetrahedron which I drew on
 +a piece of paper).  The trick is to arrange these coordinates in the right
 +way.
 +
 +Location $ae/$af is used by the load routine (among others) to store to
 +memory while loading; at the end of a load, this location contains the
 +end address+1.  And, it is surrounded by zeros:
 +
 +>C:00ab  00 00 00 00  00 00 00 3c  03 00 00 00  00 00 00 00   .......<........
 +
 +which means that by engineering the end address right we can store two
 +coordinates here.  In this case, I chose the end address+1 to be $10f0, which
 +stores $f0 and $10 in $ae/$af -- -16 and 16, two of the Pz values, the other
 +two being zero.
 +
 +This was my original plan anyways, until I re-discovered that the screen
 +setup routine altered the value of $ac (the kernal routine to move the screen
 +up exits with .A=$ac, so incrementing it re-initializes .A in every loop
 +iteration.)  On the other hand, it increments it 16 times, so that after the
 +load and the init the zp values are
 +
 +>C:00ab 00 10 00 f0  10 ..
 +
 +The values 00 10 00 f0 are exactly the Pz values above: 0, 16, 0, -16.
 +Careful study of default zero-page values then shows
 +
 +>C:0040  00 00 08 00  00 00 00 24  00 00 00 00  00 00 00 00   .......$........
 +
 +This is pretty much Py, except for the first value -- so all the code has
 +to do is store a 16 at location $40, and it turns out the screen init loop
 +ends with .X=$10.  So, one STX does the job, saving yet more bytes.
 +Moreover, by locating the vertices in zero page, zero-page instructions
 +can be used to access the point lists, saving even more bytes.
 +
 +Now, the above are the vertices, but what about the _centers_, i.e. the
 +object locations?  Once again, to save space I just used some default
 +zero-page values, with the x-coordinates at $14 and the z-coordinates at $7d.
 +The reason these were chosen is pretty straightforwards.  Since objects
 +just move 'downstream' with increasing z-coordinate, the z-coords should
 +be somewhat spread out; the values at $7d are:
 +
 +>C:007d  3a b0 0a c9  20 f0 ef 38  e9 30 38 e9  d0 60 80 4f
 +
 +which are somewhat spread-out.  The x-coords, on the other hand, need to
 +be clustered around 0.  The camera (you) is located at x=0, and if objects
 +have a large value for the center x-coordinate they are way off to the side
 +(you've probably noticed a few objects like this).  Location $14 has some
 +zeros and some nonzero numbers that are near zero, i.e. that cluster around
 +zero:
 +
 +>C:0014  00 00 19 16  00 0a 76 a3  00 00 00 00  00 00 76 a3
 +
 +Not great values, but good enough -- beggars can't be choosers.
 +
 +Of course, poking different values into locations 20 and above will put the
 +tetrahedrons in different places.
 +
 +
 +-----------------------
 +Rotation and Projection
 +-----------------------
 +
 +Rotations are done very simply, using tables.  As outlined earlier, there
 +is a complete set of tables of r*sin(theta).  At first blush, the 'natural'
 +way of setting up these tables would be to have a table of 1*sin(theta) in
 +one page, a table of 2*sin(theta) in the next page, and so on, maybe fitting
 +a table of cos(theta) in the same page.  The location of r*cos(theta) would
 +then be given by
 +
 + page = offset+256*r (e.g. $40+r for tables starting at $4000)
 + value = page+theta
 +
 +It makes a lot more sense, though, to instead let _theta_ be the page index,
 +and let each page contain r*sin(theta) for different values of _r_; the lookup
 +is then
 +
 + page = offset+256*theta (e.g. $5300 for theta=$13)
 + value = page+r
 +
 +For one thing, the pointer setup is very easy -- there's only one value
 +of theta to set up, instead of different values of r (for different x,y
 +coordinates).  For another, the entire range r=-128..127 can be used,
 +which means the same tables can be used to rotate object _centers_ in
 +addition to object _vertices_.
 +
 +(If you recall programs like lib3d, the object centers are 16-bit signed
 +values and have their own rotation routine, whereas the object vertices
 +are limited to -96..95 and have a separate, optimized rotation routine; in
 +this case, both the vertices and centers are eight-bit signed numbers.)
 +
 +This all means that rotations (e.g. y*cos(theta) - x*sin(theta)) amount to
 +
 + LDY Py,X
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 +
 +to rotate a point.
 +
 +Now, the usual 3D world calculations look like:
 +
 + rotate object centers
 + rotate vertices, add to rotated center, and project
 +
 +Consider rotating the z-coordinate; the code will look like
 +
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 + CLC
 +RotM1 ADC CZ
 +
 +that is: rotate vertex (z*cos(theta) - x*sin(theta)) and add to center
 +(ADC CZ).  Here's the great part: if we instead want to rotate the _center_
 +of the object, we want to _store_ the rotated point in CZ; the code would
 +look like
 +
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 + STA CZ
 +
 +This is exactly the same code as above, except the ADC is changed to a STA.
 +Therefore only one routine is needed, using self-modifying code to decide
 +whether to STA the center coordinate or ADC it.  In the code, the appropriate
 +instruction is passed into the routine in the .Y register.
 +
 +There's one other thing to be mentioned about the rotation routine.  The
 +rotation tables are of r*sin(theta), and r*cos(theta) is of course done
 +by adding pi/2 to the theta-coordinate (theta+16 in the program).  There
 +are two ways to do this -- add 16 to the theta coordinate and correct
 +any overflow (AND #63, for example), or extend the tables by an extra pi/2.
 +
 +Byte-wise, the first method is the best, but check this out: instead of
 +adding pi/2, we can also subtract 3*pi/2.  The setup code then becomes:
 +
 +RotProj 
 +         AND #$3F
 +         ORA #>SINTAB
 +         STA sin+1
 +         SBC #$2F         ;-3*pi/4
 +                          ;         ADC #$10
 +                          ;         AND #$3F
 +         ORA #>SINTAB
 +         STA cos+1
 +
 +The trick works here because of the ORA #>SINTAB, which is $4000 -- this
 +allows negative angles to work, by borrowing higher bits.  Consider:
 +
 +.A ORA #>SINTAB SBC #$2F ORA #>SINTAB
 +-- ------------ -------- ------------
 +$12 $52 - sin tab at $5200 $22 $62 - cos tab at $6200
 +$22 $62 - sin tab at $6200 $32 $72 - cos tab at $7200
 +$32 $72 - sin tab at $7200 $42 $42 - cos tab at $4200
 +
 +(C is clear in the SBC above).  By using an SBC, angles automatically
 +wrap around from $8000 to $4000.  This saves two bytes over the ADC #$10
 +AND #$3F method (commented out above).  Every byte counts!!!
 +
 +Projections
 +-----------
 +
 +The easiest way to do projections, once again, is via tables of f(x,z)=x/z,
 +much like the tables of f(r,theta) = r*sin(theta).  Alas, the setup code
 +to generate these tables took up too much space, so the projects are
 +computed 'manually', using a shift-and-add division routine.
 +
 +Now, it turns out that the routine really computes 64*x/z -- the 64 is the
 +magnification factor.  The nice thing about a shift-and-add routine is that
 +multiplying by 64 is possible just by changing the number of shifts.  So
 +the routine adds an extra 6 shifts into the division loop -- a little
 +inefficient cycle-wise, but pretty thrifty byte-wise.
 +
 +
 +-------------
 +Drawing lines
 +-------------
 +
 +Obviously, to render an object we need a line drawing routine.  The line
 +routine in the program is pretty standard, and described in several earlier
 +issues of C=Hacking, so I won't go into too much detail.  Most of the
 +paragraphs below are just random tidbits, so feel free to skip any (there
 +won't be a test).
 +
 +The simplest and most efficient routines draw to a charmap -- you can read
 +about this in detail in C=Hacking #8 and beyond -- since the .Y register maps
 +directly to the y-coordinate, and the column-offsets are easy to compute.
 +The biggest charmap available is 128x128, so that's what I used.
 +
 +Overall, the line routine is pretty small, while being reasonably fast.
 +The plot routine is 'dumb', and recalculated for every pixel, but the
 +recalculation itself is fairly efficient, with a few table lookups.  It
 +draws into a 128x128 buffer and can handle off-screen coordinates, and
 +the buffer can be anywhere (that it, it supports a double-buffer display).
 +So, I think it turned out OK, all things considered.
 +
 +With that background, here's the main line drawing loop:
 +
 +plot
 +         pha
 +         tya
 +         bmi calcline
 +         lda bitphi,x
 +         ora base
 +         sta point+1
 +         lda bitplo,x
 +         sta point
 +         lda bitp,x
 +         beq calcline
 +         ora (point),y
 +         sta (point),y
 +calcline
 +         pla
 +mod2     sbc #dx
 +         bcs mod5
 +mod3     adc #dy
 +mod4     inx
 +mod5     iny
 +         dec temp
 +         bne plot
 +
 +
 +The basic line routine iteration goes like
 +
 + plot x,y
 + y=y+1
 + a = a - dx
 + if a<0 then a=a+dy, x=x+1
 +
 +for slope>1.  The code for slope<1 is exactly the same, except for
 +swapping x and y, and dx and dy.  Therefore, I just used one routine and
 +self-modifying code to set INX and INY etc. in the right places (mod2, mod3,
 +mod4, and mod5 above).  In retrospect this was not smart, and I think it
 +would have saved 5 or 6 bytes to use two routines, and jsr to the plot
 +routine -- the routines to set up the self-modifying lines take more space
 +than the basic iteration.
 +
 +The x and y coordinates are stored directly in the .x and .y registers.
 +This makes the plotting really easy and relatively fast, and works well
 +with the 128x128 charmap.
 +
 +Off-screen points are a must, so the allowed coordinates are -64..192,
 +giving 64 pixels to either side of the 128x128 visible area.  Checking for
 +these is easy, since the high bit will be set for coords <0 and >127.
 +These are checked in the plot routine.  The y-coordinate is checked directly,
 +using tya; the x-coordinate is checked for range by use of the bitp table --
 +this table contains the bit pattern for the x-coordinate (%1000000 %01000000
 +etc.), and all of the values for x>127 are set to zero.  Once again, in
 +retrospect it would have been more memory-efficient to do this with txa (two
 +more bytes), but darn it all, SOMETHING has to be at least a little bit cycle
 +efficient here!  Old habits die hard!
 +
 +
 +The line routine always draws from left to right, which means the
 +initialization routine calculates dx = x2-x1, and if dx<0 then it swaps the
 +two endpoints.  This creates a problem when coordinates are -64..192 though;
 +after subtracting x2 and x1 do you check the negative bit, or the carry bit?
 +Cases like x1=-1, x2=1 should work, so the carry bit is useless; on the other
 +hand, long lines, say x1=1, x2=130, should also work, so the high bit is also
 +useless.  The solution I used was to offset coordinates by +64 (and once
 +again, in retrospect...).
 +
 +The point coordinates are stored in two lists, one of x-coords and one of
 +y-coords.  This means that the endpoints are simply indexes into those
 +point lists, i.e.:
 +
 + xlist contains list of x-coords; ylist list of y-coords
 + xlist,x and ylist,x contain left endpoint
 + xlist,y and ylist,y contain right endpoint
 +
 +That simplified a lot of things -- the lists can be in zero-page, swapping
 +coordinates is very simple, and so on.
 +
 +The routine is double-buffered -- flickery single-buffers are just too
 +painful.  Since $2000 and $2800 are the locations of the two charset buffers,
 +the current buffer is stored in zero-page variable $ce ("base" in the
 +plot code above), which starts out initialized to #$20 (location $81 would
 +also work) -- one more variable to not initialize.
 +
 +So, in summary: a pretty standard line routine, with a few memory optimizations
 +thrown in.
 +
 +
 +-------
 +Wrap-up
 +-------
 +
 +Well, that's about it.  As you can see, the code really runs on the edge --
 +zero-page has to be initialized correctly, little bits and pieces are missing
 +from certain routines, things have to line up just right... but it works!
 +There are undoubtably other places where bytes can be saved, of course,
 +but I think it does a pretty good job of shaving bytes overall.
 +
 +See you at next year's contest!
 +
 +*      
 +* tetrattack!
 +*
 +* Mini-3D rendering package
 +*
 +* - Motion in the x-z plane => rotations about one axis only
 +* - Plots to charmap
 +*
 +* SLJ 9/01
 +*
 +
 +                          ;         org $1000
 +         org $0ef0
 +
 +NUMOBJS  = 9
 +
 +temp     = $02
 +dx       = $03
 +dy       = $04
 +point    = $05
 +
 +AUX      = $03
 +ACC      = $04
 +
 +CX       = $07
 +CZ       = $08
 +rottemp  = $09            ;0 initially
 +
 +zpoint   = $0a            ;low byte already set to zero
 +zpoint2  = $0c
 +
 +count    = $0e
 +angle    = $0f
 +
 +cenx     = $14
 +
 +Py       = $40
 +
 +XCoord   = $50
 +YCoord   = $60
 +
 +theta    = $70
 +
 +cenz     = $7d
 +
 +                          ; $ac taken by screen setup routine
 +                          ; final value = $10
 +
 +Pz       = $ab
 +
 +
 +base     = $ce            ;$20 initially
 +                          ;alternate: ($81)
 +
 +p4000    = $8f            ;$4000 initially
 +p6000    = $89            ;$60xx initially
 +p8000    = $9c            ;$8000
 +
 +tagged   = $e0
 +
 +sin      = $f7            ;0 low byte
 +cos      = $f9
 +
 +curobj   = $ff
 +
 +
 +bitp     = $1c00
 +bitplo   = $1d00
 +bitphi   = $1e00
 +bitphi2  = $1f00
 +SINTAB   = $4000
 +PROJTAB  = $8000
 +
 +
 +DEY      = $88
 +INY      = $c8
 +INX      = $e8
 +ADC      = $65
 +STA      = $85
 +
 +start    
 +         sta $8f          ;set up p4000
 +
 +*
 +* Set up VIC
 +*
 +         sta $d021
 +         sta $0286        ;clear color
 +         jsr $e544        ;clr scr
 +         lda #$18
 +         sta $d018
 +
 +
 +*
 +* Line drawing tables
 +*
 +
 +         ldx #$90
 +         ldy #$7f
 +         lda #$01
 +:l1      sta bitp,y
 +         cmp #$80
 +         rol
 +
 +         pha
 +         tya
 +         lsr
 +         lsr
 +         lsr
 +         lsr
 +         sta bitphi,y
 +         lda #00
 +         sta bitp+128,  ;0 = dont draw
 +         ror
 +         sta bitplo,y
 +
 +*
 +* Extend sin table to 0..31
 +*
 +         lda sin0,y
 +         sta sin0+17,x
 +                          ;sta sin0+32,x
 +         inx
 +
 +         pla
 +
 +         dey
 +         bpl :l1
 +                          ;.y=$ff
 +                          ;.x=$10
 +
 +*
 +* Set up sin tables
 +*
 +InitRot  
 +
 +         iny
 +         sty p6000        ;rats :(
 +
 +                          ;         ldy #00      ;rats :(
 +                          ;
 +                          ;         sty temp
 +                          ;:l3      ldx rottemp
 +                          ;         lda sin0,x
 +                          ;         sta aux      ;sin(theta)
 +                          ;
 +                          ;:l4
 +                          ;
 +                          ;         clc
 +                          ;         jsr MULT8    ;.Y * AUX
 +
 +                          ;
 +                          ; AUX*.Y -> ACC,.A (low,hi) 16-bit result
 +                          ; AUX, .Y unaffected
 +                          ; .Y can be negative
 +                          ;
 +
 +:Mult8   
 +         sty temp
 +
 +                          ;dumb multiply
 +                          ;         sty acc
 +                          ;         lda #00
 +                          ;         tay
 +                          ;:mloop   clc
 +                          ;         adc sin0-$10,x
 +                          ;         bcc :skip
 +                          ;         iny
 +                          ;:skip    dec acc
 +                          ;         bne :mloop
 +                          ;         tya
 +
 +         sty acc
 +         lda #00
 +         ldy #9
 +         clc
 +:mloop   
 +         ror
 +         ror acc
 +         bcc :mul2
 +                          ;         clc
 +                          ;dec sin0 table by 1 instead
 +         adc sin0-$10,x
 +:mul2    dey
 +         bne :mloop
 +
 +         ldy temp
 +         bpl :pos
 +         sec
 +         sbc sin0-$10,x
 +:pos     
 +
 +         sta (p4000),y
 +         eor #$ff
 +         clc
 +         adc #1
 +
 +         STA (p6000),y
 +         iny
 +         bne :mult8
 +         inx
 +         inc p4000+1
 +         inc p6000+1
 +         bpl :mult8       ;to $8000
 +
 +                          ;.x=$20
 +                          ;.y=0
 +
 +*
 +* Set up Screen
 +*
 +
 +
 +                          ;         lda #00      ;bah!
 +                          ;         tya
 +:l2a     
 +                          ;         pha
 +         jsr $e8ea        ;scroll up
 +                          ;         pla
 +                          ;.A = ($AC)
 +                          ;.X=00
 +
 +         clc              ;necessary?
 +:l2                       ;set one line
 +         sta $0720+12,x
 +         inc $db20+12,x
 +         inx
 +         adc #16
 +         bcc :l2
 +                          ;         adc #$00
 +         inc $ac
 +
 +         cmp #15
 +         bne :l2a
 +:done                     ;.X=$10
 +
 +         stx $40          ;rats :(
 +                          ;init Py
 +
 +*
 +* Main program loop:
 +*   - get input
 +*   - update positions/angles
 +*   - render
 +*   - swap buffers
 +*
 +
 +MainLoop 
 +
 +                          ; Clear buffer and swap
 +
 +         lda base
 +         eor #$08
 +         sta base
 +
 +         sta zpoint+1
 +         ldy #00
 +         tya
 +         ldx #8
 +:l0      sta (zpoint),y
 +         iny
 +         bne :l0
 +         inc zpoint+1
 +         dex
 +         bne :l0
 +
 +                          ; Get Input
 +                          ; .X=0
 +
 +         lda $dc00
 +         lsr
 +         lsr
 +         lsr
 +         bcs :c1
 +         inc theta
 +:c1      lsr
 +         bcs :c2
 +         dec theta
 +:c2      and #$01
 +         sta tagged
 +         eor #$01
 +         ora $23c0+6
 +         sta $23c0+6      ;shot fired
 +
 +
 +
 +                          ;         lda $cb
 +                          ;         and #3
 +                          ;         beq :skip
 +                          ;         lsr
 +                          ;         bcc :inc
 +                          ;         dec theta
 +                          ;         lsr
 +                          ;        bcc :skip
 +                          ;         jsr Forwards
 +                          ;:inc     inc theta
 +                          ;:skip
 +
 +ObjLoop  
 +
 +                          ; Compute relative center
 +
 +         inx
 +         stx curobj
 +
 +         dec cenz,      ;Update pos
 +
 +         lda cenx,x
 +                          ;         sec
 +                          ;         sbc cenx
 +         sta Px+4
 +         lda cenz,x
 +                          ;         sbc cenz
 +         sta Pz+4
 +
 +
 +                          ; Rotate
 +Rot      
 +
 +         lda theta,x
 +         adc theta
 +         sta angle
 +
 +         lda theta
 +         ldx #4
 +         ldy #STA
 +         jsr RotProj
 +         lda cz
 +         sbc #14
 +                          ;         bmi :skip
 +         cmp #125
 +         bcs NoDraw
 +         cmp cx
 +         bmi NoDraw
 +         adc cx
 +         bmi NoDraw
 +
 +                          ; If in view, rotate project & plot
 +
 +         dex
 +         stx count
 +:loop    lda angle
 +         ldy #ADC
 +         jsr RotProj
 +         sta YCoord,X
 +         dex
 +         bpl :loop
 +
 +                          ; Plot
 +
 +:ploop   ldy count
 +:l2      ldx count
 +         dey
 +         tya
 +         pha
 +         jsr DrawLine
 +         pla
 +         tay
 +         bne :l2
 +         dec count
 +         bne :ploop
 +                          ;.y=0
 +NoDraw   
 +         ldx curobj
 +
 +                          ; Update
 +
 +         lda tagged,x
 +         beq :zip
 +         lda cx
 +         ora tagged
 +         sta tagged,x
 +         dfb $2c
 +:zip     inc theta,x
 +
 +
 +         cpx #NUMOBJS
 +         bcc ObjLoop
 +
 +         jsr $eb59        ;Swap buffer
 +                          ;.Y=1 .A=$7F
 +
 +         jmp MainLoop
 +
 +
 +*
 +* Line routine
 +*  - Draws from L to R
 +*  - Forces dx>0 (x1 < x2)
 +*  - Numbers are +128 offset, to allow
 +*  - DX and DY to be 0..255 for long lines
 +*  - Uses same core routine for stepinx
 +*    and stepiny by changing variables
 +*
 +* On input:
 +*  .X = index into point list, point 1
 +*  .Y = index of point 2
 +*
 +
 +Pswap    
 +         sty temp
 +         txa
 +         tay
 +         ldx temp
 +DrawLine 
 +         lda xcoord,y
 +         sec
 +         sbc xcoord,x
 +         bcc Pswap        ;x2 >= x1
 +         sta dx
 +         lda ycoord,y
 +         sbc ycoord,    ;dy = y2-y1
 +         bcs :posy
 +         ldy #DEY
 +         eor #$ff
 +         adc #1
 +         dfb $2c
 +:posy    ldy #INY
 +         sta dy
 +         cmp dx
 +         lda #INX
 +                          ;         bcs stepiny
 +         bcs :noswap
 +         tya
 +         ldy #INX
 +:noswap  sty mod5
 +         sta mod4
 +stepinx  
 +                          ;         sta mod5
 +                          ;         sty mod4     ;INY/DEY
 +         lda dy
 +         ldy dx
 +         bcc mod
 +stepiny  
 +                          ;         sta mod4
 +                          ;         sty mod5     ;iny/dey
 +         lda dx
 +         ldy dy
 +mod      
 +                          ;         sty mod1+1
 +         sty temp
 +         sty mod3+1
 +         sta mod2+1
 +
 +Draw     
 +         lda ycoord,x
 +         sec
 +         sbc #64
 +         tay
 +         lda xcoord,x
 +         sec
 +         sbc #64
 +         tax
 +                          ;mod1     lda #dy
 +         lda temp
 +         beq done         ;no steps!
 +                          ;         sta temp
 +         lsr
 +plot     
 +         pha
 +         tya
 +         bmi calcline
 +         lda bitphi,x
 +         ora base
 +         sta point+1
 +         lda bitplo,x
 +         sta point
 +         lda bitp,x
 +         beq calcline
 +         ora (point),y
 +         sta (point),y
 +calcline 
 +         pla
 +mod2     sbc #dx
 +         bcs mod5
 +mod3     adc #dy
 +mod4     inx
 +mod5     iny
 +         dec temp
 +         bne plot
 +done     
 +         rts
 +
 +*
 +* Rotate and project point
 +*
 +* On entry:
 +*  .X = index into point list
 +*  .A = rotation angle
 +*  .Y = ADC/STA for points/centers
 +*  CX,CZ = location of object
 +*
 +* On exit:
 +*  Points stored in xcoord,x ycoord,x
 +*
 +RotProj  
 +         AND #$3F
 +         ORA #>SINTAB
 +         STA sin+1
 +         SBC #$2F         ;-3*pi/4
 +                          ;         ADC #$10
 +                          ;         AND #$3F
 +         ORA #>SINTAB
 +         STA cos+1
 +         STY RotM1
 +         STY RotM2
 +RotProj2 
 +         LDY Pz,X
 +         LDA (sin),Y
 +         PHA
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 +         CLC
 +RotM1    ADC CZ
 +         STA aux
 +
 +                          ;         LDY Px,X
 +                          ;         LDA (cos),Y
 +                          ;         LDY Pz,X
 +                          ;         CLC
 +                          ;         ADC (sin),Y
 +         PLA
 +         CLC
 +         ADC (cos),Y
 +         CLC
 +RotM2    ADC CX
 +         JSR Div8
 +         STA XCoord,X
 +
 +         LDA Py,X
 +                          ;         JSR Div8
 +                          ;         STA YCoord,X
 +                          ;         RTS
 +
 +*
 +* Signed division routine
 +*
 +* 64*.A/AUX -> .A
 +* Only valid for .Y/AUX < 1
 +* .A pos or neg, AUX > 0
 +* AUX, .X unaffected
 +*
 +
 +DIV8     
 +                          ;         STY ACC
 +                          ;         tya
 +         php
 +         bpl :pos         ;.A just loaded
 +         eor #$ff
 +:pos     sta acc
 +         LDA #0
 +         LDY #14
 +:DLOOP   ASL ACC
 +
 +         ROL
 +         CMP AUX
 +         BCC :DIV2
 +         SBC AUX
 +         INC ACC
 +:DIV2    
 +         DEY
 +         BNE :DLOOP
 +                          ;RTS
 +
 +         lda acc
 +         plp
 +         bpl :pos2
 +         eor #$ff
 +:pos2    
 +         eor #$80         ;+128 offset
 +         rts
 +
 +*
 +* Point list
 +* Last point is for relative cx,cz
 +*
 +
 +                          ;Pz       dfb 0,16,0,-16,0
 +                          ;Py       dfb 16,0,8,0
 +Px       dfb 0,-4,16,-4
 +
 +                          ;note: px+4 intrudes on table below
 +
 +*
 +* 128*sin(t), t=0..15
 +*
 +* Table must be at end of code!
 +*
 +
 +                          ;sin0     dfb 0,13,25,37,49,60,71,81
 +                          ;         dfb 91,99,106,113
 +                          ;         dfb 118,122,126,127,128
 +
 +                          ;sin0     dfb 0,25,50,74,98,120,142,162
 +                          ;         dfb 180,197,212,225,236,244,250,254,255
 +sin0     dfb 0,24,49,73,97,119,141,161
 +         dfb 179,196,211,224,235,243,249,253,255
 +
 +
 +end      
 +
 +.......
 +....
 +..
 +.                                    - fin -
 +</code>
magazines/chacking21.txt · Last modified: 2015-04-17 04:34 (external edit)