User Tools

Site Tools


magazines:chacking21

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

magazines:chacking21 [2015-04-17 04:34] (current)
Line 1: Line 1:
 +<​code>​
 +                   ########​
 +             ##################​
 +         ###### ​           ######
 +      #####
 +    #####  ####  ####      ##      #####   #### ​ ####  ####  ####  ####   #####
 +  #####    ##    ##      ####    ##   ## ​  ## ​ ###     ## ​   ####  ##   ## ​  ##
 + ##### ​   ######## ​    ## ​ ##   ## ​       #####       ## ​   ## ## ##   ##
 +#####    ##    ##    ######## ​ ##   ## ​  ## ​ ###     ## ​   ##  ####   ## ​  ##
 +#####  ####  ####  ####  ####  #####   #### ​ ####  ####  ####  ####   ######​
 +#####                                                                    ##
 + ###### ​           ###### ​          Issue #21
 +   ################## ​             Feb 5, 2002
 +       ########​
 +     Special Focus on Minigames
  
 +...............................................................................
 +
 + "Do not meddle in the affairs of wizards, for they are subtle and
 + quick to anger."​ -- Ancient Elvish saying
 +
 +...............................................................................
 +
 +BSOUT
 +
 +(Why yes, I _am_ re-reading Lord Of The Rings, how did you know?  Sure, the
 +movie is pretty good, not the same as the book, but probably the best you
 +could do, although why does every Hollywood evil creature essentially come
 +from Night of the Living Dead?  But hey, the elves actually looked like elves.
 +Black Hawk Down is really good too, and... oh, sorry!)
 +
 +Hoo-ah! ​ Welcome to another issue of C=Hacking!
 +
 +There'​s lots of nifty stuff to get to so this will be brief. ​ First, the
 +"​Hacking Exchange"​ is now up at the Official Unofficial C=Hacking Homepage:
 +
 + http://​www.ffd2.com/​fridge/​chacking/​
 +
 +It's just a simple message board, with the idea that C=Hacking types can
 +make comments, ask questions, and otherwise talk to one another. ​ Check
 +it out!
 +
 +Second, if you have any projects you're working on... please contact me,
 +and consider writing them up for C=Hacking!
 +
 +And finally, something I think you'll enjoy:
 +
 +
 +http://​groups.google.com/​groups?​hl=en&​th=87c2a2ced7e32ce1&​rnum=42
 +
 +From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu)
 +Subject: C= Hacker'​s Net-Mag
 +Newsgroups: comp.sys.cbm
 +Date: 1992-02-08 14:52:20 PST
 +
 +
 +  Due to the recent influx of "​Tech-Know"​ (People who actually understand /
 +hack te C=64 into doing stuff that was previously unknown) I am going to
 +tentativly start a C= Hackers Net-Mag (Hacking in the 90's since of the word,
 +not the 80's).
 +  So - please send me your netmail address if you're interested in receiving
 +it.
 +  Or if you are interested in contatcting me concerning an article etc that
 +you'd like to see distributed please also contact me.
 +
 +For the first issue I'm tentatively planning on:
 +  - Line Drawing on 8563 VDC (640 x 200 hi-res graphics)
 +  - Beginning ML column
 +  - Raster Article
 +  . . .
 +
 +- Craig Taylor
 +duck@pembvax1.pembroke.edu
 +
 +P.S. - Not sure when the first is gonna go out but hopefully soon.
 +
 +
 +From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu)
 + ​Subject:​ Issue 1 - C= Hacking Available
 + ​Newsgroups:​ comp.sys.cbm
 + Date: 1992-02-27 17:22:44 PST
 +
 +
 +Issue 1 of C= Hackers is now available via NETMAIL and is a compilation of
 +several articles on the tehnical side of the Commodore 64 and 128.  For those
 +of you who missed the first posting, please reply via email and ask to be put
 +on the list.  The first issue had programming in ml, documented and
 +undocumented 6502 opcodes, and a line-drawing package in machine language for
 +the C=128 hi-res screen.
 +  Issue 2 will be coming out in a month or so.  Many thanks for the bandwith.
 +
 +- Craig Taylor
 +duck@pembvax1.pembroke.edu
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +:::::::::::::::::::::::::::::::::::​ Contents ::::::::::::::::::::::::::::::::::​
 +
 +BSOUT
 + o Voluminous ruminations from your unfettered editor.
 +
 +Jiffies
 + o News, rumours, and stuff.
 +
 +
 +Side Hacking
 +
 + o "Pulse Width Modulation, continued"​ by various.
 +   Tying up some loose ends from last issue'​s digi article.
 +
 + o "​Introducing Full-Screen IFLI mode with a SuperCPU",​ by Todd
 +   Elliot <​eyethian@msn.com>​ (Hey, what's with all this MSN crapola? :)
 +   Using a SuperCPU, it is possible to use the first three columns
 +   of an (I)FLI picture, and Todd shows how.
 +
 +
 +Main Articles
 +
 + o "​VIC-20 Kernel ROM Disassembly Project, part IV", by Richard Cini
 +   <​rcini@msn.com>​
 +
 +   And now it's time to start on that most frightening of creations:
 +   the tape drive code!
 +
 +
 + o "The Art of the Minigame"​ -- an article in six parts:
 +
 +   Introduction,​ by the editor
 +   Part 1: Codebreaker,​ by David Holz <​whiteflame52@yahoo.com>​
 +   Part 2: TinyPlay, by S. Judd <​sjudd@ffd2.com>​
 +   Part 3: MagerTris, by Per Olofsson <​magervalp@cling.gu.se>​
 +   Part 4: Compressing Tiny Programs, by S. Judd <​sjudd@ffd2.com>​
 +   Part 5: TinyRinth, by Mark Seelye <​mseelye@yahoo.com>​
 +   Part 6: Tetrattack!,​ by Stephen Judd <​sjudd@ffd2.com>​
 +
 +   ​
 +.................................. Credits ...................................
 +
 +Editor, The Big Kahuna, The Car'​a'​carn..... Stephen L. Judd
 +C=Hacking logo by.......................... Mark Lawrence
 +
 +Special thanks to the cbm-hackers for many otherwise unacknowledged
 +contributions.
 +
 +Legal disclaimer:
 + 1) If you screw it up it's your own fault!  ​
 + 2) If you use someone'​s stuff without permission you're a dork!
 +
 +For information on the mailing list, ftp and web sites, send some email
 +to chacking-info@jbrain.com.
 +
 +
 +................................... Jiffies ..................................
 +
 +$01 Jochen Adler has made a program that reads the second side of
 +    a 1541 disk in a 1571 - without turning the disk over. It reads the
 +    blocks from end to beginning. Because of the mechanical bump however,
 +    it can only read tracks 5 to 35. If anybody wants this program please
 +    e-mail Jochen (NLQ@gmx.de)
 +
 +$02 Soci/​Singular has been working on a commented C128 ROM listing. ​ Check
 +    out this great effort at http://​singularcrew.hu/​c128rom/​
 +
 +$03 64net/2 has been updated:
 +
 +    For those that do not know: 64net/2 is yet another PC-to-C64/​128 User<​->​LPT
 +    parallel cable software. It supports d64/​d71/​d81/​t64/​lnx/​dhd disk images,
 +    raw files and own Internet partition. It is possible to enter disk images
 +    like any other directory. Client programs are provided for C64 and C128.
 +    The next goal will be to patch ROM instead of loading client.
 +
 +    A small BASIC example program is included that is able to send e-mails
 +    through 64net/2 host or any other machine if its IP is known. How many
 +    e-mail agents are there for C=? How many of them are written in plain
 +    BASIC 2.0? :)
 +
 +    http://​sourceforge.net/​projects/​c64net/​ contains the latest version, and
 +
 +    http://​venus.wmid.amu.edu.pl/​~ytm/​64net2win.tar.gz contains a Windows
 +    binary.
 +
 +$04 CC65 is up to version 2.7.0, with many new improvements. ​ Check it out
 +    at http://​www.cc65.org/​
 +
 +$05 Moreover, Ullrich has set up his C64 as a web server at
 +
 + http://​c64.cc65.org/​
 +
 +    The web server runs on a stock 64, using a Swiftlink for communications,​
 +    and uses the uIP TCP/IP stack written (with cc65) by Adam Dunkels
 +
 + http://​dunkels.com/​adam/​uip
 +
 +    Pretty cool, eh?
 +
 +$06 A new IDE interface has become available:
 +
 +    Elysium is proud to announce a new software and hardware solution for
 +    your Commodore mass data storage needs.
 +
 +    The CIA-IDE is yet another approach to connect an IDE hard drive to 
 +    C64/​128. ​ It differs from previous similar projects in these areas:
 +
 +    - it is the simpliest one (only two chips required (or just one in case of
 +      C128))
 +    - it is free (documentation and software),
 +    - the software is available in source code form under GNU GPL,
 +    - there _exists_ a ready to use GEOS 2.0 driver.
 +
 +    Documentation,​ source codes, and binaries are at:
 +
 + ftp://​ftp.elysium.pl/​groups/​Elysium/​Projects/​ciaide/​
 +
 +$07 Marko Makela has developed a tape drive emulator with an RS-232 port,
 +    allowing transfers between any computer with a tape port and any
 +    computer with an RS-232 interface (e.g. a PC or Swiftlink). ​ The hardware
 +    and software is at
 +
 + http://​www.funet.fi/​pub/​cbm/​crossplatform/​transfer/​C2N232/​
 +
 +$08 GoDot -- the C64 image processing program -- is now public domain. ​ Arndt
 +    will still be working on it, but it's now available at
 +
 + http://​members.aol.com/​howtogodot/​godnews.htm
 +
 +$09 The C64 is now listed in the Guiness Book of World Records; a scan of
 +    the page (from Robert Bernardo, posted by Frank Michlick) is at
 +
 + www.cupid.de/​upload/​famous.jpg
 +
 +    (love that line about "16K sound"​).
 +
 +    Also, I highly recommend taking Robert'​s advice and checking out the
 +    infinitely cool "​Logo-Matic"​ on the main www.cupid.de site!
 +
 +$0A JOS just keeps cruising along, with lots of changes. ​ Among the biggest
 +    changes, of course, is the announcement that JOS will be merged with
 +    Clips, with the new system to be called "​Wings"​ (with some of the letters
 +    capitalized and some not; I never remember that stuff). ​ For the latest
 +    JOS news, check out
 +
 + http://​www.jolz64.cjb.net/​
 +
 +$0B Frank Kontros has made a commented disassembly of the C64 ROM, with the
 +    BASIC ROM on the way:
 +
 + http://​c64.uz.ua/​sources/​C64_Kernal_Disassembly.zip
 +
 +
 +$0C And finally, Aleksi Eeben, author of two minigames, has now written a
 +    VIC-20 game:
 +
 + http://​www.student.oulu.fi/​~aeeben/​download/​dragonwing.zip
 +
 +    more screenshots:​
 +
 +        http://​www.student.oulu.fi/​~aeeben/​screen1.png - screen4.png
 +
 +    Aleksi'​s homepage is at http://​www.cncd.fi/​aeeben
 +
 +    Neat!
 +
 +................................ Side Hacking ................................
 +
 +
 +Pulse Width Modulation, continued
 +--------------------------------- from various
 +
 +The digi article in issue #20 of C=Hacking left a few loose ends, and
 +generated some followups.
 +
 +First, Otto Jarvinen (sounddemon) emailed to say that the SID detection
 +routine occasionally reported incorrect results for him, and suggested that
 +a workaround was to do the detect several times. ​ YMMV!
 +
 +Second, a day or two after issue #20 was released, Levente discovered a
 +brilliant way to play 6-bit PWM digis on a stock machine:
 +
 +--
 +I couldn'​t resist, and tried something out (see attachment). It works!!! :-)
 +
 +In fact, when I wrote the last letter I didn't know that I found something
 +useable, just had some ideas - I felt that I'm at the right place. When I read
 +C=H 20 this morning and read your comment about the Test bit (from the PRG), I
 +knew that it must work. All I had to do is then to put this idea into code.
 +
 +The whole idea is about starting the pulse by software, and then having the
 +SID turn it back to 0 after a time.
 +
 +Is it possible? ...The keys are the Test bit (the SID wave counter can be
 +reseted anytime), the pulse width register, the wave counter and the SIDs way
 +of generating pulse wave. (Ie. the pulse wave is high, as long as the wave
 +counter is less than the value in the pulse width register).
 +
 +
 +Check this algorithm:
 +
 +- Init: volume at max, voice 1 sustain level max, start attack. Freq is
 +selected well (=$4000), so the wave counter is incremented by 4 every
 +processor clock cycles.
 +
 +Loop:
 +- load next sample value, and put it to the pulse width low register ($d402;
 +ensure that $d403 is 0).
 +- Set test bit, and clear test bit (counter reset).
 +- Increase sample pointer, some delay, then loop. The delay must be 64 clock
 +cycles + the time while the Test bit is kept set (4 cycles if using STA $d404
 +: STX $d404 immediately with pre-loaded values).
 +
 +What will happen? The 8-bit sample value is put directly to the pulse width
 +register (MSBs of the pulse width register are cleared!...). The wave counter
 +is started (release test bit), and it increases 4 by every CPU cycles (=
 +counts 256 in 64 cycles). After some time, the counter will reach the value in
 +the pulse width register. This happens in exactly after (8-bit sample value /
 +4) cycles, because of the above. In this cycle (or the next?...) the SID turns
 +its pulse output to 0. Voilá!
 +
 +One must just make sure that the loop length in cycles matches the above
 +conditions, and then it runs like hell... Since it does exactly the same on
 +the SID as the other (bit-banging) way, it just does it with some hardware
 +help, there'​s also no problem with the 4khz maximum barrier (since the
 +oscillator is reset every loop).
 +
 +With little enhancement,​ it's possible to write an about 7.5 bits player for a
 +stock C64 by this method. This is what you find in the attachment... The idea
 +is using all the 3 channels simultaneously. A slightly increased sample value
 +is written to the three pulse width registers, so the oscillators will finish
 +the duty cycle one processor cycle later, when there'​s a carry between
 +bits(0,1) to the MSBs.
 +
 +The replay freq is the CPU clk / 68 (~15khz). 64 cycles (variable duty cycle)
 ++ 4 cycles (constant duty cycle because of the reset time - no problems with
 +that, it doesn'​t change (just gives a small constant DC...)).
 +
 +By similar methods, it should be possible to write a sample player with higher
 +PWM freq (with less resolution of course, but eliminating this still audible
 +whistling).
 +
 +(I tried using the filter to reduce it, but it sounded so bad that I left it
 +out. It clicked like hell. The FETs got saturated.)
 +
 +[Richard Atkinson suggested turning down the sustain volumes to avoid this]
 +
 +See the attachment, and the binary. I think the sample sounds pretty good :-).
 +(The cut is from '​Greece 2000' by Three drives on a vinyl).
 +
 +(Another idea that popped up in my mind: since the TED sound generator can
 +also be reset, I could probably translate this idea to the Plus/4 :-O ).
 +
 +Best regards,
 +
 +Levente
 +
 +--
 +
 +The binary is available at http://​www.ffd2.com/​fridge/​chacking/​ towards the
 +bottom of the page.
 +
 +Third, I received a very interesting email from an Apple-II guy, which I'd
 +like to pass on:
 +
 +--
 +
 +Hi!
 +
 +I found your page as I was searching for something else 6502-related,​
 +and was very interested. ​ Although I have always been aware of the
 +C64, I have never really been a user--I have used Apple II's since 1980.
 +
 +I was particularly interested in the article on playing "​digis"​ on the
 +C64.  I became interested in playing digitized sounds on the Apple II
 +in 1993, after hearing a 3-bit, 11.025 KHz PWM player. ​ At 3 bits, you
 +can imagine how noisy speech samples were, but the overall effect
 +for a 1 MHz machine with a 1-bit speaker "​toggle"​ was amazing. ​ It
 +made me wonder how far this PWM technique could be pushed on a
 +stock, 1 MHz Apple II (not the somewhat faster, 65816-based IIgs).
 +
 +The short answer is, much farther than I expected! ​ Robin and Stephen
 +accurately describe the theoretical PWM limit as 6 bit samples at
 +about 16 KHz for a stock 1 MHz machine, but, as they point out,
 +that is not practically realizable for a number of reasons, unless the
 +play loop is completely unrolled!
 +
 +Furthermore,​ in the Apple II world, sampled sounds have acquired a
 +few standardized sampling rates--mostly as a result of Mac influence,
 +which was in turn influenced by CD'​s. ​ The most common rate in the
 +Apple II world is 11.025 KHz, or one-fourth of the audio CD sampling
 +rate.  This is commonly considered to be "AM radio quality",​ with a
 +Nyquist bandwidth of about 5.5 KHz and a practical bandwidth of
 +4+ KHz, given practical anti-aliasing filters (at the sampling end, not
 +the playback end).
 +
 +A frequency of 11.025 KHz is, though high, still painfully audible to
 +people whose ears are not zonked--a piercing "​squeal"​ running
 +through every sound. ​ So even though it is possible to write a
 +practical 6-bit 11.025 KHz PWM player (usually called a SoftDAC
 +in the Apple II world), the resulting listening experience is disappointing.
 +
 +So I went to work on a way to do 2x oversampling,​ and built a 5-bit
 +22.050 KHz PWM player. ​ It was sad to lose a bit, but the absence
 +of any audible "​carrier"​ more than compensated for it!
 +
 +If you have access to an 8-bit Apple II (preferably with lower case,
 +like a //e), and also preferably with a way of attaching an external
 +speaker or headphones in place of the miserable 2.75" internal
 +speaker, then you can easily give it a try and judge for yourself.
 +
 +I'm pretty proud of the novel design of the code, which I would
 +characterize as "​vectored"​ unrolled loops, one for every two
 +pulse duty cycles, which I wrote a BASIC program to write
 +for me--much less painful for counting cycles!
 +
 +The package is available on the web at:
 +
 +http://​members.aol.com/​MJMahon/​index.html
 +
 +and is called <A HREF="​http://​members.aol.com/​MJMahon/​sound22.shk">​Sound Editor v2.2</​A>,​ since I had to "dress up" the player
 +into something fun to play with.  ;-)  An earlier version of Sound Editor
 +was published on SoftDisk in 1994, IIRC, but this one is a little more
 +evolved. ​ It also introduced 2:1 ADPCM compression of 8-bit sampled
 +sounds, to save disk space. ​ It is a lossy compression,​ but not very
 +noticeably. ​ The editor package also includes those routines, in 6502
 +assembly code.
 +
 +All of this should be trivially adaptable to the stock, 1 MHz C64, with
 +very good results. ​ By using the filters, you could probably filter out
 +the 11.025 KHz carrier and return to 6-bit accuracy!
 +
 +I should note that in the Apple world, sampled sounds are usually
 +represented as "​excess-128"​ codes, which means that the sign bit
 +is inverted. ​ This actually simplifies things, since the sample value
 +is within a few shifts of being the pulse width in cycles.
 +
 +Let me know what you think!
 +
 +-michael
 +
 +--
 +
 +(Always great to hear from Atari and Apple ][ folks!)
 +
 +And finally, I have a little mathematical analysis of PWM and how it compares
 +to a "​straight"​ digi.  Basically, I found some of the PWM explanations a
 +little unconvincing in issue #20 (even though I wrote them!). ​ For example,
 +the idea of "​average voltage"​ seems a little funny, since every two samples
 +has an "​average voltage",​ as does every four, etc. but that set of average
 +voltages would give a different sounding signal than the original (or
 +more dramatically,​ there is an average voltage over a full second of digi
 +playback, but that's not what you hear!). ​ So I wanted to know how a
 +PWM signal _really_ compares to a straight digi playback.
 +
 +Another issue is changing the amplitude of a PWM digi, i.e. using two
 +pulse waveforms, with one 1/16 the value of the other, to get higher
 +resolution. ​ If you recall the discussion of digis, the resolution of a PWM
 +digi depends on the number of pulse widths available, not the amplitude.
 +Adding two PWM waveforms together does not change the number of pulse widths
 +available, so I wanted to figure out what changing the amplitude _really_
 +does to a PWM digi, and if it can really be exploited.
 +
 +And finally, I wanted to know about the carrier wave (that is so piercing
 +at lower playback frequencies) -- and once again, how it compares with a
 +standard digi (which, after all, is stair-stepping the voltages at the
 +playback rate).
 +
 +Since the rest of this article is some Fourier analysis that 99% of people
 +will have zero interest in, I'll put the conclusions here.  The first is:
 +PWM digis and standard digis are essentially identical except at higher
 +frequencies (except for a phase shift, which doesn'​t make any difference to
 +your ear).  The second is: changing the amplitude of a PWM changes the 
 +resolution. ​ More specifically,​ the amplitude of the pulse multiplies the
 +digi sample value. ​ If two pulses can be synced close enough, it should
 +indeed be possible to use two pulses to get a higher resolution. ​ Moreover,
 +by modulating the amplitude of a single PWM digi, using the $d418 volume
 +register -- that is, using PWM _and_ $d418 -- it should be possible to get a
 +higher dynamic range, something that should be a little more achievable using
 +SID (but maybe not that useful, so I didn't try it out).  And finally, a
 +standard digi has zero amplitude at the carrier frequency.
 +
 +In other words, after a lot of effort I was able to demonstrate what everyone
 +already knows.
 +
 +The analysis doesn'​t change anything from the previous articles (except
 +possibly the idea for changing the PWM amplitude to get more dynamic range).
 +
 +And now, some Fourier analysis. ​ A standard digi just sets the voltage to
 +the sample value s_j, for a length of time dt (dt = 1/sample rate). ​ The
 +Fourier transform of a single sample s_j (occuring at time t_j) is
 +
 +
 + s_j [e^(-iw dt) - 1] * [e^(-iw t_j) / -iw]
 +  
 +
 +where w = angular frequency. ​ Since the above is a little hard to read, I'll
 +say it in words. ​ The first term is the sample value s_j, which scales
 +amplitudes at all frequencies. ​ The second term is due to the finite length
 +of the pulse (evaluating the Fourier integral at the boundaries),​ and
 +basically changes the phase of the transform. ​ The third term is like
 +sin(w)/w -- a sinusoid with decreasing amplitude as frequency increases.
 +So: the transform goes like sin(w)/w times the sample value, with some phase
 +effects thrown in (we'll get back to these in a moment).
 +
 +A PWM digi sets the duty cycle of a pulse to the sample value s_j, giving
 +a Fourier transform of
 +
 + [e^(-iw s_j dt) - 1] * [e^(-iw t_j) / -iw]
 +
 +Compare this with the earlier expression, and you'll see that the sample
 +value s_j has moved up in to the exponent of the "phase term" but that
 +they'​re otherwise the same.
 +
 +The first thing to do is to show that both expressions,​ PWM and standard,
 +reduce to the same thing -- that is, that a PWM and a standard digi sound
 +the same!  The expressions both decrease as 1/​frequency,​ due to the
 +sin(w)/w term.  This means that at large frequencies the values become
 +negligible. ​ (How large? ​ For example, if the sample frequency is just 1KHz,
 +then sin(w)/w is .001 times smaller near w=1KHz (i.e. the sample frequency,
 +which is twice the Nyquist limit) than it is near w=0).
 +
 +So now consider the phase terms for small w.  The Taylor expansion for e^x is
 +
 + 1 + x + x^2/2 + ...
 +
 +We can therefore expand the "phase terms" as
 +
 + regular: e^(-iw dt) - 1 = (1 - iw*dt + w^2 dt^2/2 + ...) - 1
 + = -iw*dt + O(w^2 dt^2)
 +
 + pwm: e^(-iw s_j dt) - 1 = -iw*s_j*dt + O(w^2 dt^2)
 +
 +where O(w^2 dt^2) is considered very small since w and dt are both small.
 +Substituting the above into the original expressions gives
 +
 + s_j*iw*dt [e^(-iw t_j) / iw]
 +
 +in both cases. ​ That is, we have shown that for "​small"​ frequencies -- more
 +specifically,​ for frequencies where (w^2*dt^2) is much smaller than (w*dt),
 +which is where w*dt<1, which is frequencies less than the sample frequency,
 +which is all frequencies of interest! -- PWM and standard digis are the same.
 +
 +The explanation lies in the phase terms. ​ Those "phase terms"
 +
 + [e^(iw dt) - 1]  (regular)
 +
 +and
 +
 + [e^(iw s_j dt) - 1]  (PWM)
 +
 +do more than just change the phase. ​ When they multiply the sin(w)/w signal,
 +they take the sin(w)/w signal, change the phase, and then subtract the
 +sin(w)/w signal again. ​ It's this difference of signals that makes things
 +work out at the frequencies we care about. ​ PWM and standard digis are _not_
 +the same, but the main differences are at higher frequencies,​ where the
 +amplitudes are in general much smaller.
 +
 +But... but... what about the PWM carrier frequency? ​ If we take a constant
 +digi, say with sample values = 1/2, the standard digi gives a constant
 +voltage, whereas a PWM digi gives a square wave at the sample frequency.
 +The answer comes from the "phase terms" above. ​ The sample frequency is
 +
 + w = 2*pi/dt.
 +
 +Substituting this into the phase terms gives
 +
 + [e^(i*2*pi) - 1] (regular)
 +
 +and
 +
 + [e^(i s_j 2*pi) - 1] (PWM)
 +
 +The regular expression is exactly zero -- there is _nothing_ at the
 +sample frequency of a regular digi.  But that's not the case for the PWM
 +term, because of the s_j up in the exponent. ​ PWM digis have a _finite_
 +amplitude at the carrier frequency. ​ Note that because of the sin(w)/w
 +term it gets smaller as the sample frequency increases -- but it isn't zero.
 +
 +Finally, the phase term expansions give some insight into what happens
 +when both the pulse width _and_ height are varied. ​ If the pulse width
 +is s_j, and the height is set to h_j, then the Fourier transform becomes
 +
 + h_j*s_j *iw*dt [e^(-iw t_j) / iw]
 +
 +That is, the amplitude multiples the width. ​ For the case of adding two
 +PWM waves together, then, the amplitude really does effectively scale the
 +sample value, and it should be possible to add one PWM value at 1/16 the
 +amplitude of another to get an effective 8-bit value.
 +
 +What about _varying_ the amplitude of a single PWM sequence? ​ For a 6-bit PWM
 +digi, say, the sample values s_j can go from 0 to 63.  If this is then
 +multiplied by h_j=2 say, then the values become 0 2 4 ... 126 -- a 7-bit
 +number where the lowest bit is always 0.  What use is that?  Well, we still
 +have the h_j=1 values of 0..63, which do include the lowest bit.  So we
 +can effectively change the dynamic range from 0..63 to 0..126 using just two
 +amplitude values.
 +
 +As a practical matter, then, it might be possible to use all 15 $d018 values
 +available to get a big dynamic range, and hence a better sounding digi,
 +using fewer CPU cycles. ​ Well, ok, we're only _sort of_ changing the dynamic
 +range, so I pretty much doubt the usefulness of it.  But maybe someone out
 +there would like to give it a shot.
 +
 +All right, let's hope this closes the book on pulse width modulation for
 +digi playback!
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +..............................................................................
 +
 +Introducing Full-Screen FLI mode for the SuperCPU
 +Copyright (C) 2002 By Todd S. Elliott
 +
 +The 'FLI Bug', where the first three columns of a FLI screen are essentially
 +unusable, can be squashed with the help of a SuperCPU. ​ I won't go into great
 +detail on IFLI, as it has been well-documented elsewhere, but I'll begin with
 +a short summary to get us all up to speed. ​ I refer you to Albert '​Pasi'​
 +Ojala'​s excellent coverage of the FLI mode in C=Hacking #4. Pasi also
 +proofread this article.
 +
 +     A Three-Minute Summary of the FLI mode
 +
 +The VIC-II chip asserts a badline when it needs to access the databus and
 +fetch character data or videomatrix data. It was discovered that the VIC-II
 +chip can be manipulated by its vertical scroll register at $d011 (SCROLY) to
 +induce a badline at any given rasterline. By having a badline at every visible
 +rasterline, the program can manipulate $d018 (VMCSB) to point at the right
 +videomatrix to achieve the maximum flexibility of colors given to a
 +multi-color screen.
 +
 +Unfortunately,​ when a program forces a badline via SCROLY, the BA (Bus
 +Available) line in the computer goes high, and for three cycles the 6510/8510
 +processor has to finish its write operations or halt its read operations
 +before the BA line is released to the VIC-II chip. The maximum number of
 +successive write operations is three, hence the 3-cycle delay. It is in those
 +three cycles that the VIC-II does not fetch video matrix data to fill in the
 +first three columns and causes the 'FLI Bug'.
 +
 +I wish to stress that in those first three cycles, when the BA line is high,
 +the 6510/8510 processor is still active and can complete write operations. It
 +isn't fully shut down. After the badline retrigger at STA SCROLY, the code
 +following it is fetched on the databus and is ready to be executed by the
 +6510/8510 processor. When BA is high, the VIC-II will reference the value
 +on the databus as videomatrix data and display it in the first three columns
 +of the screen. The actual instructions that follow the STA SCROLY in the FLI
 +loop constitutes the video matrix data for the first three columns of the
 +screen.
 +
 +     Enter the SuperCPU!
 +
 + ​Normally,​ a VIC-II chip access is only possible every 4 cycles. The SuperCPU
 +can access the VIC-II chip in 1 cycle (1MHz) intervals, making cycle to cycle
 +changes possible within the VIC-II chip. More importantly,​ the SuperCPU
 +tristates the 6510/8510 processor inside the host Commodore computer (which
 +is a fancy way of saying that you can disconnect the processor from the
 +system without physically removing it).
 +
 +When a forced badline retrigger occurs with a STA SCROLY in a FLI loop under
 +the SuperCPU, the BA signal inside the host Commodore computer goes high. But,
 +the SuperCPU runs asynchronously and really doesn'​t have to pay attention to
 +the host Commodore as it runs code after the STA SCROLY. In fact, the SuperCPU
 +will execute code even if the VIC-II badline is in full swing inside the host
 +Commodore computer.
 +
 +I knew that the instruction opcodes left on the databus after the STA SCROLY
 +made up the video matrix data for the VIC-II chip for those first three
 +columns of the screen. But I wondered how this was possible in a SuperCPU
 +configuration because there would be no instruction opcodes left hanging on
 +the databus inside the host Commodore computer. After some discussions with
 +Per Olofsson ("​MagerValp"​),​ he suggested that writes/​reads to the i/o area
 +will force a value to be put on the databus.
 +
 +This is where the magic begins, when the FLI loop forces the SuperCPU to write
 +to the i/o area of the host Commodore after the forced badline retrigger at
 +STA SCROLY. The SuperCPU will note that the BA signal is still high, so it can
 +still access the databus and stash values there via DMA. This BA high signal
 +will last for 3 cycles, enough for the SuperCPU to stash three values onto the
 +databus.
 +
 +The 6510/8510 is still tristated by the SuperCPU, and there'​s nothing on the
 +databus after the forced badline retrigger at STA SCROLY. Normally, the
 +6510/8510 CPU shares the databus with the VIC-II for each machine cycle. With
 +the 6510/8510 CPU out of the equation, the SuperCPU can stash a value onto
 +this shared bus on the CPU half of this machine cycle and the VIC-II chip will
 +see it in its other half of the machine cycle.
 +
 +However, the databus is only eight bits wide. The VIC-II chip fetches video
 +matrix data and color ram data 12 bits at a time. The SuperCPU can force
 +values onto the databus during the first three cycles after the forced badline
 +retrigger, but on each cycle the last four bits belonging to Color RAM would
 +not be fed to the VIC-II chip. Only pixel values of %10 and %01 can be
 +individually selected in multicolor FLI mode, while %11 pixel values cannot be
 +individually set for those first three columns of the screen. The high
 +resolution FLI mode does not suffer from this problem because it does not use
 +color RAM for color attribute information.
 +
 +     ​Full-Screen FLI in practice
 +
 +Let's get down to the nitty gritty. The Write I/O Approach requires three
 +200-byte tables, corresponding to each column. Each value on those tables
 +correspond to each visible rasterline. For example, the first byte of each
 +table corresponds to rasterline 50, the second byte of each table corresponds
 +to rasterline 51, etc. The first table contains values needed for the first
 +column of the screen, the second table contains values needed for the second
 +column of the screen, and the third table values for the third column.
 +
 +In the FLI display loop prior to the STA SCROLY command, the current
 +rasterline is used as an index to all three tables. The values are then
 +fetched from the tables and inserted into the code that follows the STA SCROLY
 +command using self-modifying code techniques. When the STA SCROLY happens, the
 +code that immediately follows it starts writing the values onto the databus,
 +all three in a row to complete the first three columns of the screen.
 +
 +There is a disadvantage with this approach. It requires that three 200-byte
 +tables be specially constructed and stored somewhere in memory that is not
 +mirrored by the SuperCPU. A routine would have to read in a FLI graphics file,
 +extract information from the first three columns and store it into their
 +respective 200-byte tables.
 +
 +Pasi Ojala came up with a graph depicting the SuperCPU interacting with the
 +VIC-II in action, showing what happens after the forced DMA retrigger at STA
 +SCROLY. The 'LDA #$xx' would have been modified earlier in the FLI routine
 +(before the STA SCROLY) using self-modifying code. Here is the relevant source
 +code which takes up 4 machine cycles inside the host Commodore computer.
 +
 +sta scroly : abcd, d = write Y to SCROLY on 1MHz bus CPU half - Mach. Cycle #1
 +lda #$00 : ef
 +sta $d022 : ghij, j = write 1 to $D022 on 1MHz bus CPU half - Mach. Cycle #2
 +lda #$00 : kl
 +sta $d022 : mnop, p = write 2 to $D022 on 1MHz bus CPU half - Mach. Cycle #3
 +lda #$00 : qr
 +sta $d022 : stuv, v = write 3 to $D022 on 1MHz bus CPU half - Mach. Cycle #4
 +
 +There are the two shared halves consisting of a machine cycle inside the host
 +Commodore bus, and by stashing values onto the databus, this value is carried
 +over to the VIC-II half and is read as videomatrix data during the first three
 +columns of the FLI screen.
 +
 +________ = VIC-II half of the 1MHz cycle
 +. = SCPU synchronizes to 1MHz bus
 +Each char position is equivalent to a 1 (20MHz) cycle.
 +
 +Mach. Cycle #1      Mach. Cycle #2      Mach. Cycle #3      Mach. Cycle #4
 ++-------------------+-------------------+-------------------+-------------------
 +__________YYYYYYYYYY___DMA____1111111111___col0___2222222222___col1___3333333333___col2___ 1MHz
 +abc.......ddddddddddefghi.....jjjjjjjjjjklmno.....ppppppppppqrstu.....vvvvvvvvvv SCPU
 +
 +Values on the databus which is carried over onto the VIC-II half of the databus:
 +DMA: DMA condition detected by VIC-II
 +col0: colors for column 0 read, gets the value 1 put into the bus by SCPU
 +col1: colors for column 1 read, gets the value 2 put into the bus by SCPU
 +col2: colors for column 2 read, gets the value 3 put into the bus by SCPU
 +
 +     An alternative approach bites the dust
 +
 +The SuperCPU can also fetch values onto the databus by reading from the I/O
 +region. If a coder were so inclined to use a 'Read I/O Approach',​ where is a
 +program going to find 600 free bytes in the i/o region at $d000-$dfff?​ The
 +idea is to force the SuperCPU to do a read on the databus via DMA and this
 +can't be done with mirrored locations similar to the ones used in those VIC
 +optimization modes. When a SuperCPU reads a value from mirrored memory, it
 +does so from its local RAM and not the RAM that is inside the host Commodore
 +computer. However, if the SuperCPU reads from the I/O block at $d000-$dfff,​ it
 +will read a value from inside the host Commodore computer using DMA.
 +
 +Unfortunately,​ this approach did not work when the BA line went high inside
 +the host Commodore computer and is unworkable for a full-screen FLI mode. The
 +SuperCPU stops for reads if BA is high, just like its 1MHz 6510/8510
 +counterpart.
 +
 +     Other Considerations.
 +
 +There were some interesting observations while debugging the full-screen FLI
 +routines. The full-screen FLI routines were originally inspired by Robin
 +Harbron'​s IRQ-based IFLI routines. ​ Because they are driven by an IRQ, the
 +CPU is still available for normal computational tasks.
 +
 +When all three videomatrix values are written to after the STA SCROLY in the
 +line-interruptible FLI routine, the IRQ must then exit quickly with the
 +restoration of the registers. ​ It's a good idea to avoid writing to any
 +mirrored location or read/write to any I/O region ($D000-$DFFF),​ since the
 +SuperCPU will have to wait for VIC to finish with the data bus.
 +
 +Using a raster IRQ will naturally lead to trouble, since cycle-exact timing
 +is needed, so a CIA timer is used.  The timer may be set to synchronize a
 +PAL or NTSC machine. Then in the FLI routine the timer can be checked and
 +indexed into a table of preset timing values so that the forced badline
 +retrigger at STA SCROLY will always happen at the right time on the screen,
 +no matter what the SuperCPU is doing when the VIC-II interrupted it with an
 +raster IRQ. Thanks goes to Ninja/​The-Dreams (aka Wolfram Sang) for tips on
 +how to create a stable line-interruptible FLI routine using timers.
 +
 +    Source code
 +
 +Without further ado, here is the source code. This source code was used in the
 +Santa Claus FLI Demo for Wheels OS. This code will run in either Commodore 64
 +or 128 computers and in either PAL or NTSC systems. It did take a lot of
 +tweaking at Points #1, #2, #3, #4, & #5 as I tried to perfect the routines as
 +closely as possible. The full source code for the Santa Claus FLI demo can be
 +supplied via email upon request. It is in Concept+ (geoProgrammer) format.
 +
 +; Wedges the full-screen FLI interrupt handler in Wheels systems.
 +; Thanks goes to Robin Harbron for the idea of a line-interruptible FLI routine.
 +InstallFLI:​ ;​ Installs the FLI routine
 + jsr ClearMouseMode ;​ Turn off the mouse.
 + sei
 + lda CPU_DATA ;​ Save 6510/8510 Location #$01.
 + sta r6510
 + lda screenMode ;​ Check computer.
 + bne 2$ ; Take branch in 128 mode.
 + lda #IO_IN
 +.byte $2c
 +2$:​ lda #​%00110111 ;​ for 128 mode only
 + sta CPU_DATA
 + lda vmcsb ;​ Save original video bank info for Wheels.
 + sta oVMBmp
 + lda scroly ;​ save screen Y axis
 + sta yaxis
 + MoveW $fffe,​ oldVector ; Saves the old Wheels IRQ vector.
 + lda #<​fli ;​ Sets up fli raster.
 + sta $fffe ;​ At the IRQ interrupt vector.
 + lda #>​fli
 + sta $ffff
 +; Point #1
 + lda #$31 ; Trigger the IRQ request
 + sta raster ;​ At rasterline 49.
 + lda scroly
 + and #$7f ; Clear bit 7 of raster register.
 + sta scroly
 + lda #1
 + sta vicirq ;​ Ack raster ints.
 + cli
 + rts
 +
 +RemoveFLI:​ ;​ Removes the FLI routine
 + sei
 + MoveW oldVector,​ $fffe ; Restores the old Wheels IRQ vector.
 + lda #$fb ; Trigger the IRQ request
 + sta raster ;​ At rasterline 251.
 + lda yaxis ;​ restore screen Y axis
 + and #$7f ; Clear bit 7 of raster register.
 + sta scroly
 + lda #1
 + sta vicirq ;​ Ack raster ints.
 + lda oVMBmp ;​ Restore original video bank info for Wheels.
 + sta vmcsb
 + lda r6510
 + sta CPU_DATA ;​ Restores 6510 Port #$01
 + cli
 + jmp StartMouseMode ;​ Start the mouse on.
 +fli: ; The actual FLI interrupt routine lies here.
 + pha
 +.byte $da ;​phx
 +.byte $5a ;​phy
 + php ; Save processor flags.
 +; Point #2
 + ldx #$03 ; #$0f for PAL SuperCPU systems.
 +3$ dex
 + bpl 3$
 + lda raster
 + tax
 + ldy colOneClrs,​x ;​ Get colors for the first three columns.
 + sty mark4+1
 + ldy colTwoClrs,​x
 + sty mark5+1
 + ldy colThreeClrs,​x
 + sty mark6+1
 +;​ ldy backgndTable,​x ;​ Get background color for scanline.
 +; stx $d021
 + inx
 +; Point #3
 + cpx #$f9 ; Have we reached scanline 249?
 + bne 1$
 +; Point #1
 + ldx #$31 ; Restart the IRQ at rasterline 49.
 +1$:​ stx raster ;​ By this time, the raster interrupt register is
 + ; incremented by one, and will re-trigger the
 + ; same fli routine.
 + ; This way, it is fully line-interruptible
 + ; and frees up SuperCPU time.
 + ldy #$01
 + sty vicirq ;​ Ack raster ints.
 + and #$07 ; Mask out lower three bits.
 + tax
 + ldy tabd018,​x ;​ Use preset values for vmcsb.
 + lda d011tab,​x ;​ Use preset values for scroly.
 + sty vmcsb ;​ Select video matrix.
 + sta scroly ;​ Forces the badline.
 +mark4: lda #$00 ; Stores a video matrix value onto the first column.
 + sta $d022
 +mark5: lda #$00 ; Stores a video matrix value onto the second column.
 + sta $d022
 +mark6: lda #$00 ; Stores a video matrix value onto the third column.
 + sta $d022
 + plp ; Restore processor flags.
 +.byte $7a ;​ply
 +.byte $fa ;​plx
 + pla ; Do NOT use any memory accesses to the host
 + rti ; Commodore databus in this part because it will
 + ; be blocked by the VIC-II badline.
 +
 +; Point #4
 +tabd018:​ ;​ Preset video matrix values.
 +.byte $78,​$08,​$18,​$28,​$38,​$48,​$58,​$68;​ NTSC systems
 +;​.byte $08,​$18,​$28,​$38,​$48,​$58,​$68,​$78 - PAL systems
 +d011tab:​ ;​ Preset VIC DMA retrigger values.
 +.byte $38,​$39,​$3a,​$3b,​$3c,​$3d,​$3e,​$3f
 +
 +ChkAbortKey:​ ;​ Checks the RUN/STOP keypress.
 + LoadB $dc00,​ #$7f ; check for the STOP key
 +3$:​ asl $dc01 ;​ Check for the RUN/STOP keypress.
 + ; This also synchronizes the line-interruptible FLI routine.
 + bcs 3$ ; Branch if it isn't pressed.
 + rts
 +prep3Cols:​ ;​ Prepares the first three columns of the FLI screen
 + ; Ideally, a FLI file would be loaded in and this
 + ; routine would then be called to set up the three
 + ; 200-byte tables corresponding to each column,
 + ; covering the first three columns of the screen.
 + lda #$40
 + sta mark1+2 ;​ Prepare the marks.
 + sta mark2+2
 + sta mark3+2
 + ldy #$00
 + sty mark1+1
 + iny
 + sty mark2+1
 + iny
 + sty mark3+1
 + php
 + sei
 + lda screenMode
 + beq 1$ ; take branch in 64 mode.
 + lda $ff00
 + pha ; save 128 configuration.
 + lda #​%01111110 ;​ select RAM at $4000
 + sta $ff00
 +1$:​ ldy #​$00
 + ldx #$07 ; Use self-modifying code to create three
 +; Point #5
 +mark1:​ lda $4000 ;​ 200-byte tables for each column of the
 + sta colOneClrs+49,​y ;​ FLI screen and each value is indexed by
 +mark2:​ lda $4001 ;​ the scanline in the FLI routine.
 + sta colTwoClrs+49,​y
 +mark3:​ lda $4002
 + sta colThreeClrs+49,​y ;​ use +48 for the column offset in PAL
 + clc ; systems.
 + lda mark1+2
 + adc #$04
 + sta mark1+2
 + sta mark2+2
 + sta mark3+2
 + iny
 + dex
 + bpl mark1
 + sec
 + lda mark1+2
 + sbc #$20
 + sta mark1+2
 + clc
 + lda mark1+1
 + adc #$28
 + sta mark1+1
 + tax
 + inx
 + stx mark2+1
 + inx
 + stx mark3+1
 + lda mark1+2
 + adc #$00
 + sta mark1+2
 + sta mark2+2
 + sta mark3+2
 + cpy #200
 + bne mark1-2
 + lda screenMode
 + beq 2$ ; take branch in 64 mode.
 + pla
 + sta $ff00 ;​ restore 128 configuration.
 +2$: plp
 + rts
 +
 +.ramsect $1000
 +; All column colors are referenced by scanline.
 +colOneClrs:​ ;​ Column one colors of the FLI screen.
 +.block 256
 +colTwoClrs:​ ;​ Column two colors of the FLI screen.
 +.block 256
 +colThreeClrs:​ ;​ Column three colors of the FLI screen.
 +.block 256
 +
 +Hopefully the full-screen FLI possibilities that the SuperCPU can now unlock
 +will bring forth cool software for our SuperCPU'​s and tons of 'eye candy'​.
 +
 +Enjoy.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::​
 +
 + Main Articles
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::​
 +
 +VIC KERNAL Disassembly Project - Part V
 +Richard Cini
 +February, 2002
 +
 +Introduction
 +============
 +
 + In Part 4 of this series, we discovered that of the 39 ROM routines
 +available to be called by user code, 26 of them related to device I/O. The
 +path to using a device from machine code was to first "​open"​ it. So, we
 +looked at the routines required to open and begin using a logical device.
 +
 + When we concluded the discussion on the OPEN command, we left out
 +seven routines dealing with the tape device. This is some of the most
 +complex code I've ever seen. Unlike the IEEE serial peripherals,​ the tape
 +deck is a "​dumb"​ device, meaning that the VIC Kernal has to do the work of
 +moving the data to/from the tape. With the serial peripherals,​ the Kernal
 +just sends a character to the device that then acts on it independently of
 +the VIC.
 +
 + After opening the device, there are nine non-tape routines to deal
 +with getting data into and out of the VIC, including talking and untalking,
 +listening and unlistening,​ moving the characters in and out, and some
 +secondary address stuff. ​
 +
 + In this installment,​ we'll conclude our discussion of the OPEN
 +procedure by looking at the tape routines that are called from OPEN. Later
 +articles will discuss the actual movement of bits to and from the cassette
 +and the remaining IEEE serial routines, as well as the loading and saving of
 +files.
 +
 +Tape Routines
 +=============
 +
 + In the last installment,​ we examined the beginnings of opening a
 +device for use by a program. The reader will find that the routines are
 +convoluted and hard to follow, with jumps and branches into apparently
 +unrelated subroutines. Overall it appears to be ugly but highly functional
 +code.
 +
 + Here'​s a "​calling tree" outline to the routines called by OPEN:
 +
 +IOPEN----
 + |-FIND (analyzed in Part IV of this series)
 + |-SENDSA (ultimately handles IEEE serial stuff)
 + |-SEROPN (handles RS232 stuff)
 + |
 + |(all tape-related from here down)
 + |-GETBFA (get address of tape buffer)
 + |-PLAYMS (prompt user to press PLAY on tape deck)
 + |   ​|-TPSTAT (tape key status)
 + |   ​|-TPSTOP (check for STOP keyboard key during tape ops)
 + |
 + |-SRCHMS (print "​Searching"​ or "​Searching for..."​ messages)
 + |-LOSCPH (locate a tape header with filename matching one in
 + | OPEN)
 + |-LOCTPH (locate first/next header; no filename specified)
 + |   ​|-SETBST (sets tape buffer start/end pointers)
 + |   ​|-PLAYMS (see above)
 + |   ​|-TPCODE (main tape code-moves bits in and out on IRQ)
 + | |-SBIDLE (serial buss idle check)
 + | |-STOIRQ1 (put key tape vectors into table)
 + | |-NCHAR (sets bit counter for char input operations)
 + | |-TPSTOP (see above)
 + | |-IUDTIM (update jiffy clock; previous installment)
 + |
 + |-RECDMS (prompt user to press PLAY & RECORD on tape deck)
 + |-WRTPHD (write a tape header to tape)
 + |   ​|-SETSBT (see above)
 + |   ​|-TPWRIT1 (precedes TPCODE by 12 bytes)
 +
 +
 + When dealing with the tape, it helps to understand that Commodore
 +built the tape protocol with an eye towards readability under adverse
 +conditions, including tape quality and motor speed. This made Commodore
 +tapes probably one of the most reliable data systems when compared to TI,
 +Apple, and Tandy, among others. Data headers and data blocks are repeated on
 +the tape and a comparison is made between the two reads to ensure integrity.
 +Additionally,​ the recording method is self-clocking so the effects of
 +varying tape speed are minimized.
 +
 + One of the first routines used in opening the tape device is
 +determining the address of the tape buffer and making sure that it's in the
 +$02xx page (or higher) of the system RAM. A test at IOPEN_S5 bails out with
 +an "​illegal device"​ error if the tape buffer isn't just so.
 +
 +F84D ;​=============================================================
 +F84D ; GETBFA - Get start of tape buffer
 +F84D ; ​   Returns buffer address in .X (LSB) and .Y (MSB)
 +F84D ;
 +F84D GETBFA
 +F84D A6 B2 LDX TAPE1
 +F84F A4 B3 LDY TAPE1+1
 +F851 C0 02 CPY #$02 ;is buffer at $02xx?
 +F853 60 RTS
 +
 + PLAYMS is called by IOPEN at IOPEN_S6. A test there determines if
 +we're trying to read/load or write/save from/to a tape. Read mode results in
 +the "Press Play..."​ message, the "​Searching for..."​ message and two routines
 +that search for the appropriate tape header.
 +
 +F894 ;​===============================================================
 +
 +F894 ; PLAYMS - Wait for tape key on read
 +F894 ;
 +F894 PLAYMS
 +F894 20 AB F8    JSR TPSTAT ;get tape key status
 +F897 F0 1C        BEQ TPSTEX ;​$F8B5 pressed? yes, exit.
 +F899             
 +F899 A0 1B        LDY #​KIM_PLAY ;​offset for "Press Play..."​ message
 +
 +F89B PLAYMS1
 +F89B 20 E6 F1    JSR MSG ;print it
 +             
 +F89E WTPLAY
 +F89E 20 4B F9    JSR TPSTOP ; check for STOP key
 +F8A1 20 AB F8    JSR TPSTAT ;get key status
 +F8A4 D0 F8        BNE WTPLAY ;​$F89E wait for PLAY switch
 +             
 +F8A6 A0 6A        LDY #​KIM_OK ;​offset for "​OK"​ message
 +F8A8 4C E6 F1    JMP MSG ;print it and return to caller
 +
 + This simple routine prints the "​Searching for..."​ message only if in
 +direct mode, and if appropriate,​ the filename that is being searched for. If
 +no filename is present, the message is changed to "​Searching..."​ (without
 +the "​for"​) before it's output to the screen.
 +
 +F647 ;​==============================================================
 +F647 ; SRCHMS - Print "​Searching for [filename]"​
 +F647 ;
 +F647 SRCHMS
 +F647 A5 9D        LDA CMDMOD ;​direct mode?
 +F649 10 1E        BPL SRCHEX ;​$F669 no (programmed mode), exit
 +F64B             
 +F64B A0 0C        LDY #​KIM_SRCH ;"​Searching"​ message
 +F64D 20 E6 F1    JSR MSG ;​output message
 +F650 A5 B7        LDA FNMLEN ;get filename length
 +F652 F0 15        BEQ SRCHEX ;​$F669 no filename, skip "​for"​
 +F654 A0 17        LDY #$17 ;point to "​FOR"​ in "​Searching For"
 +F656 20 E6 F1    JSR MSG ;print it 
 +F659 ;
 +F659 ; FLNMMS - Print filename
 +F659 ;
 +F659 FLNMMS
 +F659 A4 B7        LDY FNMLEN ;get filename length
 +F65B F0 0C        BEQ SRCHEX ;​$F669 no filename, exit
 +F65D A0 00        LDY #$00
 +F65F FLNMLP ;​ loop to print filename
 +F65F B1 BB        LDA (FNPTR),​Y ;​get character
 +F661 20 D2 FF    JSR CHROUT ; and print it
 +F664 C8          INY
 +F665 C4 B7        CPY FNMLEN
 +F667 D0 F6        BNE FLNMLP ;​$F65F loop
 +F669 SRCHEX
 +F669 60          RTS ;exit
 +
 +
 + If the user is attempting to open a tape file with a specific
 +filename, the IOPEN code makes a call to LOCSPH in IOPEN_S6 to find the file
 +header associated with the filename. If the specific header is not found,
 +the routine emits the "File not Found" error message. LOCSPH is a loop
 +wrapper around the LOCTPH routine that searches for the "​next"​ file header
 +regardless of name. The secondary address parameter of the OPEN command
 +defines whether the action is to (0) read a tape file and relocate it in
 +memory, (1) read a tape file without relocation (a machine program), or (2)
 +write a tape file and put both EOF an EOT markers after it. Once a header is
 +found, the filename from the header is compared to the filename in the OPEN
 +command to see if there'​s a match.
 +
 +F867 ;​============================================================
 +F867 ; LOCSPH- Find specific tape header
 +F867 ;
 +F867 LOCSPH
 +F867 20 AF F7    JSR LOCTPH ;​search for next header
 +F86A B0 1D        BCS LCSPEXC+1 ;​$F889 returned EOT? Go to ready
 +             
 +F86C A0 05        LDY #​$05 ;​filename offset in header
 +F86E 84 9F        STY TPTR2 ;save offset ​
 +F870 A0 00        LDY #$00 ;loop counter
 +F872 84 9E        STY TPTR1 ;save it
 +             
 +F874 LCSPHLP
 +F874 C4 B7        CPY FNMLEN ;​compare filename length
 +F876 F0 10        BEQ LCSPEXC ;​$F888 counter 0, exit
 +F878             
 +F878 B1 BB        LDA (FNPTR),​Y ;​get filename letter
 +F87A A4 9F        LDY TPTR2 ; offset to name in header
 +F87C D1 B2        CMP (TAPE1),​Y ;​compare to tape header
 +F87E D0 E7        BNE LOCSPH ;​f867 different, get next header
 +
 +F880 E6 9E        INC TPTR1 ;​increment counters
 +F882 E6 9F        INC TPTR2
 +F884 A4 9E        LDY TPTR1
 +F886 D0 EC        BNE LCSPHLP ;​$F874 compare next character
 +F888 LCSPEXC
 +F888 18          CLC ; exit success
 +F889 60          RTS
 +
 +
 + Tape searches are performed linearly, so the LOCTPH routine is used
 +to search for the next file header on the tape starting from the current
 +tape position. This routine is also called by LOAD through IOPEN if no
 +filename is provided as a parameter to the LOAD (or OPEN) call.
 +Additionally,​ it's called in a loop by the LOCSPH routine when searching for
 +a specific file.
 +
 + LOCTPH reads a tape block to the tape buffer using TPREAD and
 +examines a few important fields in the file header. The fields examined
 +indicate the file type and the filename. If the file header indicates that
 +the file is anything other than a program file or a data file, the routine
 +looks for another file header until it reaches the end of the tape. If BASIC
 +is in "​direct mode", LOCTPH prints the "​Found"​ message in addition to the
 +filename.
 +
 +
 +F7AF ;​============================================================
 +F7AF ; LOCTPH - Read any tape header
 +F7AF ; ​ Header type: 1=BASIC program, 2=data block, 3=machine program,
 +F7AF ;​ 4=data header, 5=end-of-tape marker
 +F7AF ;
 +F7AF LOCTPH
 +F7AF A5 93        LDA IOFLG2
 +F7B1 48          PHA ;save load/verify flag
 +F7B2 20 C0 F8    JSR TPREAD ;​read tape block to buffer
 +F7B5 68          PLA
 +F7B6 85 93        STA IOFLG2 ;​restore flag
 +F7B8 B0 2C        BCS LOCTPEX ;​F7E6 error, end search
 +
 +F7BA A0 00        LDY #$00 ;index reg
 +F7BC B1 B2        LDA (TAPE1),​Y ;​get header type
 +F7BE C9 05        CMP #​$05 ;​EOT? ​
 +F7C0 F0 24        BEQ LOCTPEX ;​$F7E6 yes, exit
 +
 +F7C2 C9 01        CMP #$01 ;BASIC program?
 +F7C4 F0 08        BEQ LOCTP1 ;​$F7CE yes, branch
 +
 +F7C6 C9 03        CMP #$03 ;ML program?
 +F7C8 F0 04        BEQ LOCTP1 ;​$F7CE yes, branch
 +
 +F7CA C9 04        CMP #$04 ;data header?
 +F7CC D0 E1        BNE LOCTPH ;​must be data block-skip it
 +
 +F7CE LOCTP1 ;​program or data header comes here
 +F7CE AA          TAX
 +F7CF 24 9D        BIT CMDMOD ;​direct mode?
 +F7D1 10 11        BPL LOCTPEX-2 ;​$F7E4 no, continue
 +
 +F7D3 A0 63        LDY #​KIM_FOUN ;​setup for "​Found"​ msg
 +F7D5 20 E6 F1    JSR MSG ;print it
 +F7D8 A0 05        LDY #​$05 ;​offset to file name
 +
 +F7DA LOCLOOP ;​ loop to print filename
 +F7DA B1 B2        LDA (TAPE1),​Y ;​print loop
 +F7DC 20 D2 FF    JSR CHROUT
 +F7DF C8          INY
 +F7E0 C0 15        CPY #$15 ;21 characters max
 +F7E2 D0 F6        BNE LOCLOOP ;​$F7DA loop
 +F7E4 18          CLC
 +F7E5 88          DEY
 +
 +F7E6 LOCTPEX
 +F7E6 60          RTS
 +
 +
 + The TPREAD routine is responsible for the setup required to read or
 +verify a block from the tape. It prompts the user to press the PLAY button
 +on the tape deck, disables system interrupts and sets some important
 +parameters before execution falls through to the code responsible for
 +starting the tape IRQ routines. In many Commodore machines, tape operations
 +are performed under the operation of a routine installed as a temporary IRQ
 +handler -- sort of a cheap multitasking so that the system doesn'​t appear to
 +be hung while tape operations are occurring. Execution ultimately comes to
 +code at $F8EF (TPCODE) which is responsible for installing and starting the
 +tape IRQ routine.
 +
 + All of this and we haven'​t yet reached the bits on the tape :-)
 +
 +F8C0 ;​==========================================================
 +F8C0 ; TPREAD - Read tape block
 +F8C0 ;
 +F8C0 TPREAD
 +F8C0 A9 00        LDA #$00
 +F8C2 85 90        STA CSTAT ;​clear status variable...
 +F8C4 85 93        STA IOFLG2 ;and read/verif flag
 +
 +F8C6 TPREAD1
 +F8C6 20 54 F8    JSR SETBST ;set tape buffer pointers
 +
 +F8C9 ;
 +F8C9 ; load program
 +F8C9 ;
 +F8C9 TPREAD2
 +F8C9 20 94 F8    JSR PLAYMS ;​wait for Play key
 +F8CC B0 1F        BCS TPCODE-2 ;​$F8ED (in TPWRIT1)
 +F8CE             
 +F8CE 78          SEI ;​disable interrupts
 +F8CF A9 00        LDA #$00 ;clear work memory for IRQ routines
 +F8D1 85 AA        STA RIDATA
 +F8D3 85 B4        STA BITTS
 +F8D5 85 B0        STA TPCON1
 +F8D7 85 9E        STA TPTR1
 +F8D9 85 9F        STA TPTR2
 +F8DB 85 9C        STA BYTINF
 +F8DD A9 82        LDA #$82 ;Timer H constant
 +F8DF A2 0E        LDX #​$0E ;​number for IRQ vector
 +F8E1 D0 11        BNE TPCODE1 ;​$F8F4 (TPCODE1 in TPWRIT below)
 + ; falls through to TPWRIT below)
 +
 +
 + SRCHMS also calls this small routine to determine if a key is
 +pressed on the tape deck. TPSTAT looks at PA6 on VIA1 to determine the key
 +state and sets up the Z flag for a compare to be performed in SRCHMS. ​
 +
 +F8AB ;​============================================================
 +F8AB ; TPSTAT - Check tape key status
 +F8AB ;
 +F8AB TPSTAT
 +F8AB A9 40        LDA #​%01000000 ;​$40
 +F8AD 2C 1F 91    BIT D1ORAH ;​switch sense
 +F8B0 D0 03        BNE TPSTEX ;​$F8B5 not pressed, exit
 +F8B2 2C 1F 91    BIT D1ORAH ;​button pressed. ​ Setup for another
 +F8B5             ​ ;​compare later. Z=1 if pressed
 +F8B5 TPSTEX
 +F8B5 18          CLC
 +F8B6 60          RTS ;​return clear
 +
 +
 + One of the tests performed in IOPEN (also at IOPEN_S6) is to
 +determine if the tape operation is a read or a write. If we're in the write
 +mode, the code jumps to IOPEN2. At IOPEN2, the Kernal prompts for the user
 +to press play and record on the tape deck and then writes a file header by
 +calling WRTPHD with a control ID of $04 (a data header). Then, IOPEN writes
 +a control byte ID of $02 (block is a data block) to the tape buffer and
 +returns.
 +
 + RECDMS is called by IOPEN to determine if a key is pressed on the
 +tape deck and if not, sets the message flag to the "Press Play & Record"​
 +message and prints the message by calling into the PLAYMS routine. PLAYMS
 +prints the message and then checks for the key press.
 +
 +F8B7 ;​===========================================================
 +F8B7 ; RECDMS - Wait for tape key on write
 +F8B7  ;
 +F8B7 RECDMS
 +F8B7 20 AB F8    JSR TPSTAT ;get button status
 +F8BA F0 F9        BEQ TPSTEX ;​$F8B5 pressed? Yes, continue
 +
 +F8BC A0 2E        LDY #​KIM_RECP ;​no,​ remind to "Press Play & Record"​
 +F8BE D0 DB        BNE PLAYMS1 ;​exit through $F89B 
 +
 +
 + IOPEN calls WRTPHD at IOPEN2 with $04 as the control byte (following
 +block is a data header) to be written into the header. WRTPHD then writes
 +some critical information into zero-page locations in advance of filling the
 +tape buffer with the same information. At the end of the routine, the data
 +is written to the tape in WRTPH1. ​
 +
 +F7E7 ;​==========================================================
 +F7E7 ; WRTPHD - Write tape header
 +F7E7 ; ​ On entry, .A is the header type: 2=data blk; 4=data hdr
 +F7E7 ;
 +F7E7 WRTPHD
 +F7E7 85 9E        STA TPTR1 ;save header type
 +F7E9 20 4D F8    JSR GETBFA ;get tape buffer address
 +F7EC 90 5E        BCC WRTPEX ;​$F84C exit if not right
 +
 +F7EE A5 C2        LDA STAL+1 ; save some program info
 +F7F0 48          PHA ;​save...start H
 +F7F1 A5 C1        LDA STAL
 +F7F3 48          PHA ;​...start L
 +F7F4 A5 AF        LDA EAL+1
 +F7F6 48          PHA ;​...end H
 +F7F7 A5 AE        LDA EAL
 +F7F9 48          PHA ;​...end L
 +F7FA A0 BF        LDY #​$BF ;​buffer length-1 (191)
 +F7FC A9 20        LDA #$20 ; {space}
 +
 +F7FE WRTPLP1 ;​ write program data to tape buffer
 +
 +F7FE 91 B2        STA (TAPE1),​Y ;​clear buffer
 +F800 88          DEY
 +F801 D0 FB        BNE WRTPLP1 ;​$F7FE
 +
 +F803 A5 9E        LDA TPTR1 ;get header type
 +F805 91 B2        STA (TAPE1),​Y ;​write it
 +F807 C8          INY
 +F808 A5 C1        LDA STAL ;get start L
 +F80A 91 B2        STA (TAPE1),​Y ;​write it
 +F80C C8          INY
 +F80D A5 C2        LDA STAL+1 ;get start H
 +F80F 91 B2        STA (TAPE1),​Y ;​write it
 +F811 C8          INY
 +F812 A5 AE        LDA EAL ;get end L
 +F814 91 B2        STA (TAPE1),​Y ;​write it
 +F816 C8          INY
 +F817 A5 AF        LDA EAL+1 ;get end H
 +F819 91 B2        STA (TAPE1),​Y ;​write it
 +F81B C8          INY
 +F81C 84 9F        STY TPTR2 ;save buffer offset
 +F81E A0 00        LDY #​$00 ;​filename loop counter
 +F820 84 9E        STY TPTR1 ;save loop counter
 +
 +F822 WRTPLP2 ;​ write filename to buffer
 +F822 A4 9E        LDY TPTR1 ;get loop counter
 +F824 C4 B7        CPY FNMLEN ;​compare filename length
 +F826 F0 0C        BEQ WRTPH1 ;​$F834 done
 +
 +F828 B1 BB        LDA (FNPTR),​Y ;​get filename char
 +F82A A4 9F        LDY TPTR2 ;get tape buffer pointer
 +F82C 91 B2        STA (TAPE1),​Y ;​write char to buffer
 +F82E E6 9E        INC TPTR1 ;​increment loop counters
 +F830 E6 9F        INC TPTR2
 +F832 D0 EE        BNE WRTPLP2 ;​$F822 loop
 +
 +F834 WRTPH1 ;​ flush buffer to tape
 +F834 20 54 F8    JSR SETBST ;set start and end address pointers
 +F837 A9 69        LDA #​$69 ;​checksum block ID
 +F839 85 AB        STA RIPRTY ;​save parity character
 +F83B 20 EA F8    JSR TPWRIT1 ;​$F8EA write block
 +F83E A8          TAY ;​restore start and end addresses
 +F83F 68          PLA
 +F840 85 AE        STA EAL
 +F842 68          PLA
 +F843 85 AF        STA EAL+1
 +F845 68          PLA
 +F846 85 C1        STA STAL
 +F848 68          PLA
 +F849 85 C2        STA STAL+1
 +F84B 98          TYA
 +F84C             
 +F84C WRTPEX
 +F84C 60          RTS ;exit
 +
 +
 + SETBST is a helper routine called by LOCTPH and WRTPHD to setup the
 +tape buffer before a tape operation takes place. It sets the starting
 +address of the buffer to the first address of the assigned buffer range and
 +sets the end of the buffer to start + 192 bytes.
 +
 +F854 ;​==========================================================
 +F854 ; SETBST - Set buffer start/end pointers
 +F854 ;
 +F854 SETBST
 +F854 20 4D F8    JSR GETBFA ;get buffer address
 +F857 8A          TXA
 +F858 85 C1        STA STAL ;​start=start of buffer
 +F85A 18          CLC
 +F85B 69 C0        ADC #​$C0 ;​end=start+192
 +F85D 85 AE        STA EAL
 +F85F 98          TYA
 +F860 85 C2        STA STAL+1
 +F862 69 00        ADC #$00
 +F864 85 AF        STA EAL+1
 +F866 60          RTS
 +
 +
 + TPWRIT performs the nuts and bolts of moving cassette data in and
 +out of the VIC. The VIC moves tape data by using a series of interrupt
 +routines that are swapped into the IRQ vector as needed. The benefit here is
 +that the tape IRQ code is then executed 60 times per second, along with
 +normal processing, until the operation is complete, resulting in a cheap
 +form of multitasking.
 +
 +TPWRIT performs some setup chores before changing the IRQ vector, including
 +clearing interrupts, ensuring that the IEEE serial bus is idle, saving the
 +old vector, assigning the new vector and setting-up the variables used by
 +the tape IRQ routine to actually move bits. Finally, interrupts are enabled
 +at $F92E which starts the whole process. When the tape IRQ routine is
 +finished it restores the IRQ vector to the standard $EABF which is detected
 +by TPWRIT at TPCDLP2 (an I/O loop). When completed, TPWRIT exits through the
 +TPSTOP routine. This loop also updates the jiffy clock.
 +
 +F8E3 ;​==========================================================
 +F8E3 ; TPWRIT - Initiate tape write
 +F8E3 ;
 +F8E3 TPWRIT
 +F8E3 20 54 F8    JSR SETBST ;get buffer pointers
 +F8E6 A9 14        LDA #​$14 ;​checksum
 +F8E8 85 AB        STA RIPRTY ;​save it
 +
 +F8EA ;
 +F8EA ; write buffer to tape
 +F8EA ;
 +F8EA TPWRIT1
 +F8EA 20 B7 F8    JSR RECDMS ;​wait for Record+Play keys
 +F8ED B0 68        BCS TPSTPX1 ;​$F957 exit
 +F8EF ;
 +F8EF ; TPCODE - Common tape code
 +F8EF ;
 +F8EF TPCODE
 +F8EF 78          SEI ;​disable interrupts
 +F8F0 A9 A0        LDA #​%10100000 ;​$A0 Timer H constant
 +F8F2 A2 08        LDX #​%00001000 ;​$08 offset for tape IRQ vector
 +
 +F8F4 TPCODE1
 +F8F4 A0 7F        LDY #​%01111111 ;​$7F disable interrupts
 +F8F6 8C 2E 91    STY D2IER ;save to interrupt enable reg
 +F8F9 8D 2E 91    STA D2IER ; ​ on VIAs  ​
 +F8FC 20 60 F1    JSR SBIDLE ;​wait for timeout
 +F8FF AD 14 03    LDA IRQVP ;save current IRQ Vector
 +F902 8D 9F 02    STA TAPIRQ
 +F905 AD 15 03    LDA IRQVP+1
 +F908 8D A0 02    STA TAPIRQ+1
 +F90B 20 FB FC    JSR STOIRQ1 ;​$FCFB .X=8 set tape IRQ vectors
 +F90E A9 02        LDA #$02 ;read # of blocks
 +F910 85 BE        STA FSBLK
 +F912 20 DB FB    JSR NCHAR ;set bit counter for serial shifts
 +F915 AD 1C 91    LDA D1PCR
 +F918 29 FD        AND #​%11111101 ;​$FD CA2 manual low
 +F91A 09 0C        ORA #​%00001100 ;​$0C force bits 2,3 high
 +F91C 8D 1C 91    STA D1PCR ;​switch on tape drive
 +F91F 85 C0        STA CAS1 ;flag as on
 +F921 A2 FF        LDX #$FF ;delay loop for high (outer)
 +F923 A0 FF        LDY #$FF ;inner loop
 +
 +F925 TPCDLP1
 +F925 88          DEY
 +F926 D0 FD        BNE TPCDLP1 ;​$F925
 +
 +F928 CA          DEX
 +F929 D0 F8        BNE TPCDLP1-2 ;​$F923 outside loop
 +F92B 8D 29 91    STA D2TM2H
 +F92E 58          CLI ;​allow tape interrupts
 +F92F ;
 +F92F ; wait for I/O-end
 +F92F ;
 +F92F TPCDLP2
 +F92F AD A0 02    LDA TAPIRQ+1 ;​check IRQ direction
 +F932 CD 15 03    CMP IRQVP+1 ;​standard vector?
 +F935 18          CLC
 +F936 F0 1F        BEQ TPSTPX1 ;​$F957 yes, ready
 +
 +F938 20 4B F9    JSR TPSTOP ;no, check STOP key
 +F93B AD 2D 91    LDA D2IFR ;​timer 1 IF clear?
 +F93E 29 40        AND #​%01000000 ;​$40
 +F940 F0 ED        BEQ TPCDLP2 ;​$F92F continue
 +
 +F942 AD 14 91    LDA D1TM1L ; get timer 1 loword
 +F945 20 34 F7    JSR IUDTIM ;​update RTC
 +F948 4C 2F F9    JMP TPCDLP2 ;​$F92F loop
 +
 +
 + TPSTOP is another helper routine. It scans for the keyboard STOP
 +key, and if detected, restores the standard IRQ vector and returns to the
 +caller'​s caller (the caller to TPWRIT).
 +
 +F94B ;​===========================================================
 +F94B ; TPSTOP - Check for tape stop
 +F94B ;
 +F94B TPSTOP
 +F94B 20 E1 FF    JSR STOP ;scan STOP key
 +F94E 18          CLC
 +F94F D0 0B        BNE TPSTPX ;​$F95C not pressed, return
 +
 +F951 20 CF FC    JSR RESIRQ ;​pressed,​ turn drive off and restore IRQ
 +F954 38          SEC ;set error flag 
 +F955 68          PLA ;pop return address
 +F956 68          PLA
 +
 +F957 TPSTPX1
 +F957 A9 00        LDA #$00 ;flag normal IRQ vector
 +F959 8D A0 02    STA TAPIRQ+1
 +
 +F95C TPSTPX
 +F95C 60          RTS ; return to caller'​s caller
 +
 +
 + The SBIDLE routine is used in TPWRIT to detect if the RS-232 serial
 +bus is active and if so, will wait until it's idle before returning to
 +continue the tape code.
 +
 +F160 ;​===========================================================
 +F160 ; SBIDLE - Set timer for serial bus timeout
 +F160 ;
 +F160 SBIDLE
 +F160 48          PHA ;save .A
 +F161 AD 1E 91    LDA D1IER ;get IER
 +F164 F0 0C        BEQ SBIDLEX ;​$F172 no interrupts pending, exit
 +
 +F166 SBIDLLP
 +F166 AD 1E 91    LDA D1IER ;get IER
 +F169 29 60        AND #​%01100000 ;​$60 T1 & T2
 +F16B D0 F9        BNE SBIDLLP ;​$F166
 +F16D             
 +F16D A9 10        LDA #​%00010000 ;​$10 kill CB1 RS232
 +F16F 8D 1E 91    STA D1IER
 +
 +F172 SBIDLEX
 +F172 68          PLA
 +F173 60          RTS
 +
 +
 + RATS3 is at the tail end of the RAMTAS routine - the subroutine that
 +precedes the tape vectors.
 +
 +FCF6 ;​===========================================================
 +FCF6 ; STOIRQ - Set IRQ vector
 +FCF6 ; usually called with .x=$08 or $0e. $08 points to the first
 +FCF6 ;​ entry in TAPVEC while $0e points to the last entry
 +FCF6 STOIRQ
 +FCF6 20 CF FC    JSR RESIRQ ;​restore std IRQs
 +FCF9 F0 97        BEQ TPEOI ;​$FC92
 +
 +FCFB STOIRQ1
 +FCFB BD E9 FD    LDA RATS3,​X ;​$FDE9,​X set vectors from table
 +FCFE 8D 14 03    STA IRQVP
 +FD01 BD EA FD    LDA RATS3+1,​X ;​$FDEA,​X
 +FD04 8D 15 03    STA IRQVP+1
 +FD07 60          RTS
 +
 + These are the actual routines responsible for writing bits to the
 +tape. Various calling routines place these vectors into the IRQ vector that
 +then gets executed 60 times per second. In order, these routines are for
 +writing a leader block to the tape, the routine to write data to the tape,
 +the standard IRQ vector, and the routine to read bits from the tape.
 +
 +FDF1 ;​===========================================================
 +FDF1 ; TAPEVC - Tape IRQ vectors
 +FDF1 ;
 +FDF1 TAPEVC
 +FDF1 ; .dw $FCA8, $FC0B, $EABF, $F98E
 +FDF1 A8FC0BFCBFEA .dw WRLDR2, TWRD7, IRQVEC, RDTPBT
 +FDF7 8EF9
 +
 +
 + The NCHAR routine resets the internal bit counters to their initial
 +state before a calling routine begins to shift the bits over a serial or
 +tape channel. It sets the bit length to 8 and resets intermediate variables
 +to 0.
 +
 +FBDB ;​===========================================================
 +FBDB ; NCHAR - Set bit counter for new character (serial output)
 +FBDB ;
 +FBDB NCHAR
 +FBDB A9 08        LDA #$08
 +FBDD 85 A3        STA SBITCF
 +FBDF A9 00        LDA #$00
 +FBE1 85 A4        STA CYCLE
 +FBE3 85 A8        STA BITCI
 +FBE5 85 9B        STA TPRTY
 +FBE7 85 A9        STA RINONE
 +FBE9 60          RTS
 +
 +
 + That'​s all of the room we have for this part of the series. All of
 +this code and we still haven'​t moved the bits off the tape. So, next time
 +we'll look at the actual routines responsible for moving bits on and off of
 +the tape and we'll begin to look at the IEEE serial routines.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::​
 +
 +     -----------------------
 +     The Art Of The Minigame
 +     -----------------------
 +
 +Introduction
 +------------
 +
 +In summer 2001, an 8-bit minigame contest was held.  Voter turnout may have
 +been low, but author turnout was high, with a total of thirty entries for
 +the C64, Spectrum, Amstrad, and Atari 8-bit. ​ The entries and the results
 +are available at http://​demo.raww.net/​minigame/,​ and what follows are a
 +series of articles by several minigame authors. ​ The articles are for
 +enjoyment, to stimulate thought, and, hopefully, to motivate people for
 +next year's contest. ​ (Everyone had a great time, btw.)
 +
 +Minigames -- writing tiny programs in general -- present a set of very
 +unique challenges. ​ Whereas many of us are used to optimizing programs for
 +cycle-efficiency,​ optimizing programs for byte-efficiency turns out to be
 +very different -- challenging,​ at times aggrivating,​ but very rewarding. ​ I
 +think you will find the challenges and solutions ahead to be very interesting.
 +
 +The game authors were pioneers, exploring pretty new territory, and I
 +salute all that entered the contest (especially for making such a nice C64
 +showing). ​ A special mention should be made of MagerValp ("​Skinny Puppy" in
 +Swedish, in case you've been wondering), who motivated a number of lazy
 +people (e.g. myself) to get involved with the contest.
 +
 +Before diving in to the articles ahead, which contain lots of tricks to
 +save memory, I thought it might get us in the mood to go over some general
 +ideas and concerns for saving bytes on a C64.
 +
 +It should be obvious that some balance has to be found between cycle-
 +efficiency and memory-efficiency. ​ Routines should be reasonably fast
 +in most cases, but the major portion of that balance is -- has to be --
 +memory efficiency.
 +
 +Obviously, everything that can be put into zero page should be put there,
 +since zero page instructions are two bytes instead of three. ​ Equally obvious
 +is that every list-type fetch should use the .X register, since there is
 +no zero-page lda xx,y instruction.
 +
 +Moreover, memory is initialized to various values when the computer is
 +reset. ​ Many zero-page locations have specific values -- by careful code
 +design, these can be taken advantage of to avoid any initialization code.
 +For example, many times the code needs a zero-page pointer with the low
 +byte zero (i.e. sta (point),​y). ​ If that variable is chosen as a location
 +which is already zero, the code doesn'​t have to waste four bytes on
 +lda #00 sta point. ​ One-time counters can also be used in this way.
 +Finally, certain locations can be manipulated to have certain values;
 +for example, the Tetrattack load address forces the end address+1 to
 +be $10f0, because $10 and $f0 are the z-coordinates of two object vertices.
 +
 +There is also a fairly substantial kernal to take advantage of -- routines
 +to clear the screen, perform memory manipulations,​ and so on.  Knowing what
 +is available, and what the routines do, is handy.
 +
 +Self-modifying code comes in handy when large portions of code can be
 +used for similar things with just a few changes. ​ For example, the line
 +routine in Tetrattack uses a single routine for lines in both the x- and
 +y-directions,​ by just interchanging a few INX/INY instructions.
 +
 +And finally, there are things which Mark Seelye terms "​injections",​ which
 +seems like a good term to me.  The idea is to reuse known register values
 +whenever possible. ​ That is, instead of having LDA #00 STA zp in a subroutine,
 +you might put an "STX zp" in an earlier subroutine, where you know the value
 +of .X is zero.  In other words, the instruction to clear the zp location is
 +"​injected"​ into a different subroutine to save a couple of bytes. ​ Every
 +program below uses this trick in one form or another.
 +
 +You'll see lots of neat tricks in the articles ahead, but the basic design
 +framework is: put stuff in zero-page, take advantage of default zero-page
 +values, take advantage of the kernal, always know the register values, and
 +reuse as much code as possible.
 +
 +With that, let's check out the programs.
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 1===================================
 +
 +Postmortem: ​ Codebreaker
 +
 +by White Flame (aka David Holz)
 +http://​white-flame.com
 +
 +The most difficult part of making an entry for this compo was deciding what
 +game to do that could fit in 512 bytes. ​ The ubiquitous snake clones and
 +scroll/​dodge games abound; finding something different is truly a challenge.
 +I wanted to do a Mastermind game a looong time ago, with a full-on "​We'​re
 +getting attacked, solve to code to save the world!"​ type of setting with lots
 +of animation, but usually found myself in application and library/​utility
 +programming instead. ​ But luckily this idea popped back into my head as
 +something different that would fit in the 2-page limit.
 +
 +The concept is pretty simple (as most 512-byte games are).  There'​s a 4-color
 +secret key, you get 10 guesses at it.  Two numbers reflect the score for the
 +current guess: a black number shows how many slots have the right color, and a
 +white number shows how many correct colors are in the guess, but in the wrong
 +slot.  Experienced players should be able to deduce the correct answer in 4-6
 +guesses.
 +
 +
 +Random number generation
 +
 +There are 3 obvious ways to get random numbers easily in the 64: SID
 +oscillator with noise waveform, BASIC prng, or the raster counter. ​ I decided
 +to go for the raster solution. ​ Obviously, you need to wait a certain amount
 +of time between reads of the raster location, or they'd be the same or linear.
 +To fix this, every time I wanted a random number, I'd read the raster and call
 +a loop that many times. ​ When it was done, $d012 should have a "​random enough"​
 +number in its lower 3 bits (to get a color from 0 to 5).
 +
 +randomDelay:​
 + ldx $d012
 +randomLoop:
 + jsr delay
 + dex
 + bne randomLoop;
 +delay:
 + dey
 + bne delay
 + rts
 +
 +
 +Code space saving
 +
 +What's interesting is that it actually took less bytes of code and data to
 +have custom draw loops for the 3 textual messages than setting a pointer and
 +calling a drawText function.
 +
 + ldy #22
 +titleloop:
 + lda titletext,y
 + sta screenLoc-48,​y
 + dey
 + bpl titleloop:
 +
 +These functions assume that you're not running on an ancient kernal, so color
 +memory is initialized to the cursor color (white).
 +
 +I also had all variables (except the secret key) use screen memory, color
 +memory, or register .Y instead of manipulating off-screen variables and then
 +updating the screen to match:
 +
 +- For entering a guess, color memory is INC'd and DEC'd, with .Y holding the
 +  current offset from the fixed screen location. ​ JSR $e8ea is used to scroll
 +  the screen, and everything is drawn to screen line 23.
 +
 +- To find out if you've already made 10 guesses, a memory location near the
 +  top of the screen was polled to see if it was still a space. ​ As guesses are
 +  made, the screen was scrolled, and if the title line ever scrolled into that
 +  memory location, it's game over.
 +
 +- The current guess score '0 0' is drawn to the screen first, then those
 +  screen values are INC'd or DEC'd as matches are found.
 +
 +- Winning constitutes checking to see if the screen has a '​4'​ in the black
 +  score location.
 +
 +
 +Score calculation
 +
 +This was a bit of a pain.  At first I trying to calculate the white score by
 +the scoring rules (every matching color that was in the wrong slot), INCing
 +the white score on every hit, then looping for exact matches, INCing the black
 +score on every hit.  This code ended up being way too big, so I got the idea
 +to simplify the white loop: INC the white score on every color match, whether
 +in the right spot or not, then on every black match, DEC the white score and
 +INC the black score. ​ This gave the same result, but with much simpler code.
 +
 +
 +Conclusion
 +
 +All in all, it was an interesting challenge for me, and had I not gotten stuck
 +in Phoenix for an extra week, I'd have finished my 2k entry. ​ The code is
 +pretty straightforward,​ and at 415 bytes total, I'm quite happy to claim
 +having the smallest entry, even if I didn't win (why Pacman didn't beat out a
 +low-res scroll/​dodge/​single-shot game is a mystery to me).  Being a hobbyist
 +C64 programmer, and my career history being more on the research end of
 +things, having a deadline is a nice change in terms of getting a project
 +completed. :)
 +
 +
 +Publisher: ​ Gasman/Raww Arse
 +Number of full-time developers: ​ 1 programmer
 +Number of contractors: ​ 0
 +Estimated budget: ​ 0 USD (0 EUR)
 +Development time:  About 5 hours
 +Release date:  August 22, 2001
 +Platforms: ​ Commodore 64 and compatibles
 +Development hardware: ​ pee sea (1.4GHz T-bird, 512MB RAM, 100GB HD, Win98SE)
 +3rd party tools: ​ xa, Vice, Notepad
 +In-house tools: ​ none
 +Project size: 290 lines of 6502 assembly code, containing 1 line of tokenized
 +BASIC
 +
 +--- cb.asm ---
 +.word $0801 ;Starting address for loader
 +
 +* = $0801
 +
 +.word nextLine ​ ;Line link
 +.word $0 ;Line number
 +.byte 158 ;SYS
 +.byte 50,48,54,54 ;"​2066"​
 +.byte 0
 +nextLine .byte 0,0 ;end of basic
 +
 +screenGameOver = $0400+(4*40)+16
 +screenLoc = $0400+(23*40)+16
 +colorLoc = $d800+(23*40)+16
 +screen = $fd
 +color = $fb
 +peg = 81
 +cursor = 87
 +crsr_up = 145
 +crsr_down = 17
 +crsr_left = 157
 +crsr_right = 29
 +
 +WaitStart:
 +jsr $ffe4
 +beq WaitStart
 +
 +Start:
 +
 +;Change to white
 +lda #$05
 +jsr $ffd2
 +
 +;Clear screen
 +lda #147
 +jsr $ffd2
 +
 +;Set bg colors
 +lda #$00
 +sta $d020
 +sta colorLoc
 +sta colorLoc+1
 +sta colorLoc+2
 +sta colorLoc+3
 +sta $ff ;for cursor
 +lda #11 ;dark gray
 +sta $d021
 +
 +;Init zp pointers
 +lda #<​screenLoc
 +sta screen
 +sta color
 +lda #>​screenLoc
 +sta screen+1
 +lda #>​colorLoc
 +sta color+1
 +
 +;Draw text
 +ldy #22
 +titleloop:
 + lda titletext,y
 + sta screenLoc-48,​y
 + dey
 + bpl titleloop:
 +
 +;Randomize values
 +
 +iny
 +newGuess
 +sty $02
 +guessAgain
 + jsr randomDelay
 + lda $d012
 + and #$07
 + cmp #$06
 + bpl guessAgain
 + ldy $02
 + ​checkLoop: ​ ;See if the color'​s already in the key
 + dey
 + bmi goodGuess
 +  cmp secret,y
 +  beq guessAgain
 + bne checkLoop
 +
 + ​goodGuess: ​ ;Save the color and put a peg on the screen to show progress
 + ldy $02
 + sta secret,y
 + lda #peg
 + sta (screen),y
 + iny
 + cpy #$04
 + bne newGuess
 +
 +Game_Loop:
 +
 +;Get cursor pos
 +ldy $ff
 +
 +;Draw cursor
 +lda #cursor
 +sta (screen),y
 +
 +Key_Loop:
 +
 +;Wait for key
 +getin: jsr $ffe4
 +beq getin
 +ldy $ff
 +
 +;If up, bump color up
 +cmp #crsr_up
 +bne notUp
 + lda (color),y
 + and #$07
 + adc #$00 ;carry is always set after the cmp
 + cmp #$06
 + bne colorNotHigh
 +  lda #$00
 + ​colorNotHigh:​
 + sta (color),y
 +notUp:
 +
 +;If down, bump color down
 +cmp #crsr_down
 +bne notDown
 + lda (color),y
 + and #$07
 + bne colorNotLow
 +  lda #$06
 + ​colorNotLow:​
 + sbc #$01 ;carry is always set after the cmp
 + sta (color),y
 +notDown:
 +
 +;If left,
 +cmp #crsr_left
 +bne notLeft
 + ; Erase cursor
 + lda #peg
 + sta (screen),y
 + ; Bump cursor left
 + dey
 + bpl cursorOKLeft
 +  ldy #$03
 + ​cursorOKLeft:​
 + ; Draw cursor
 + lda #cursor
 + sta (screen),y
 +notLeft:
 +
 +;If right,
 +cmp #crsr_right
 +bne notRight
 + ; Erase cursor
 + lda #peg
 + sta (screen),y
 + iny
 + cpy #$04
 + bne cursorOKRight
 +  ldy #$00
 + ​cursorOKRight:​
 + lda #cursor
 + sta (screen),y
 +notRight:
 +
 +;Store cursor pos
 +sty $ff
 +
 +;If not enter, goto Key_Loop
 +cmp #13
 +bne Key_Loop
 +
 +;​------------------------
 +
 +;Erase cursor
 +lda #peg
 +sta (screen),y
 +
 +;Check score
 +
 +;Draw "0 0"
 +lda #'0
 +sta screenLoc+5
 +sta screenLoc+7
 +lda #0 ;black
 +sta colorLoc+5
 +
 +;Calculate white score (any matches between the guess & key)
 +ldy #3
 +yLoop:
 + ldx #3
 + ​xLoop:​
 +  lda (color),y
 +  and #$07
 +  cmp secret,x
 +  bne noWhiteMatch
 +   inc screenLoc+7
 +  noWhiteMatch
 +  dex
 +  bpl xLoop
 + dey
 + bpl yLoop
 +
 +;Calculate black score (exact matches between the guess & key)
 +ldy #3
 +blackLoop:
 + lda (color),y
 + and #$07
 + cmp secret,y
 + bne noBlackMatch
 +  inc screenLoc+5
 +  dec screenLoc+7
 + ​noBlackMatch:​
 + dey
 + bpl blackLoop
 +
 +;Check win (black score = "​4"​)
 +lda #'4
 +cmp screenLoc+5
 +beq Win
 +
 +;Check for Game Over (screen'​s getting full)
 +lda screenGameOver
 +cmp #32
 +bne Game_Over
 +
 +;Scroll screen up 2 rows
 +jsr $e8ea
 +jsr $e8ea
 +
 +;Copy last hand
 +ldy #3
 +copyLoop:
 + lda colorLoc-80,​y
 + sta (color),y
 + lda #peg
 + sta (screen),y
 + dey
 + bpl copyLoop:
 +
 +jmp Game_Loop
 +
 +Game_Over:
 +jsr $e8ea
 +ldy #8  ;Show "YOU SUCK!"
 +loseloop:
 + lda losetext,y
 + sta (screen),y
 + dey
 + bpl loseloop:
 +
 +jsr $e8ea
 +ldy #3  ;Reveal the secret key
 +losecopy:
 + lda secret,y
 + sta (color),y
 + lda #peg
 + sta (screen),y
 + dey
 + bpl losecopy
 +jmp WaitStart
 +
 +Win:
 +jsr $e8ea
 +ldy #15  ;Show Zero Wing reference
 +winloop:
 + lda wintext,y
 + sta screenLoc-4,​y
 + dey
 + bpl winloop
 +rts
 +
 +randomDelay:​
 +ldx $d012  ;Loop n times, n = current raster location
 +randomLoop:
 + jsr delay
 + dex
 + bne randomLoop;
 +
 +delay:
 +dey
 +bne delay
 +rts
 +
 +wintext
 +.byte 1,​32,​23,​9,​14,​14,​5,​18,​32,​9,​19,​32,​25,​15,​21,​33
 +losetext
 +.byte 25,​15,​21,​32,​19,​21,​3,​11,​33
 +titletext .byte 58,​58,​58,​32,​3,​54,​52,​32,​3,​15,​4,​5,​2,​18,​5,​1,​11,​5,​18,​32,​58,​58,​58
 +secret = *
 +--- end cb.asm ---
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 2===================================
 +
 +Tinyplay by SLJ
 +--------
 +
 +Like many people, I was a serious Ultima guy growing up.  Not only did
 +I love the games (and still do), but I absolutely loved the music (and still
 +do).  In fact, I think the music from III and IV is some of the finest,
 +most musical composing ever done on the C64, and the best example of how
 +appropriate,​ atmospheric music can add volumes (so to speak) to a game.
 +
 +It also turns out that I had been thinking about some new ideas for a new
 +music system for a while. ​ So when Robin said he was writing an Ultima-like
 +game -- well!  I'm certainly no Kenneth W. Arnold, but a set of tiny tunes
 +sure sounded like a neat and fun thing to do, especially in the Ultima style!
 +Robin removed a lot of neat features and text to add in the music, so I feel
 +pretty honored to have been included.
 +
 +The player and tunes were written on pretty short notice; I think we started
 +a week or so before the deadline. ​ Moreover, I had to go out of town the
 +weekend of the deadline, getting back that Sunday, where many of the
 +optimizations took place in frantic coding sessions! ​ So it shouldn'​t be
 +surprising that the code is not as optimal as it could be -- if you get
 +around to looking at the source code, you'll see an awful lot of weird,
 +duplicate, and otherwise embarrasing lines of code which anyone with a clear
 +head would never have used.  But as is, the player and three tunes take up
 +some 428 bytes (I think), which isn't too bad.  (Later on I'll describe
 +a music compression routine which would have worked great, but that I didn't
 +have time to implement.)
 +
 +Broadly speaking, the player is a note-duration based player, using two
 +sawtooth voices for that Ultima sound (originally,​ before reality kicked in,
 +the thinking was to put some sound fx in the third voice). ​ It uses a single
 +play routine, using the .X register to tell which voice to play, and can
 +play multiple tunes. ​ The original version has several effects included like
 +major and minor arpeggios, but memory constraints forced these to be taken
 +out (along with the original tune, alas, alas).
 +
 +The full source code for the player is at the end of this article, and can be
 +divided into four primary parts: a one-time init routine, a routine to select
 +a tune, a routine to play notes, and the music data.  But perhaps the best
 +way to understand the player is to begin by looking at how the data for a
 +tune is stored.
 +
 +
 +Tune encoding
 +-------------
 +
 +Here is the third, "​castle"​ tune, in its entirety:
 +
 +* Tune 3: Castle
 +
 +t2v1     
 +         dfb 97           ;gate
 +         dfb 24+$80-1
 +         dfb 70,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 12+$80-1
 +         dfb 74,74,96,75
 +         dfb 24+$80-1
 +         dfb 77,​75,​74,​96,​72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 24+$80-1
 +         dfb 70,96,96,96
 +         dfb 00
 +t2v2     
 +         dfb 97
 +         dfb 24+$80-1
 +         dfb 34,41,46,41
 +         dfb 34,41,46,41
 +         dfb 58,62,53,57
 +         dfb 58,53,46,96
 +         dfb 0
 +
 +end      ​
 +
 +
 +
 +The first chunk of data (t2v1) is the voice 1 data, and the second chunk the
 +voice 2 data (t2v2).
 +
 +There are 12 notes in an octave, and eight octaves total, for 96 possible
 +notes. ​ In the player, data bytes 0-95 represent these notes -- the player
 +just uses the data as an index into a frequency table. ​ This leaves bytes 96
 +and above free for other uses; also, since the lower note values are rarely
 +used, values like 0, 1, etc. are also free in principle.
 +
 +Byte values >= 128 are used to specify the note duration. ​ When the player
 +encounters these byte values it simply strips the high bit and stores the
 +result. ​ When a new note is read in, this stored duration is placed into a
 +counter which is decremented on each player call.
 +
 +In the castle tune above, the duration for voice 2 is only set once, at the
 +very beginning -- every note has the same duration. ​ In the voice 1 data
 +it is set several times, to change the duration when appropriate.
 +
 +For some reason the code decrements the duration counter until it becomes
 +negative, instead of decrementing it until it becomes zero.  I'm sure it was
 +because of something relevant in an earlier version of the player, I just
 +totally don't remember what.  But that is why the durations above are encoded
 +as 24+$80-1 -- the -1 is to make it go negative, instead of to zero.  Looks
 +weird.
 +
 +Byte values >95 and <128 are available for '​special'​ features. ​ The player
 +supports one "​effect":​ the gate toggle. ​ The player can either leave the gate
 +bit on all the time or can turn it off right as a note is about to finish
 +(say, when the duration counter decrements down to 2).  A byte value of 97
 +turns the gate toggle on -- in the castle tune above it is the first byte
 +of data.  In other tunes, a value of 100 is used to turn off gate toggling.
 +(Historically speaking, values 98 and 99 were used for major and minor
 +arpeggios, and 100 was for no fx at all.)
 +
 +The byte value 96 is used as a "​rest"​ (or hold) -- the player resets the
 +duration counter but does not reset the gate bit or the note value. ​ When
 +there is no gate toggling this provides a way of holding a note for longer
 +than the basic duration. ​ When there is gate toggling, it starts the release
 +cycle, and lets notes fade away.  In the castle tune above, the first two
 +notes in voice 1 are "70 96" -- a note, then a rest, during which the note
 +fades away.
 +
 +Finally, the value 0 is used to indicate the end of data.  While 0 is
 +technically a note it never gets used as such.  When the player hits a zero
 +it simply resets the data pointer.
 +
 +So with all that in mind, let's take a look at the player.
 +
 +
 +The Player
 +----------
 +
 +Before playing a tune, a one-time routine is called which sets up the
 +frequency tables; when the player reads a new note, it looks up the SID
 +frequency settings in this table, along the lines of:
 +
 + ldy note,x
 + lda freqlo,y
 + sta $d400
 + lda freqhi,y
 + sta $d401
 +
 +Frequencies in different octaves are related by powers of 2.  When you go
 +up an octave, the frequency doubles -- for example, the frequency of C-5
 +is twice the frequency of C-4.  The routine starts with a table of twelve
 +frequency values for the highest octaves, and copies those values into
 +the end of the frequency table (e.g. freqlo+95, freqlo+94, ..., freqlo+84).
 +After copying, it divides each value by 2 and repeats the process -- now
 +the frequencies correspond to the next lower octave -- and this is done until
 +the frequency table is full.  Piece of cake.
 +
 +Once the frequency table is set up, the main play routine is called with
 +the tune number in .A.  If the tune number is different than the current
 +tune, the player simply resets the music data pointers and note durations,
 +and falls through to the main player routine.
 +
 +And the main player routine simply decrements the duration, and when it
 +becomes negative reads in the next data.
 +
 +Because the process -- decrement duration, read next data, store frequencies
 +in SID -- is the same for each voice, a single routine is used for both voices
 +one and two, using .X as the voice index. ​ That is, instead of
 +
 + dec duration
 +
 +the code can use
 +
 + dec duration,x
 +
 +and instead of
 +
 + sta $d400
 +
 +the code can use something like
 +
 + ldy SidOffset,x
 + sta $d400,x
 +
 +where .X is 0 or 1.
 +
 +And that's about all there is to it -- the code is pretty straightforward.
 +
 +
 +Compressing Music
 +-----------------
 +
 +Most music has the property that it doesn'​t jump around all over the place,
 +but rather notes progress in relatively small intervals (seconds, thirds,
 +fifths, etc.). ​ For example, a major chord (as used in a typical arpeggio) is
 +
 + note  note+4 ​ note+7 ​ note
 +
 +So the idea is to use a differential scheme, encoding the note _intervals_
 +instead of the note _values_. ​ That is, the above could instead be encoded as
 +
 + x  4  3  -7
 +
 +where each value is added to the current note value. ​ Thus, if intervals are
 +restricted to -8..7 then only four bits are required, cutting the data size
 +in half.  Well, not quite of course, because an escape code is needed to
 +specify notes outside of the -8..7 range, not to mention durations and fx
 +settings. ​ Luckily, not all interval values are used, so these can be used
 +for escape codes; code 96 -- the rest or "​hold"​ note -- can also use an
 +otherwise unused value.
 +
 +So the decompression code looks something like
 +
 + read next four bits
 + if escape code then read next byte
 + otherwise add to current note value
 +
 +Pretty simple, and leads to quite substantial savings. ​ Ah, but for a little
 +more time...
 +
 +Another possibility arises if not all note values are used.  I set up the
 +frequency tables to contain all 12 notes in eight octaves. ​ If not all twelve
 +notes are actually used -- if none of the tunes contain a D# or whatever --
 +then a few bytes can be shaved off of the frequency table. ​ But this means
 +a lot of renumbering and such... it all depends on just how much you want
 +those extra bytes!
 +
 +And that's about all there is to it.
 +
 +
 +Source code
 +-----------
 +
 +*      ​
 +* Tiny music player, for the
 +* minigame contest.
 +*
 +* slj 9/23/01
 +* sjudd@ffd2.com
 +*
 +
 +                          ;         org $2800-491
 +         org $2800-425
 +
 +ARPDEL ​  = 2
 +SR       = $c9
 +GATEDEL ​ = 2
 +
 +curtune ​ = $fe
 +curdur ​  = $fc
 +
 +tunepos ​ = $b0            ;position in note sequence
 +seqpos ​  = $b2            ;position in seq list
 +dur      = $b4            ;2 bytes each
 +notefx ​  = $b6
 +curnote ​ = $b8
 +newnote ​ = $ba
 +
 +arpdur ​  = $bc
 +arpoff ​  = $bd
 +
 +noteptr ​ = $fa
 +temp     = $fa
 +freqtab ​ = $c000
 +
 +
 +
 +
 +*
 +* Code begins here
 +*
 +start    ​
 +
 +*-------------------------------
 +         do 0
 +         jmp InitFreq
 +         jmp Play
 +         jmp Reset
 +         fin
 +*-------------------------------
 +
 +*
 +* InitFreq -- set up note table
 +*
 +InitFreq ​
 +         ldy #192
 +:l2      ldx #24
 +:loop    lda freq-1,x
 +         sta freqtab-1,y
 +         dex
 +         dey
 +         lda freq-1,x
 +         sta freqtab-1,y
 +         lsr freq,x
 +         ror freq-1,x
 +         dey
 +         beq :xit
 +         dex
 +         bne :loop
 +         beq :l2
 +:xit     rts
 +
 +*
 +* Play routine
 +* tune in .A
 +*
 +
 +Reset      ; Force a tune reset
 +         sec
 +         ror curtune
 +                          ; ldx #255
 +                          ; stx curtune
 +Play     
 +         ldx #1
 +         cmp curtune
 +         beq PlayTune
 +         sta curtune
 +
 +         ldy #00
 +         sty tunepos
 +                          ;         sty tunepos+1
 +         sty dur
 +         sty dur+1
 +
 +         lda #15
 +         sta $d418
 +         lda #SR
 +         sta $d406
 +         sta $d406+7
 +
 +
 +*
 +* Main routine
 +*
 +* voice in .X (0, 1)
 +*
 +
 +SetTune  ​
 +         ldy curtune
 +         txa
 +         beq :v1
 +:v2      lda v2tunepos,y
 +         bne :sta
 +:v1      lda v1tunepos,y
 +:sta     sta tunepos,x
 +
 +
 +PlayTune ​
 +
 +Get      ​
 +         lda #00
 +         sta newnote,x
 +         dec dur,x
 +         bpl DoFx
 +
 +:next    ldy tunepos,x
 +         inc tunepos,x
 +         lda tunes,y
 +         beq SetTune
 +         bpl :c1
 +         and #$7f         ;​$80+dur
 +         sta curdur,x
 +         bne :next
 +:c1      cmp #96
 +         bcc :setfreq
 +         beq :hold
 +         sta notefx,x
 +         bne :next
 +:note                     ;sta curnote,x
 +                          ;         sta newnote,x
 +                          ;         jsr setfreq
 +:​SetFreq ​
 +         asl
 +         tay
 +         lda Freqtab,y
 +         pha
 +         lda Freqtab+1,y
 +         ldy SidOffset,x
 +         sta $d401,y
 +         pla
 +         sta $d400,y
 +:gate    ​
 +         lda #$21
 +         ldy SidOffset,x
 +         sta $d404,y
 +
 +
 +:hold    lda curdur,x
 +         sta dur,x
 +
 +
 +
 +DoFx     
 +         ldy notefx,x
 +         cpy #97
 +         bne AllDone
 +
 +         ldy dur,x
 +         cpy #GATEDEL
 +         bcs :c1
 +         lda #$20
 +         ldy SidOffset,x
 +         sta $d404,y
 +:c1      ​
 +
 +AllDone  ​
 +         dex
 +         bpl PlayTune
 +RTS      rts
 +
 +
 +*
 +* Frequencies
 +*
 +freq     
 +         da 34334
 +         da 36377
 +         da 38539
 +         da 40831
 +         da 43258
 +         da 45831
 +         da 48557
 +         da 51444
 +         da 54502
 +         da 57743
 +         da 61177
 +         da 64815
 +
 +                          ;arptab dfb 0,4,7
 +
 +*
 +* Tune positions (offset into sequence
 +* list)
 +*
 +v1tunepos
 +         dfb t0v1-tunes
 +         dfb t1v1-tunes
 +         dfb t2v1-tunes
 +v2tunepos
 +         dfb t0v2-tunes
 +         dfb t1v2-tunes
 +         dfb t2v2-tunes
 +
 +
 +tunes    ​
 +         ​  ​ ; dfb 00           ;​dummy byte
 +*
 +* Sid offset table
 +*
 +SidOffset
 +         dfb 00
 +         dfb 07
 +
 +
 +t0v1     
 +         dfb 97
 +         dfb 10+$80-1 ​    ;​dur=10
 +         dfb 42,47,51
 +         dfb 42,47,51
 +         dfb 42,47,47,47
 +         dfb 51,51,51
 +         dfb 54,52,51
 +         dfb 52,51,49
 +         dfb 51,51,51
 +         dfb 49,49,49
 +         dfb 51,49,47
 +         dfb 49,47,46
 +         dfb 44,44,44
 +         dfb 46,46,46
 +         dfb 47,51,47
 +         dfb 46,42,46
 +         dfb 47,96,96
 +         dfb 96,96
 +         dfb 00
 +
 +t0v2                      ;v2
 +         dfb 97
 +         dfb 10+$80-1
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 35,39,96
 +         dfb 39,42,96
 +         dfb 42,30,96
 +         dfb 30,35,96
 +         dfb 35,34,96
 +         dfb 34,32,96
 +         dfb 32,32,96
 +         dfb 32,32,96
 +         dfb 32,34,96
 +         dfb 34,35,96
 +         dfb 35,30,96
 +         dfb 35,35,96
 +         dfb 35,35,96
 +         dfb 00
 +
 +* Tune 2: Combat
 +
 +t1v1     
 +         dfb 100          ;No gate
 +         dfb 11+$80-1 ​    ;​dur=11
 +         dfb 51,​54,​58,​59,​96
 +         dfb 58,​57,​54,​51,​54,​58,​63,​96
 +         dfb 62,​61,​60,​59,​56,​53
 +         dfb 58,​54,​51,​56,​53
 +         dfb 46,50,53,56
 +         dfb 59,58,57,54
 +         dfb 0
 +
 +t1v2                      ;v2
 +         dfb 100
 +         dfb 11+$80-1
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 15,27,15,39
 +         dfb 47,​44,​41,​51,​46,​42,​47,​44
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 22,34
 +         dfb 0
 +
 +* Tune 3: Castle
 +
 +t2v1     
 +         dfb 97           ;gate
 +         dfb 24+$80-1
 +         dfb 70,96,72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 12+$80-1
 +         dfb 74,74,96,75
 +         dfb 24+$80-1
 +         dfb 77,​75,​74,​96,​72
 +         dfb 6+$80-1
 +         dfb 96,96,70,72
 +         dfb 24+$80-1
 +         dfb 70,96,96,96
 +         dfb 00
 +t2v2     
 +         dfb 97
 +         dfb 24+$80-1
 +         dfb 34,41,46,41
 +         dfb 34,41,46,41
 +         dfb 58,62,53,57
 +         dfb 58,53,46,96
 +         dfb 0
 +
 +end      ​
 +
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 3===================================
 +
 +MagerTris -- by Per Olofsson <​magervalp@cling.gu.se>​
 +
 +Way back in 1990 or thereabouts I wrote a Tetris clone in basic on the
 +Amiga. When the compo was announced I scratched my head for a bit and
 +figured that I should have a good chance of fitting Tetris in 512
 +bytes. If I couldn'​t I would at least have a good engine for the 2K
 +compo.
 +
 +The Game
 +
 +This puzzle game was invented in the 80s by the Russian programmer
 +Alexey Pazhitnov. The game has a vertical field where one of seven
 +puzzle pieces appears at the top and falls towards the bottom. The
 +player can rotate the piece 90 degrees or move it left or right. When
 +it reaches the bottom the game field is checked for filled lines that
 +are removed. The player is awarded points for removing lines (with a
 +bonus for clearing more than one in one go) and the game is over when
 +the screen fills up so that new pieces can't enter the game field.
 +
 +Drawing the Screen
 +
 +The screen is filled with inverted space and the game pieces and the
 +border is drawn into the color ram. There aren't any real tricks used
 +here, the only one is that the color of the border is a side effect
 +of the last character printed, ascii 13 (light green).
 +
 +Random Numbers
 +
 +We need a decent source of random numbers to generate a random Tetris
 +piece. The SID chip's noise waveform is probably the best one
 +available on the C64, and fortunately the code to access it is really
 +short. To initialize I used:
 +
 +initrnd subroutine
 + lda #$81 ; enable noise
 + sta $d412 ; voice 3 control register
 + sta $d40f ; $81xx as the frequency
 +
 +and to get an 8-bit random number all you have to do is lda $d41b.
 +
 +The Tetris Pieces
 +
 +Tetris has 7 puzzle pieces that you have to store in a table. As every
 +piece consists of four boxes we need a total of 7*4 = 28 entries. The
 +smallest tetris table is probably the one where each piece is
 +represented by 8 bits, like this:
 +
 + 0010 0000 0011 0110 0100 0010 0110
 + 0111 1111 0110 0011 0111 1110 0110
 +
 +The table is very compact, a mere 7 bytes, but then you need code to
 +rotate every piece and accessing the table is somewhat clunky on the
 +6502. I've seen an x86 version that did this though, with a total size
 +of less then 256 bytes. However, on the 6502 I needed four bytes for
 +each piece for the code to be reasonably compact and I also kept all
 +the rotated pieces in the table, for a total of 112 bytes.
 +Fortunately,​ as every piece has a center box we only need to store 3
 +which trims the table to 84 bytes. But what do we actually store? As
 +the screen is painted to color ram I used a pointer to the current
 +position. In the table I simply stored the offset in color ram from
 +the center box. With indirect indexed addressing the routine is nice
 +and short.
 +
 +Failure
 +
 +The rest of the program is fairly straight forward. User input is just
 +a cmp loop, there are dec zp timers for most events, and a raster wait
 +to time everything. This is also where I failed to fit everything
 +inside the 512 byte limit, as everything put together took about 600
 +bytes, even after some serious optimizations. There are two very
 +similar routines: the one that draws a piece on the screen and the one
 +that checks for collisions. Both iterate through the four boxes in a
 +piece, but because of the way I used some zeropage pointers I couldn'​t
 +merge the two routines into one without rewriting most of the code.
 +
 +Success
 +
 +As rewriting everything meant too much work, I just decided to go for
 +plan B: write a 2K entry. With a 600 byte engine there was plenty of
 +space to add features, and at around 700 bytes or so compressíon
 +starts to make sense -- pucrunch breaks even around there somewhere,
 +and the final binary is actually 6500 bytes uncompressed. I could
 +easily fit a title screen complete with custom graphics, a nice tile
 +set for the tetris pieces, and even a hiscore list that saves to
 +disk. The only trick I used here was that the custom character set was
 +EOR:d with the ROM font, and as most characters are the same or very
 +similar it compresses much better that way.
 +
 +And that's the story of the 2K MagerTris.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 4===================================
 +
 +Tuff -- Compressing tiny programs
 +----------------------------------->​ Stephen L. Judd <​sjudd@ffd2.com>​
 +
 +Most compression schemes address the problem of taking a large program or
 +data file and compressing it down.  But have you given much thought to the
 +problem of trying to compress a _small_ program, even a 512-byte program?
 +
 +I spent some time trying to come up with a compression scheme for tiny
 +programs -- saving even 2 or 3 bytes in tetrattack would have been helpful.
 +The effort was basically a failure, as the best I thought I could do was to
 +break even (on paper, for tetrattack). ​ I never actually implemented the code
 +either -- just worked it out on paper.
 +
 +It is fairly interesting though, and maybe these preliminary efforts will
 +give other people ideas (especially for compressing 2k programs). ​ And after
 +writing this article, I now think it might have really worked, and saved
 +a handful of bytes -- maybe next year?
 +
 +Tiny Compression Basics
 +-----------------------
 +
 +If you remember your C=Hacking #16, there are two basic approaches to
 +compression:​ taking a fixed-length input while giving a variable-length
 +output, or taking a variable-length input and producing a fixed-length output.
 +An example of the latter is LZ compression -- looking for repeated
 +strings of length 2, 3, etc. and replacing those with a single byte (or two
 +bytes, or whatever). ​ An example of the former is a Huffman tree, which
 +replaces each byte with a string of bits, using fewer bits for the most
 +common bytes.
 +
 +To decompress the compressed data, of course, a decompression routine is
 +needed. ​ For tiny compression,​ this routine obviously needs to be as tiny
 +as possible. ​ Most C64 decompression programs copy themselves somewhere
 +before decompressing the data stream to wherever it is supposed to go.
 +Obviously, if we put a few restrictions on the data to be decompressed
 +the decompression routine can be made smaller, and more specific to the
 +task.
 +
 +Finally, it is worth mentioning that a tiny program in general does not
 +use all possible byte values.
 +
 +Now, first consider Huffman encoding. ​ In a typical Huffman decoding scheme,
 +the decoder reads a bit at a time, traversing the Huffman tree with each
 +bit, until a byte is hit.  This implies a fairly complicated decoder, and
 +worse the tree must be stored. ​ Finally, the tree can be fairly large, if
 +the number of possible symbols is large. ​ So Huffman doesn'​t seem like a
 +good approach.
 +
 +So, consider LZ-style encoding: replacing n-byte sequences with a shorter
 +code.  In LZ, the n-byte code is replaced with a reference to the previously
 +decompressed bytes -- a code which says how far back to go, and how many
 +bytes to copy.  What a drag -- an escape code, the distance backwards, and
 +the number of bytes to copy.  For a byte-aligned decompressor,​ this implies
 +at least two bytes (since the decompression routine must be small, a byte-
 +aligned decompressor is of greater interest).
 +
 +So far so bad.  But, since a tiny program doesn'​t use all possible byte
 +values, perhaps a simple substitution method is possible -- that is, replacing
 +various 2-, 3-, 4- etc. byte sequences with one of those bytes. ​ Alas, the
 +numbers don't work out very well: in the above scheme, let's say an
 +n-byte sequence appears m times, for a total of n*m bytes. ​ We replace
 +each of those sequences with a single byte, giving a savings of
 +
 + n*m - m
 +
 +bytes. ​ But we also need to store the original sequence in a table somewhere,
 +plus the byte corresponding to that sequence. ​ This means another n+1 bytes,
 +so the total savings is
 +
 + n*m - (m+n+1)
 +
 +So, for example, if we have a 2-byte sequence repeated 3 times, the savings
 +is
 +
 + 2*3 - (2+3+1) = 0 bytes!
 +
 +The issue is that replacing a 2-byte sequence with a 1-byte sequence saves
 +one byte, and this must happen at least three times to overcome the three
 +bytes of table storage. ​ With 4 repetitions,​ one byte is saved. ​ And so
 +on.  The net result is that you have to have a _lot_ of sequences repeated
 +a lot of times to get any savings, and somehow this savings needs to be
 +greater than the decompression code.
 +
 +Note that, in general, if there are only a few sequences to replace then
 +some custom code can be shorter, but as the number of sequences increases
 +then generic code, with sequences stored in tables, quickly becomes
 +shorter.
 +
 +The final dagger in the heart of these schemes is the program itself. ​ I
 +wrote a simple program to find all the repeated strings in tetrattack.
 +There were numerous 2-byte sequences, but rarely repeated more than three
 +times. ​ There were a few 3-byte sequences, and even one 4-byte sequence.
 +But overall, there just aren't enough sequences, especially long sequences,
 +to make any sequence-shortening scheme worthwhile.
 +
 +There are, however, lots of repeated bytes. ​ For example, there are an
 +awful lot of STA instructions and zp variables, and usually a lot of
 +zeros. ​ This suggests taking another look at a Huffman-style scheme to
 +replace those bytes with a smaller number of bits.
 +
 +One major problem identified earlier is storing the Huffman tree -- it's
 +just too big.  So, the first thing to consider is to just compress the
 +most common bytes -- maybe the top five or eight bytes or whatever --
 +to make the tree smaller. ​ But that still leaves a substantial decompression
 +code to traverse the tree, and a tree with several empty nodes in it.
 +
 +Once again, if there are only a few nodes then custom code might be shorter,
 +but beyond a few nodes it's more practical to store the tree and have some
 +generic traversal code.
 +
 +But consider the following alternative to a Huffman tree: what if we had a
 +tree which looked like
 +
 +       0/\1
 +       /  \
 +    ​lit 0/​\1
 + /  \
 +       b1  0/\1
 +   /  \
 + b2    ...
 +
 +where "​lit"​ is a literal byte and b1, b2, etc. are encoded bytes. ​ This would
 +encode the most common byte with two bits (10), the next-most common byte
 +with three bits (110), and so on, and encode literal bytes with nine bits
 +(0byte).
 +
 +To see if this is a viable scheme, check out the numbers. ​ Let's say that
 +we encoded just the top three most common bytes in a program; in an early
 +version of tetratack those bytes were
 +
 + 00 $2F occurances
 + 03 $1A occurances (ZP variable)
 + 8D $18 occurances (STA opcode)
 +
 +In a real implementation,​ I'd choose zp variables to be common opcodes
 +like $8D, which would double the number of occurances of that byte, but
 +let's just consider the above numbers in a 512-byte program. ​ Those top
 +three bytes total $5A bytes, almost 1/8 of the total bytes (the top seven
 +bytes take a little less than 1/3 of the total, by contrast). ​ The number
 +of bits required is
 +
 + 9*($200-$5A) + 2*$2F + 3*$1A + 4*$18 = $FE2 bits = $1FD bytes.
 +
 +so, a whopping 3 byte savings -- just enough to store the three "​tree"​
 +values. ​ For the top seven bytes, a similar calculation suggests $01D8 bytes
 +of compressed data, again for code not optimized for compression. ​ For a
 +more optimized code, with zp variables the same as common opcodes and such,
 +the total bytes could conceivably be on the order of $1C0 bytes -- maybe
 +64 bytes of savings (but again, I never actually tried coding a more
 +optimized version so I don't know for sure).
 +
 +Of course, we haven'​t talked about decompression yet.  One important
 +feature of this tree is that the "​treeness"​ of it is unimportant -- it's
 +really just a substitution code.  Remember that the data is encoded as
 +
 + 0+8 bits literal byte
 + 10 byte 1
 + 110 byte 2
 + 1110 byte 3
 +
 +and so on.  A decompression algorithm simply reads bits until a 0 is found,
 +and then looks up the relevant value in a table. ​ The exception is if
 +the first bit read is 0, in which case the code needs to read the next
 +eight bits and output the byte.  There is no tree traversal, in the sense
 +of looking up nodes and moving left/right, to speak of.
 +
 +Reading bits is trickier than reading bytes. ​ After every eight bits read
 +some sort of memory pointer has to be incremented to the next eight bits.
 +Moreover, when reading codes like 110 or 1110 the number of 1's read has
 +to be counted too, to be used as an index into the lookup table. ​ And if
 +there'​s a literal byte then the next eight bits need to be read, regardless
 +of their value.
 +
 +When reading the bitstream, after every eight bits the pointer into that
 +bitstream must be incremented. ​ To count eight "​global"​ bits at a time, I use
 +a zp variable "​count"​ and just rotate a bit through the different bit
 +positions. ​ Every time it turns zero, the data stream pointer is incremented.
 +The advantage of this method is that it frees up a register (and winds up
 +saving a byte overall).
 +
 +To count the number of "​local"​ bits read, I use the .Y register. ​ Initially
 +it is set to zero, and increments as 1s are read.  For a literal byte,
 +I set .Y to -8, and put a little check in the bit reader for negative .Y;
 +that way, it simply increments to .Y=0.
 +
 +When bit=0 is finally found, the code simply does an lda data,y to get the
 +coded data.  Since .Y counts the number of bits read, .Y will be at least 1
 +for any coded data, but .Y=0 for a literal byte as described above.
 +Therefore, for a literal byte, the code can simply rotate bits into the
 +location "​data",​ and use the same lda data,y instruction to fetch the
 +literal byte.
 +
 +With all that in mind, here's the code I finally came up with (on paper!):
 +
 +Tuff+1
 +[data lookup table]
 +
 +Source ​ [compressed data bitstream]
 +
 +:lit ldy #-8
 +:getbit lsr count ;​Increment pointer every eight bits
 + bne :skip
 + ror count
 + inx
 + bne :skip
 + inc :src+2
 +:skip
 +:src asl Source,​x ;​Get next bit
 + bcs :next
 + tya ;.y=0 means first bit is 0
 + beq :lit ;so read next 8 bits
 + bpl :found ;If .y<0 then literal
 +:next rol Tuff ;Save bit
 + iny
 + bne :​getbit ;​until .y=0
 +:found
 + lda Tuff,​y ;​Fetch code/​literal byte
 + ldy #00
 + sta (zp),​y ;​Output data
 + inc zp
 + bne :getbit
 + inc zp+1
 + bpl :getbit
 +[Uncompressed code -- (zp) points here!]
 +
 +for a total of 45 bytes, plus 7 bytes for the "​Tuff"​ byte table -- total
 +of 52 bytes, compared with a possible 64 byte savings (maybe I should have
 +tried to implement this after all!). ​ Hopefully someone can improve on
 +this further. ​ (And "​Tuff",​ btw, is a contraction of "​TinyHuff"​).
 +
 +Now, there may be a few raised eyebrows here which I hope I'll lower. ​ This
 +code makes several assumptions. ​ It can either be entered in at ":​src",​
 +or at ":​getbit"​ depending on the value of "​count",​ the global bit counter.
 +I assume count is already initialized (e.g. default kernal/​restart value),
 +either to 1 (in which case :getbit is the entry point) or $80, in which case
 +:src is the entry point, with the "​Source"​ stream modified accordingly in
 +each case.
 +
 +I further assume that .axy have their default values of 0 when the routine
 +is called from BASIC.
 +
 +I also assume "​zp"​ is initialized to the correct value already. ​ One easy
 +way to do this is to use $ae/$af, the end load address+1, but other
 +options are certainly available.
 +
 +Finally, you may have noticed that the thing just keeps outputting data
 +until zp=$8000 -- correct. ​ Unless you really really care about garbage in
 +memory above the program, there'​s no reason to waste bytes on an "end of data"
 +check.
 +
 +Another variant
 +---------------
 +
 +I spent a little time looking at a variant of this scheme: using a fixed
 +length bit sequence, for example, assigning four bits to each substitution
 +byte, i.e.
 +
 + 0+ literal byte
 + 1000 byte 1
 + 1001 byte 2
 + 1010 byte 3
 + ...
 + 1111 byte 8
 +
 +The numbers work out pretty well, compression-wise,​ but at the time I felt
 +the decompression code was more complicated and that it wouldn'​t break
 +even.  Looking at it now, though, I'm starting to think otherwise -- that
 +loading three bits at a time isn't any more complicated that fetching
 +eight bits so code can be reused, and there isn't any issue about checking
 +for a 1 bit, so the decompression code ought to be even simpler. ​ So this
 +may actually be the better scheme.
 +
 +Sigh... ​ I don't know about you, but every time I revisit a problem I wind
 +up feeling much stupider.
 +
 +Final thoughts
 +--------------
 +
 +So, that's it: assign shorter bit codes to the most common bytes, and use
 +a highly optimized decompression routine. ​ This method is very specialized,​
 +obviously, but might, just might, make it possible to save a few bytes on
 +a tiny program that has been optimized for compression -- using zp locations
 +the same value as common opcodes, reusing vars as often as possible, and
 +so forth.
 +
 +The reason this works is that, in general, repeated long strings in a small
 +program just don't happen, whereas repeated bytes are common and make up
 +a substantial fraction of the total.
 +
 +Pelle (magervalp@cling.gu.se) has suggested the possibility using the first
 +eight bytes of the compressed program for the '​Tuffman'​ table -- to structure
 +the code to make these common bytes, and thereby save eight bytes by not
 +having to store the '​tree'​. ​ Might save a few more bytes with some programs.
 +
 +Finally, while writing this article, I started to wonder about byte sequences
 +with "​holes",​ for example, <byte skip byte skip> for things like STA zp1
 +STA zp2.  I wonder if there'​s some way of taking advantage of these repeated
 +"​mask"​ patterns.
 +
 +But I leave that thought for another person, for another day.
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 5===================================
 +
 +Tinyrinth
 +by Mark Seelye (aka Burning Horizon)
 +mseelye@yahoo.com
 +http://​www,​burninghorizon.com
 +
 +
 +Introduction
 +    When the mini-game competition was first announced I was originally not
 +interested because of time considerations -- I spend most of my time at work,
 +or watching the kids.  But once I saw that nearly everyone on EFNet #c-64 was
 +doing one, I thought it would be fun if I could just keep it simple.
 +    ​
 +    I had no idea what game to do.  I had a few other ideas but I settled on
 +a maze game because I thought it'd be an interesting challenge doing a maze
 +algorithm and have game code in under 512 bytes. ​ As it turned out, I ended up
 +getting the code down to like 475 bytes, but the version I turned in was 509
 +bytes. ​ (I ended up shaving an additional 34 bytes just before the due date.)
 +
 +    The other reason for the maze game is I have this weird thing I do where
 +in each language I know I code this little maze generation algorithm that I
 +made a long time ago.  I had used the algorithm in ASP, VB, C, and a few
 +others but for some reason I had never done it in 6502!  Imagine that!  It was
 +going to be a bit of a challenge because I had never done it in so few
 +instructions,​ and never in ASM.  It would have to lose some weight and fluff -
 +but I knew I could do it.  So obviously there are MANY better ways to generate
 +a maze, but in light of trying to make it small this routine ended up being
 +such a strange mutation of my idea that I'm surprised it worked at all.
 +
 +Setup
 +    Before I do anything I must setup some things. ​ This is also used when a
 +player dies OR enters a new level. ​ The only difference between a reset of the
 +game and moving to the next level is the ldx #$00: sta $53.  The $53 is also
 +used to set the color of the maze so it rotates from a level one of white, to
 +a level 16 of BLACK (which is evil because then you can't see the maze except
 +for the small hint of the corners on a turn.)
 +    ​
 +; Initialization
 +setup = *
 +gameover = *
 +        ;Setup game variables
 +        ldx #$00
 +        stx $53         ; Start at level 1 (to be inc'd)
 +init = *
 +        ;Setup level
 +        inc $53         ; next level
 +        ldx $53
 +        stx $54         ;​Counter for drawing Keys (next level)
 +        stx $0286       ;​Character Color to use on clear (e544)
 +
 +        ;Set Render Cursor Start Pos / Player Color
 +        lda #$05
 +        sta EBCM1       ;Set ebcm color PLAYER to GREEN ($d022)
 +        sta $45         ; Cursor/​Player Position X (0-39)
 +        sta $46         ; Cursor/​Player Position Y (0-24)
 +        ;Clear Screen
 +        jsr $E544       ;​clear screen set to char color ($0286)
 +
 +        lda #$5b
 +        sta $d011      ;turn on EBCM
 +        lda #$18
 +        sta $d018      ;Activate Cset
 +
 +As you can see that I mention extended color background mode, I'll talk about
 +that later. ​ You also might notice I use a character set, but that I will talk
 +about a little later as well, first I want to jump right into the maze
 +generation.
 +    ​
 +
 +Maze Algorithm
 +    The easiest way to describe how the algorithm works is to compare it to a
 +worm.  The worm eats through an area until it gets blocked by one of its
 +former meals. ​ While eating the worm is counting how many meals he has, once
 +he gets to a certain number he knows he can eat no more.
 +    With that said, I'm sure you're completely confused and probably think
 +(know?) I'm crazy. ​ So now for the technical explanation. ​ Starting with a
 +2-dimensional grid of some size, initialized with 0's, set a start point for
 +the formation of the maze within the grid somewhere. ​ After that has been
 +taken care of the rest follows simple logic:
 +
 +        Have we reached every part of the grid?  ​
 +            if so: done; 
 +            if not: Can we grow the maze into an adjacent area from the
 +     current position?  ​
 +                if so: Grow in one of those directions at random, count it,
 +                       ​continue; ​
 +                if not: Find a place in the maze that can grow, continue
 +
 +    I had to change the order of the logic around to get it to better fit into
 +a smaller number of bytes. ​ So I moved the "find a place to grow from"
 +routine, and the "Have we reached every part of the grid" check after the
 +check if we can grow in any direction from the current position.
 +    ​
 +    To check in all directions from the current position I used a subroutine.
 +The subroutine has 5 entry points which translate to: cangrow up/​right/​down/​
 +left, and the 5th entry point is actually the first entry point - it checks
 +to see what is in .a and checks the direction that value represents. ​ To check
 +the direction I use a kernal routine to set some zp so I can easily check the
 +value on the screen to the left, right, below, or above the current position.
 +This is also where the boundaries are set: 40 columns wide and 25 characters
 +high.
 +    ​
 +popgridloop = *
 +        ldy $45         ;xpos
 +        ldx $46         ;ypos
 +cangrowxy = *           ;Can grow in any direction?
 +        jsr cgup        ;check up
 +        beq _cgxy       ;if 0 then we can grow
 +        inx             ;​offset up check
 +        jsr cgright ​    ;​check right
 +        beq _cgxy       ;if 0 then we can grow
 +        dey             ;​offset right check
 +        jsr cgdown ​     ;check down
 +        beq _cgxy       ;if 0 then we can grow
 +        dex             ;​offset down check
 +        jsr cgleft ​     ;check left
 +_cgxy   beq growloop ​   ;if 0 then we can grow
 +
 +....The subroutine!... ​ There is a weird "​injection"​ in the bottom of this
 +routine to save a couple bytes.. it is explained in the comments to the code.
 +
 +;Check if a cell can grow a direction
 +;1-up 2-right 3-down 4-left
 +; (y xpos, x ypos, a=dir) ​ x/y switched for indirect lda($xx),y below
 +; return: a == 0 : true (can move)
 +;         a <> 0 : false (cannot move)
 +cangrow = *
 +    cmp #$01
 +    beq cgup
 +    cmp #$02
 +    beq cgright
 +    cmp #$03
 +    beq cgdown
 +    ;cmp #$04
 +    ;beq cgleft *** not needed falling through
 +cgleft = *
 +    dey         ;set xpos - 1
 +    cpy #$ff    ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgright = *
 +    iny        ;set xpos + 1
 +    cpy #$28   ;​check xpos (<40)
 +    beq cgno
 +    bne cgloadcell
 +cgup = *
 +    dex        ;set ypos - 1
 +    cpx #$ff   ;​check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgdown = *
 +    inx        ;set ypos + 1
 +    cpx #$19   ;​check ypos (<25)
 +    beq cgno
 +    ;*** fallthrough,​ bne cgloadcell not needed
 +cgloadcell = *
 +    lda #$1f
 +loadcell = *        ;x = ypos, y = xpos, a = and value
 +    sta $50
 +    jsr $e9f0   ; * sets $d1 $d2
 +    lda ($d1),​y ​    ;load byte  (x pos in y reg!)
 +cgno = *
 +    and $50  ;#$1f = use only low 5 bits!
 +    ;rts see below!
 +;  This is mixed with the rts because the first byte would
 +;  be wasted
 +growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto ​ 48 is used for both
 +;  again first byte would be wasted so we overlap with the previous
 +growinto = *  ​
 +    .byte 2, 4, 8, 1, 2
 +
 +;​explanation of above
 +; 0  1  2  4  8
 +;       ​0 ​ 4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +*From Mapping: ;​59888 ​        $E9F0
 +               ;Set Pointer to Screen Address of Start of Line
 +               ;This subroutine puts the address of the first byte of the
 +        ;​screen line designated by the .X register into locations
 +        ;​209-210 ($D1-$D2).
 +
 +    Well, if the code at "​cangrowxy"​ discovers that it cannot grow in any
 +direction from the current position, it has to find a place to grow from.  To
 +do this it falls into the "​findgrow"​ routine; the findgrow routine is
 +interesting because it is a "​stateful"​ routine, meaning that upon reentry it
 +resumes what it was doing before. ​ The reason for this is I didn't want it
 +restarting at the top of the maze each time it entered this find routine - I
 +wanted it to continue to search from where it left off.  This routine works by
 +"​walking"​ the current x and y positions through the grid.  To walk it through
 +it keeps track of where it left off and it just keeps setting the x and y
 +pointers to the next place and "​returns"​ to the start of the generation
 +portion of the code.
 +    The other advantage to doing it this way is I can reset this routine to
 +place a "​key"​ at a dead end in the maze.  This reset is done elsewhere in the
 +maze by setting a zp value to 0. When the reset is done the findgrow routine
 +runs a small piece of code to reset itself, but before it does this it places
 +a key if it has not exceeded the max number of keys.
 +
 +    Here is the find grow routine, each line of code is documented to hint at
 +what it is doing:
 +   
 +; *** fall into findgrow
 +findgrow = *
 +        lda $4b         ; Check byte 0 != resume findgrow
 +        bne _fgresume
 +        sta $49         ;​Reset Findgrow Xpos
 +        sta $4a         ;​Reset Findgrow Ypos
 +        inc $4b         ;Set findgrow flag to resume (<>0)
 +
 +        ;Place keys in corners (injected here for ease of placement, d1/d2 is
 +        ;pointed at a dead end)
 +        iny             ; offset left check
 +        beq _fgresume ​  ;Do not try when column is 0, it freaks out
 +        lda $54
 +        beq _fgresume ​  ;if 0 then keys are done
 +        dec $54         ;dec # of keys left to place
 +        inc $51         ;​actual num keys left
 +        lda ($d1),​y ​    ;load byte
 +        ora #$c0        ;EBCM value for key!
 +        sta ($d1),​y ​    ;​store new value
 +        ;(end of injection)
 +
 +_fgresume = *
 +_fgx    ldx $4a         ;​Findgrow ypos
 +_fgy    ldy $49         ;​Findgrow xpos
 +        inc $49         ;Next xpos (next round)
 +        cpy #$28        ; < 40
 +        beq _fgcr       ; next line if >= 40
 +        jsr cgloadcell ​ ; load cell byte
 +        beq _fgy        ; if 0 then get next xpos/byte
 +        sty $45         ;Set Current xpos
 +        stx $46         ;Set Current ypos
 +        jmp cangrowxy ​  ;​Check if this can grow
 +_fgcr   lda #$00        ;Reset Findgrow xpos
 +        sta $49         ; 0->xpos
 +        inc $4a         ;Next Findgrow ypos
 +        lda $4a         
 +        cmp #$19        ;check ypos (<25)
 +        bne _fgx        ;If we're at x40y25 we are ready to play!
 +        beq gameinit ​   ;Start game logic
 +    ​
 +    That code also checks to see if we have completed generating the maze, at
 +which point it enters the game logic. ​ But before we jump into that I have to
 +explain what happens when the "​cangrowxy"​ routine discovers that it CAN GROW,
 +instead of falling into the "​findgrow"​ routine.
 +    ​
 +    In the "​cangrowxy"​ routine described a bit above there was the final line
 +of:
 +
 +_cgxy beq growloop ;if 0 then we can grow
 +
 +    This "​growloop"​ routine is the place where the actual "​growing"​ or
 +"​eating"​ occurs. ​ This routine works by choosing a direction at random, and
 +attempting to grow in that direction. It reuses the same "​cgup"​ etc. routines
 +that the "​cangrowxy"​ uses because although we know we can grow in a direction,
 +we don't know which direction. ​ In a larger version of this algorithm I
 +usually link all of this stuff together, but this is how it worked out in
 +tinyrinth.
 +    ​
 +growloop = *        ​
 +randdir = *
 +        ;jsr getrand; not a func, not reused yet
 +getrand = *
 +        lda #$80
 +        sta $028a ; Key Repeat; (injected here for #$80) just using #$80 for
 +   ;smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +        sta $d40f       ;set v3 random # gen bits
 +        lda $d41b       ; read random number from V3
 +        and #$03        ; Force Random number to 0-3
 +        clc
 +        adc #$01        ; Add 1 to get 1-4
 +        sta $4c         ; store rand direction
 +        ldy $45         ; Current Xpos
 +        ldx $46         ; Current Ypos
 +        jsr cangrow ​    ; Check if we can grow in that direction
 +        bne randdir ​    ; if <> 0 then Try again
 +        sta $4b         ; reset findgrow flag (injected here for .a==0)
 +grow = *
 +        ldx $4c         ;Get saved rand direction
 +        lda growinto,​x ​ ; 1-4 (4, 8, 1, 2) Get bit set for new cell
 +        sta ($d1),​y ​    ; write new location
 +        lda growfrom,​x ​ ; 1-4 (1, 2, 4, 8) Get bit to set for old
 +        sta $4d         ; Save growfrom bit
 +        ldy $45         ; Reload Current xpos
 +        ldx $46         ; Reload Current ypos
 +        jsr cgloadcell ​ ; Load base cell again
 +        ora $4d         ; Combine with growfrom bit
 +        sta ($d1),​y ​    ;​Modify old cell to connect to new loc
 + ;Change current position
 +        lda $4c         ; Get saved rand direction
 +        jsr cangrow ​    ; Get new x y again - (this will only perform next x/y
 + ;adj, returns <>0)
 +        sty $45         ; xpos   set to new location
 +        stx $46         ; ypos
 +        jmp popgridloop ; Return to populate grid loop
 +    ​
 +    As you can see above I use the random number generator from V3 to get the
 +random direction. ​ A HUGE bonus here is I can use the value returned to
 +directly call that main entry point in "​cangrow"​ so it will figure out which
 +direction to check by itself. ​ If the direction it checks is bad it just loops
 +and tries again. ​ In a larger version of the program I'd make this much more
 +efficient, but efficiency costs bytes. ​ Many coders would be afraid of using a
 +loop like this but since I pre-checked that I can grow before entering this
 +loop, unless my random number generator never returns that direction, I should
 +be fine!
 +    Once we have a good direction I use the already set $d1 zp to read and
 +update the screen data.  Nice bonus of using the routine that sets it!  The
 +code that updates the data must update the cell you are growing to, but ALSO
 +it must update the cell you are growing from, which is why I have to reload
 +the original data up there. ​ When it is finished doing the growing it moves
 +the x/y positions ($45, $46) to the place it grew to; and resumes the maze
 +generation from there.
 +    ​
 +    As I said before, the findgrow routine will jump into the game logic once
 +it knows it is finished, but before we talk about that I should describe
 +another problem that I needed to solve.
 +
 +    In a 2 dimensional maze there are 16 possible pieces that can be formed.
 +Starting with completely closed ending with completely open.  I represent
 +these pieces with numbers 0-15; each bit represents a side of a cell that is
 +open/​closed. ​ So when I finish with the maze generation I have a grid of
 +numbers 0-15. (Actually 1-14; in this game I never end up with 0's or 15's
 +as all pieces are used.) ​ In Tinyrinth the "​grid"​ I used was the screen memory
 +(40x25), so I ended up with a screen full of characters A-N.
 +    Navigating a maze of characters A-N would be rather difficult so I had to
 +decide whether to use the built in character set and transform what was on the
 +screen or somehow use a character set.  Well, another hitch was I wanted this
 +to be a game and losing the simplicity of 0-15 for detecting which way a
 +player could move would be bad - so I opted for a character set of 16
 +characters. ​ Hmmm 8 bytes per character time 16 characters.. 128 bytes!
 +Yikes!
 +
 +Character Set Generation
 +    So I was going to need a 128 byte character set, but I didn't want to have
 +to lose 128 bytes. ​ So what did I do?  I coded about 70 bytes of code to
 +generate the character set.  This too pained me because I was sure I could
 +make it smaller somehow, but once I got it working and a decent percentage
 +smaller than the size of what it produced I stopped worrying about it.
 +    Here is how it works: we know we want to represent the 16 pieces with 0-15
 +so the bits must give some hint to the shape of the piece. ​ What I did was
 +create a loop to check the current counter value and either set or skip the
 +top, sides, bottom of the piece.
 +        Here is how it works, I have commented each line of code to describe
 +what it is doing.
 +
 +        ; Generate Cset!
 +        lda #$20       ; write hi
 +        sta $48        ; use zp
 +        lda #$00       ; write lo
 +     
 +        ;Initialize Screen, variables (injected here to save bytes - using
 + ;lda #$00)
 +        sta $51           ; Clear actual num keys placed counter (see findgrow)
 +        sta $d020
 +        sta EBCM0; ​ Set BG Color ($d021)
 +        ;(end of injection)
 +
 +        tax            ; counter = 0
 +_again ​ sta $47        ; use zp
 +        ldy #$00       ; index
 +        txa            ; counter to a
 +        and #$01       ; check for top
 +        beq _ytop      ; yes top
 +        eor #%01111111 ; 00000001 -> 011111110 -> 10000001
 +_ytop   eor #%11111111 ; 00000000 -> 111111111 ​   ​
 +_6sides sta ($47),​y ​   ; store top/sides to cset
 +        iny            ; next mem location
 +        txa            ; counter to a
 +        and #$02       ; check for right
 +        eor #%00000010 ; flip
 +        lsr            ; 00000010 -> 00000001 || 0->0
 +        sta $49        ; store for right side 
 +        txa            ; counter to a
 +        and #$08       ; check for left side
 +        bne _noleft ​   ; no left
 +        eor #%10001000 ; 00000000 -> 10001000 -> 10000000
 +_noleft eor #%00001000 ; 00001000 -> 00000000
 +        ora $49        ; merge with right
 +        cpy #$07       ; 7->​15->​23->​...
 +        bne _6sides ​   ; total of 6 side pieces
 +        txa            ; counter to a
 +        and #$04       ; check for bottom
 +        beq _ybot      ; no bottom
 +        eor #%01111010 ; 00000100 -> 01111110 -> 10000001
 +_ybot   eor #%11111111 ; 00000000 -> 11111111 ​   ​
 +        sta ($47),​y ​   ; store bottom to cset
 +        inx            ; next counter
 +        clc            ; clear carry
 +        lda $47        ; inc zp
 +        adc #$08       ; by 8
 +        bne _again ​    ; do it again
 +        inc $48
 +        ldy $48
 +        cpy #$28       ;​repeat through cset 2000-2800
 +        bne _again
 +        sty EBCM2      ;Set Death color ($d023) (using result of cset gen for
 +        ;​color value!)
 +
 +    I still think I could have made that smaller; either way though it was fun
 +making a little routine to generate the cest.  With that all said, we now know
 +how the maze is generated. ​ But now I needed a way to play it, as suggesting
 +players use dry erase markers on their screens was not a great idea.
 +
 +
 +Game Loop
 +    Ok, once "​findgrow"​ reaches the end of its maximum size it branches to
 +"​gameinit"​. ​ Game init sets up some quick things and gets going on the game
 +play.  The game loop loops until the player either gets all the keys or dies.
 +When the player gets all the keys the level number ($53) is increased which is
 +used to increase the maximum number of keys and the color of the maze!
 +    Here is the game loop, I have commented much of the code and separated it
 +into sections:
 +
 +*** Flashes the keys, sets speed of "​minitaur"​ based on the level
 +
 +; Game Initialization and Game Loop
 +gameinit = *
 +gameloop=*
 +        inc EBCM3       ; Flash Keys ($d024)
 +        inc $58         ; Increase Speed counter #1 (0-255)
 +        bne moveplayer ​ ; Skip move
 +        inc $57         ; Increase Speed counter #2 ($57|#$f0 - 255)
 +        bne moveplayer ​ ; Skip Move
 +        ​
 +        ;set death speed
 +        lda $53         ;Use level for Speed value
 +        cmp #​%11111000 ​ ;If more than this use default speed
 +        bmi _dsp        ​
 +        lda #​%11111000 ​ ;Default speed
 +_dsp    ora #​%11110000 ​ ;Set high nibble so counter counts up to 255
 +        sta $57         ; Set Speed counter #2
 +
 +*** Moves the minitaur to the next position if it is time. (Checks for player
 +*** hits)
 +
 +;move death
 +movedeath = *
 +        ldy $55         ;​Baddy Xpos
 +        ldx $56         ;​Baddy Ypos        ​
 +        jsr cgloadcell ​ ; load the cell/point the zps (ANDs by #$1f)
 +        sta ($d1),​y ​    ;​store cleared value
 +_newy   ​iny ​            ;​increase xpos
 +        cpy #$28        ;less than 40?
 +        bmi _go         ;​don'​t reset
 +        ldx $46         ;ypos of player
 +        stx $56         ;ypos of death
 +        ldy #$00        ;clear xpos counter
 +_go     sty $55         ;Set baddy xpos
 +        lda #$ff        ;Get all bits! (see loadcell)
 +        jsr loadcell ​   ;load the cell/point the zps
 +        sta $59         ;Save cell value (withh all possible bits)
 +        and #​%11000000 ​ ;and by EBCM bits
 +        cmp #​%11000000 ​ ;Check for KEY - (so it can skip over)
 +        beq _newy       ;Jump ahead 1 more to skip key position
 +        cmp #​%01000000 ​ ;Check for player hit
 +        bne _nodie ​     ;Player is not dead
 +        jmp gameover ​   ;Game Over!  ​
 +_nodie ​ lda $59         ;​Reload stored value
 +        ora #$80        ;EBCM for Death
 +        sta ($d1),​y ​    ;​store value
 +; *** fall through to Move Player
 +
 +*** Checks the keyboard for input, moves player if possible
 +
 +;Move Player
 +moveplayer=*
 +_ffe4   jsr $ffe4       ;Get keypress
 +        beq gameloop ​   ;no key - goto gameloop
 +        and #​%00000011 ​ ;.a == 0-7 at this point
 +        tax             
 +        lda keytodir,​x ​ ;Loads from keytodir
 +;Move entity in game
 +; .a=direction 1-up 2-right 3-down 4-left
 +glmove ​ tax
 +        stx $4e         ; store direction
 +        lda growinto,​x ​ ; get check bit
 +        sta $4f         ; store check bit
 +        ldy $45         ; current xpos
 +        ldx $46         ; current ypos
 +        jsr cgloadcell ​ ; load the cell (and with #$1f)
 +        sta ($d1),​y ​    ; store the data (clear the EBCM)
 +        lda #$00        ; Bottom "fall out" fix
 +        sta $50         ; clear and of cangrow ​ Bottom "fall out" fix
 +        lda $4e         ; load direction
 +        jsr cangrow ​    ; call cangrow to move xpos/ypos
 +        and $4f         ; check bit
 +        bne glmyes ​        ; if we have a bit then we can move!
 +        ldy $45         ; reload xpos - do not move
 +        ldx $46         ; reload ypos - do not move
 +glmyes ​ lda #$ff        ; bits to obtain from loadcell
 +        jsr loadcell ​   ; load the cell/point the zps
 +        pha             ; temp store value for later checks
 +        and #$1f        ; clear other EBCM bits 
 +        ora #$40        ; EBCM ORA Player/​Baddy
 +        sta ($d1),​y ​    ; store new data
 +        sty $45         ; store xpos of new position
 +        stx $46         ; store ypos of new position
 +
 +*** Checks for key hits, minitaur hits, Check for end-level: ​
 +    (jumps to next level if it is the end of the level, loops to the beginning
 +     of the game loop if not.)
 +
 +        ;Hit checks
 +        pla             ; load previous value
 +        and #$c0        ; check for hits "​11xxxxxx"​
 +        cmp #$c0        ; check for key hit
 +        bne _back       ;​_notkey ​    ; to next check
 +        dec $51         ; dec number of keys left in level
 +        bne _back       ; if 0 then we should go to the next level 
 +        jmp init        ; gen maze again
 +;_notkey cmp #$80        ; check for death hit!
 +;        bne _back 
 +;        jmp gameover ​      ; game over
 +_back   jmp gameloop
 +        ;more checks here?
 +
 +keytodir=*
 +.byte 2,1,4,3
 +
 +
 +    To make this all work, I reused the "​cangrow"​ routine again! ​ Because
 +basically it is the exact same logic. ​ One thing that I found unfortunate is I
 +never coded a way for the minitaurs to have any intelligence,​ there was an
 +attempt that worked - but it was FAR too many bytes (over by like 20-30). ​ So
 +I just axed it.  I also at one time made the movement routines all one
 +subroutine that took parameters that would move any "​entity"​. ​ This also took
 +too many bytes to be useful for this version of the game.  Perhaps some day
 +I'll write a kick butt 1k or 2k version with all those features coded in.
 +
 +Various Space Saving Techniques Used
 +
 +    If you look throughout the code, you'll notice I comment a lot on
 +"​injections"​. ​ These are little bits of code that are conveniently put in
 +places to take advantage of the state of a register, or a memory location.
 +    Here is an example: in the "​getrand"​ routine I injected a line to set the
 +key repeat, using the value that I wanted to also put into $d412 for the
 +random number generation.
 +
 +        lda #$80
 +        sta $028a; Ket Repeat; (injected here for #$80) just using #$80 for smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +
 +
 +Another example, here I mixed some static data together to save some bytes and
 +just labeled it so some would be reused. ​ To save additional bytes I also
 +positioned the label in front of the "​rts"​. ​ (This was because the code
 +accessing the data never used an index of 0.)  This is mixed with the rts
 +because the first byte would be wasted
 +
 + ​growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto ​ 48 is used for both
 +;  again first byte would be wasted so we over lap with the previous
 +growinto = *  ​
 +    .byte 2, 4, 8, 1, 2
 +
 +;​explanation of above
 +; 0  1  2  4  8
 +;       ​0 ​ 4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +    I also reused the "​cangrow"​ routine as much as possible, and on top of
 +that I made it more usable by making multiple entry points so it could be used
 +in different way.
 +    ​
 +    Finally the best thing I did was use kernal routines, like $E544 to clear
 +the screen to the set char color at $0286. ​ This allowed me to change the
 +color of the maze each level just by changing the value at $0286.
 +    I also used $e9f0, Mapping the c64 says, "Set Pointer to Screen Address of
 +Start of Line. This subroutine puts the address of the first byte of the
 +screen line designated by the .X register into locations 209-210 ($D1-$D2)."​
 +This was handy because I could use this to read from and write to the screen.
 +
 +Extended Color Background Mode
 +    I could not use sprites. ​ This made me sad, but there was just no room for
 +it.  So I just decided to allow the cset to repeat itself through the whole
 +cset area ($2000-$2800) and use ECBM!  This allowed me to just set a bit and
 +make a maze piece a different color, what more when I read this value I could
 +easily tell what the piece represented! ​ That made collision detection very
 +easy, and also allowed me to have cool flashing keys.
 +
 +Conclusion
 +    I had a LOT of fun coding this and look forward to another competition. ​ I
 +was surprised and amazed by the entries! ​ This even inspired me to make a
 +better version of Tinyrinth, of which I have not completed. ​ I did add to it,
 +I made a version in which the minitaurs move through the maze.  I made another
 +version that would also increase the number of minitaurs in the maze.  I made
 +yet another version that would change the size of the maze, and thought about
 +changing the shape too.  All of these ideas were easy to implement once I had
 +the base game working.
 +    Oh by the way, I know how to spell Minotaur, the Minitaur thing is just a
 +bad joke.  :)
 +
 +
 +Final Code Stats
 +    Total source compiled: 445 lines in 1 file(s)
 +    Symbol table: ​ 29 global and 1 local constants, 0 global variables,
 +    2 global and 15 local labels, 0 source labels. ​
 +    Data size: 10 bytes
 +    Code size: 475 bytes in 231 instructions
 +    Memory block affected: $1000 - $11E4 (total size: 485 bytes)
 +
 +Source
 +    Note: This source is not the source I released for the event. ​ It is a bit
 +smaller, I cannot locate my original source for some reason. ​ Also, some of it
 +may differ a tiny bit from the code documented above. I had written the
 +article using the wrong copy of the source, I have attempted to go back and
 +update it though. ​ (Note to self: keep better records!) ​ :)
 +
 +    Another note:  This source compiles with c64asm by Balint Toth.  :)    ​
 +
 +-- tinyrinth.asm ​ --
 +; Tinyrinth
 +;   entry for a 512b game contest
 +; Burning Horizon/FTA
 +; Mark Seelye
 +; mseelye@yahoo.com
 +; ===================================================================
 +
 +* = $1000
 +
 +;name   ​loc ​    ​desc ​      ​color ​ ecbm bits
 +EBCM0 = $d021 ; untouched ​ black   ​$00 ​ 00
 +EBCM1 = $d022 ; cursor ​    ​red ​    ​$40 ​ 01
 +EBCM2 = $d023 ; touched ​   green   ​$80 ​ 10
 +EBCM3 = $d024 ; keys       ​yellow ​ $c0  11
 +
 +;ZPs used:  (Consolidation Possible if needed)
 +;43/44 - Not Used
 +;45/46 - Current X/Y Position, Maze Generation & Game
 +;47/48 - CSet Location, CSet Generation
 +;49    - Temp Storage, CSet Generation
 +;49/4a - Xpos/Ypos Findgrow, Maze Generation
 +;4b    - Flag, Findgrow, Maze Generation
 +;4c    - Temp Storage, randdir, Maze Generation
 +;4d    - Temp Storage, grow, Maze Generation
 +;4e/4f - Temp Storage, glmove, Game
 +;50    - Temp Storage, loadcell, Maze Generation & Game
 +;51    - NumKeys Left in level (affected by: destroyed & found keys)
 +;52    - not used
 +;53    - Current Level 
 +;54    - # of Keys to try and place, gameinit, Game
 +;55/56 - X/Y Pos of Death
 +;57/58 - speed counter for death
 +
 +;Collect all keys
 +; gen crummb path for visited areas?
 +; once all keys collected next maze is gen'd
 +
 +; Initialization
 +setup = *
 +gameover = *
 +        ;Setup game variables
 +        ldx #$00
 +        stx $53         ; Start at level 1 (to be inc'd)
 +init = *
 +        ;Setup level
 +        inc $53         ; next level
 +        ldx $53
 +        stx $54         ;​Counter for drawing Keys (next level)
 +        stx $0286       ;​Character Color to use on clear (e544)
 +
 +        ;Set Render Cursor Start Pos / Player Color
 +        lda #$05
 +        sta EBCM1       ;Set ebcm color PLAYER to GREEN ($d022)
 +        sta $45         ; Cursor/​Player Position X (0-39)
 +        sta $46         ; Cursor/​Player Position Y (0-24)
 +        ;Clear Screen
 +        jsr $E544       ;​clear screen set to char color ($0286)
 +
 +        lda #$5b
 +        sta $d011      ;turn on EBCM
 +        lda #$18
 +        sta $d018      ;Activate Cset
 +
 +        ; Generate Cset!
 +        lda #$20       ; write hi
 +        sta $48        ; use zp
 +        lda #$00       ; write lo
 +     
 +        ;Initialize Screen, variables (injected here to save bytes - using lda #$00)
 +        sta $51           ; Clear actual num keys placed counter (see findgrow)
 +        sta $d020
 +        sta EBCM0; ​ Set BG Color ($d021)
 +        ;(end of injection)
 +
 +        tax            ; counter = 0
 +_again ​ sta $47        ; use zp
 +        ldy #$00       ; index
 +        txa            ; counter to a
 +        and #$01       ; check for top
 +        beq _ytop      ; yes top
 +        eor #%01111111 ; 00000001 -> 011111110 -> 10000001
 +_ytop   eor #%11111111 ; 00000000 -> 111111111 ​   ​
 +_6sides sta ($47),​y ​   ; store top/sides to cset
 +        iny            ; next mem location
 +        txa            ; counter to a
 +        and #$02       ; check for right
 +        eor #%00000010 ; flip
 +        lsr            ; 00000010 -> 00000001 || 0->0
 +        sta $49        ; store for right side 
 +        txa            ; counter to a
 +        and #$08       ; check for left side
 +        bne _noleft ​   ; no left
 +        eor #%10001000 ; 00000000 -> 10001000 -> 10000000
 +_noleft eor #%00001000 ; 00001000 -> 00000000
 +        ora $49        ; merge with right
 +        cpy #$07       ; 7->​15->​23->​...
 +        bne _6sides ​   ; total of 6 side pieces
 +        txa            ; counter to a
 +        and #$04       ; check for bottom
 +        beq _ybot      ; no bottom
 +        eor #%01111010 ; 00000100 -> 01111110 -> 10000001
 +_ybot   eor #%11111111 ; 00000000 -> 11111111 ​   ​
 +        sta ($47),​y ​   ; store bottom to cset
 +        inx            ; next counter
 +        clc            ; clear carry
 +        lda $47        ; inc zp
 +        adc #$08       ; by 8
 +        bne _again ​    ; do it again
 +        inc $48
 +        ldy $48
 +        cpy #$28       ;​repeat through cset 2000-2800
 +        bne _again
 +
 +        sty EBCM2      ;Set Death color ($d023) (using result of cset gen for color value!)
 +        ​
 +        ​
 +popgridloop = *
 +        ;can grow from current?
 +        ldy $45         ;xpos
 +        ldx $46         ;ypos
 +        ;Can grow in any direction?
 +cangrowxy = *
 +        jsr cgup        ;check up
 +        beq _cgxy       ;if 0 then we can grow
 +        inx             ;​offset up check
 +        jsr cgright ​    ;​check right
 +        beq _cgxy       ;if 0 then we can grow
 +        dey             ;​offset right check
 +        jsr cgdown ​     ;check down
 +        beq _cgxy       ;if 0 then we can grow
 +        dex             ;​offset down check
 +        jsr cgleft ​     ;check left
 +_cgxy   beq growloop ​   ;if 0 then we can grow
 +; *** fall into findgrow
 +findgrow = *
 +        lda $4b         ; Check byte 0 != resume findgrow
 +        bne _fgresume
 +        sta $49         ;​Reset Findgrow Xpos
 +        sta $4a         ;​Reset Findgrow Ypos
 +        inc $4b         ;Set findgrow flag to resume (<>0)
 +
 +        ;Place keys in corners (injected here for ease of placement, d1/d2 is pointed at a dead end)
 +        iny             ; offset left check
 +        beq _fgresume ​  ;Do not try when column is 0, it freaks out
 +        lda $54
 +        beq _fgresume ​  ;if 0 then keys are done
 +        dec $54         ;dec # of keys left to place
 +        inc $51         ;​actual num keys left
 +        lda ($d1),​y ​    ;load byte
 +        ora #$c0        ;EBCM value for key!
 +        sta ($d1),​y ​    ;​store new value
 +        ;(end of injection)
 +
 +_fgresume = *
 +_fgx    ldx $4a         ;​Findgrow ypos
 +_fgy    ldy $49         ;​Findgrow xpos
 +        inc $49         ;Next xpos (next round)
 +        cpy #$28        ; < 40
 +        beq _fgcr       ; next line if >= 40
 +        jsr cgloadcell ​ ; load cell byte
 +        beq _fgy        ; if 0 then get next xpos/byte
 +        sty $45         ;Set Current xpos
 +        stx $46         ;Set Current ypos
 +        jmp cangrowxy ​  ;​Check if this can grow
 +_fgcr   lda #$00        ;Reset Findgrow xpos
 +        sta $49         ; 0->xpos
 +        inc $4a         ;Next Findgrow ypos
 +        lda $4a         
 +        cmp #$19        ;check ypos (<25)
 +        bne _fgx        ;If we're at x40y25 we are ready to play!
 +        beq gameinit ​   ;Start game logic
 +        ​
 +              ​
 +growloop = *        ​
 +randdir = *
 +        ;jsr getrand; not a func, not reused yet
 +getrand = *
 +        lda #$80
 +       sta $028a; Ket Repeat; (injected here for #$80) just using #$80 for smaller code
 +        sta $d412       ;sta $d404   d412 is V3, d404 is V1!!
 +        sta $d40f       ;set v3 random # gen bits
 +        lda $d41b       ; read random number from V3
 +        and #$03        ; Force Random number to 0-3
 +        clc
 +        adc #$01        ; Add 1 to get 1-4
 +        sta $4c         ; store rand direction
 +        ldy $45         ; Current Xpos
 +        ldx $46         ; Current Ypos
 +        jsr cangrow ​    ; Check if we can grow in that direction
 +        bne randdir ​    ; if <> 0 then Try again
 +       sta $4b          ; reset findgrow flag (injected here for .a==0)
 +grow = *
 +        ldx $4c         ;Get saved rand direction
 +        lda growinto,​x ​ ; 1-4 (4, 8, 1, 2) Get bit set for new cell
 +        sta ($d1),​y ​    ; write new location
 +        lda growfrom,​x ​ ; 1-4 (1, 2, 4, 8) Get bit to set for old
 +        sta $4d         ; Save growfrom bit
 +        ldy $45         ; Reload Current xpos
 +        ldx $46         ; Reload Current ypos
 +        jsr cgloadcell ​ ; Load base cell again
 +        ora $4d         ; Combine with growfrom bit
 +        sta ($d1),​y ​    ;​Modify old cell to connect to new loc
 +        ;Change current position
 +        lda $4c         ; Get saved rand direction
 +        jsr cangrow ​    ; Get new x y again - (this will only perform next x/y adj, returns <>0)
 +        sty $45         ; xpos   set to new location
 +        stx $46         ; ypos
 +        jmp popgridloop ; Return to populate grid loop
 +
 +
 +; Game Initialization and Game Loop
 +gameinit = *
 +gameloop=*
 +        inc EBCM3       ; Flash Keys ($d024)
 +        inc $58         ; Increase Speed counter #1 (0-255)
 +        bne moveplayer ​ ; Skip move
 +        inc $57         ; Increase Speed counter #2 ($57|#$f0 - 255)
 +        bne moveplayer ​ ; Skip Move
 +        ​
 +        ;set death speed
 +        lda $53         ;Use level for Speed value
 +        cmp #​%11111000 ​ ;If more than this use default speed
 +        bmi _dsp        ​
 +        lda #​%11111000 ​ ;Default speed
 +_dsp    ora #​%11110000 ​ ;Set high nibble so counter counts up to 255
 +        sta $57         ; Set Speed counter #2
 +
 +
 +;move death
 +movedeath = *
 +        ldy $55         ;​Baddy Xpos
 +        ldx $56         ;​Baddy Ypos        ​
 +        jsr cgloadcell ​ ; load the cell/point the zps (ANDs by #$1f)
 +        sta ($d1),​y ​    ;​store cleared value
 +_newy   ​iny ​            ;​increase xpos
 +        cpy #$28        ;less than 40?
 +        bmi _go         ;​don'​t reset
 +        ldx $46         ;ypos of player
 +        stx $56         ;ypos of death
 +        ldy #$00        ;clear xpos counter
 +_go     sty $55         ;Set baddy xpos
 +        lda #$ff        ;Get all bits! (see loadcell)
 +        jsr loadcell ​   ;load the cell/point the zps
 +        sta $59         ;Save cell value (withh all possible bits)
 +        and #​%11000000 ​ ;and by EBCM bits
 +        cmp #​%11000000 ​ ;Check for KEY - (so it can skip over)
 +        beq _newy       ;Jump ahead 1 more to skip key position
 +        cmp #​%01000000 ​ ;Check for player hit
 +        bne _nodie ​     ;Player is not dead
 +        jmp gameover ​   ;Game Over!  ​
 +_nodie ​ lda $59         ;​Reload stored value
 +        ora #$80        ;EBCM for Death
 +        sta ($d1),​y ​    ;​store value
 +; *** fall through to Move Player
 +
 +;Move Player
 +moveplayer=*
 +_ffe4   jsr $ffe4       ;Get keypress
 +        beq gameloop ​   ;no key - goto gameloop
 +        and #​%00000011 ​ ;.a == 0-7 at this point
 +        tax             
 +        lda keytodir,​x ​ ;Loads from keytodir
 +;Move entity in game
 +; .a=direction 1-up 2-right 3-down 4-left
 +glmove ​ tax
 +        stx $4e         ; store direction
 +        lda growinto,​x ​ ; get check bit
 +        sta $4f         ; store check bit
 +        ldy $45         ; current xpos
 +        ldx $46         ; current ypos
 +        jsr cgloadcell ​ ; load the cell (and with #$1f)
 +        sta ($d1),​y ​    ; store the data (clear the EBCM)
 +        lda #$00        ; Bottom "fall out" fix
 +        sta $50         ; clear and of cangrow ​ Bottom "fall out" fix
 +        lda $4e         ; load direction
 +        jsr cangrow ​    ; call cangrow to move xpos/ypos
 +        and $4f         ; check bit
 +        bne glmyes ​        ; if we have a bit then we can move!
 +        ldy $45         ; reload xpos - do not move
 +        ldx $46         ; reload ypos - do not move
 +glmyes ​ lda #$ff        ; bits to obtain from loadcell
 +        jsr loadcell ​   ; load the cell/point the zps
 +        pha             ; temp store value for later checks
 +        and #$1f        ; clear other EBCM bits 
 +        ora #$40        ; EBCM ORA Player/​Baddy
 +        sta ($d1),​y ​    ; store new data
 +        sty $45         ; store xpos of new position
 +        stx $46         ; store ypos of new position
 +
 +        ;Hit checks
 +        pla             ; load previous value
 +        and #$c0        ; check for hits "​11xxxxxx"​
 +        cmp #$c0        ; check for key hit
 +        bne _back       ;​_notkey ​    ; to next check
 +        dec $51         ; dec number of keys left in level
 +        bne _back       ; if 0 then we should go to the next level 
 +        jmp init        ; gen maze again
 +;_notkey cmp #$80        ; check for death hit!
 +;        bne _back 
 +;        jmp gameover ​      ; game over
 +_back   jmp gameloop
 +        ;more checks here?
 +
 +keytodir=*
 +.byte 2,1,4,3
 +
 +
 +;Check if a cell can grow a direction
 +;1-up 2-right 3-down 4-left
 +; (y xpos, x ypos, a=dir) ​ x/y switched for indirect lda($xx),y below
 +; return: a == 0 : true (can move)
 +;         a <> 0 : false (can not move)
 +cangrow = *
 +    cmp #$01
 +    beq cgup
 +    cmp #$02
 +    beq cgright
 +    cmp #$03
 +    beq cgdown
 +    ;cmp #$04
 +    ;beq cgleft *** not needed falling through
 +cgleft = *
 +    dey         ;set xpos - 1
 +    cpy #$ff    ;check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgright = *
 +    iny        ;set xpos + 1
 +    cpy #$28   ;​check xpos (<40)
 +    beq cgno
 +    bne cgloadcell
 +cgup = *
 +    dex        ;set ypos - 1
 +    cpx #$ff   ;​check xpos
 +    beq cgno
 +    bne cgloadcell
 +cgdown = *
 +    inx        ;set ypos + 1
 +    cpx #$19   ;​check ypos (<25)
 +    beq cgno
 +    ;*** fallthrough,​ bne cgloadcell not needed
 +cgloadcell = *
 +    lda #$1f
 +loadcell = *        ;x = ypos, y = xpos, a = and value
 +    sta $50
 +    jsr $e9f0   ; sets $d1 $d2
 +;​59888 ​        $E9F0
 +;Set Pointer to Screen Address of Start of Line
 +;This subroutine puts the address of the first byte of the screen line
 +;designated by the .X register into locations 209-210 ($D1-$D2).
 +    lda ($d1),​y ​    ;load byte  (x pos in y reg!)
 +cgno = *
 +    and $50  ;#$1f = use only low 5 bits!
 +    ;rts see below!
 +;  This is mixed with the rts because the first byte would
 +;  be wasted
 +growfrom = *
 +    rts
 +    .byte 1 
 +; this is part growfrom part growto ​ 48 is used for both
 +;  again first byte would be wasted so we over lap with the previous
 +growinto = *  ​
 +    .byte 2, 4, 8, 1, 2
 +
 +;​explanation of above
 +; 0  1  2  4  8
 +;       ​0 ​ 4  8  1  2 
 +;rts 1  2  4  8
 +;          4  8  1  2
 +
 +
 +
 +;Notes
 +
 +; 1 
 +;8 2
 +; 4 
 +;
 +;
 +;*** * * *** * *
 +;*0* *1* *2  *3   ​@ ​ A  B  C
 +;*** *** *** ***
 +;
 +;*** * * *** * *
 +;*4* *5* *6  *7   ​D ​ E  F  G
 +;* * * * * * * *
 +;
 +;*** * * *** * * 
 +; 8*  9*  a   ​b ​  ​H ​ I  J  K
 +;*** *** *** *** 
 +;
 +;*** * * *** * * 
 +; c*  d*  e   ​f ​  ​L ​ M  N  O
 +;* * * * * * * * 
 +
 +-- end: tinyrinth.asm ​ --
 +.......
 +....
 +..
 +.                                    C=H 20
 +
 +=====================================Part 6===================================
 +
 +
 +    ​Tetrattack!
 +
 +         by Stephen L. Judd <​sjudd@ffd2.com>​
 +
 +Introduction
 +------------
 +
 +The main point of Tetrattack!,​ of course, was to try and write a 3D engine
 +in 512 bytes. ​ I had a few other routine ideas but couldn'​t think of anything
 +remotely fun to do with them, but I finally thought of a neat 3D game idea
 +and decided to go for it.  Alas, as code kept getting bigger that idea had
 +to be discarded, and the eventual result was tetrattack, the simplest/​smallest
 +3D game that wasn't (totally) stupid.
 +
 +But it is a 3D engine in 512 bytes!
 +
 +At first blush a 512-byte 3D engine might seem ridiculous but think about it:
 +a basic 3d program doesn'​t need to store any graphics definitions or things
 +like that -- it draws its own as it goes.  In fact, all that is needed is a
 +line routine, a rotation/​projection routine, and some routines to keep track
 +of angles and positions and such.  What could possibly be hard about that,
 +right?
 +
 +Well, it seemed reasonable at the time, anyways. ​ At exactly 512 bytes, the
 +code really had to sweat out the bytes, and every single byte saved was
 +important; meanwhile, the routines still had to be reasonably fast.  So this
 +article will discuss the line routine, the rotation/​projection routine, and
 +the object and game manipulation routines, and the various optimizations used.
 +
 +I must say that writing this program really was fun; I'm sure that everyone
 +else had the same problems I had, trying to shift from a "​cycle-optimization"​
 +mindset to a "​byte-optimization"​ mindset. ​ Finding those extra few bytes
 +here and there was a mighty challenge, sometimes taking hours or even days
 +just to save one or two bytes. ​ That part was frustrating,​ of course, as
 +is the fact that the '​game'​ aspect of the program is thoroughly lame.  But
 +it was a neat challenge, and there was never an issue of adding too many
 +features to the program and never finishing it!
 +
 +
 +Quick Summary
 +-------------
 +
 +A full-on 3D program uses matrix multiplications to compute rotations, but
 +3D calculations get a whole lot easier in a Battlezone or driving-type
 +situation, when you only turn left or right. ​ There are no longer rotation
 +matrices to keep track of -- just a single angle. ​ This means that rotations
 +are done with a simple INC/DEC, and there are no matrix multiplies to deal
 +with.  Rotations are done with some simple table lookups.
 +
 +The line drawing routine is pretty standard. ​ Self-modifying code is
 +used to manipulate a single routine to draw different lines, by swapping
 +INX/INY instructions (which turns out to be less efficient, memory-wise --
 +sigh, oh well). ​ Lines are plotted to a double-buffered 128x128 charmap,
 +and once again setting up the screen took extra bytes.
 +
 +The "game code" is the simplest possible -- tetrahedrons are moved by
 +just incrementing their z-coordinate,​ and rotated by incrementing their
 +angle. ​ Their locations are determined by whatever zero page is initialized
 +to -- the program never sets explicit locations. ​ Instead, I just chose a
 +portion of zero page that had a reasonable range of default values (default
 +zero-page values are used extensively in the code)
 +
 +In addition to default zero-page values the code uses several kernal
 +routines (including a few unusual ones, for example the shift-C= routine
 +for swapping charset buffers), self-modifying code, lots of zero-page
 +instructions,​ and a whole bunch of tightly optimized code.
 +
 +The code was written over a few weeks -- a good week of planning and writing
 +out possible routines, and a good week or two of coding.
 +
 +A couple of times I froze the program (using AR) and the freeze almost always
 +happened in the same place: the buffer clearing routines. ​ So the rotation
 +and line drawing routines turned out to be fast enough, even with all the
 +byte savings.
 +
 +The code was developed using the Sirius assembler on a 128D+SCPU -- and btw,
 +the Jammon debugging mode has turned out to be awfully cool, with being able
 +to visually single-step through code while simultaneously viewing memory and
 +registers. ​ Action Replay was downright clunky afterwards (but could freeze!).
 +
 +Yep, just a plug, with a bit of amazement that it actually all worked.
 +
 +I also tested it with VICE, to make sure it worked OK.  VICE initializes
 +zero page to different values than my 128, unfortunately,​ so some tweaking
 +had to be done.
 +
 +And that's about it.  Now on to more detail.
 +
 +
 +----------
 +Setting up
 +----------
 +
 +The setup routines initialize VIC, initialize tables used by the line and
 +rotation routines, and set up the screen. ​ I note that when I started ​
 +planning this program I computed the sizes of line and rotation routines
 +without considering the code needed to set up tables and such for those
 +routines. ​ Oops!
 +
 +The default initial values for .X, .Y, and .A are 0 after a SYS, so the
 +code made a little use of this.
 +
 +Line routine setup
 +------------------
 +
 +The line routine uses tables to look up pixel addresses and bit locations
 +for each x,y coordinate (the line routine stores the x and y coordinates
 +directly in the .X and .Y registers, so looking up pixel addresses amounts
 +to LDA Bitp,X kinds of instructions). ​ The "​bitmap"​ is a 128x128 charmap,
 +so the calculations are pretty easy.  The bit patterns which correspond to
 +different x-coordinates (10000000 01000000 etc.) are computed using
 +
 + cmp #$80
 + rol
 +
 +to cyclically rotate a single bit.
 +
 +Rotation routine setup
 +----------------------
 +
 +Another little piece of code is piggybacked into this loop, which extends a
 +sine table from 0..pi. ​ In this program, "​angles"​ can go from 0..63,
 +corresponding to 0..2*pi; the most memory-efficient way to store the sine
 +table is to store the values for 0..pi/2, and then extend the table to
 +pi/​2..pi. ​ That is, angle=16 (angles go from 0..63, remember) corresponds to
 +pi/2, so the job of extending the table amounts to copying table values 15..0
 +to values 17..32 (value 16 is not copied because it is pi/2 -- the values
 +are '​mirrored'​ through pi/2.  To put it another way: where would it be copied
 +to?).
 +
 +This amounts to starting one index register at 15 and decrementing it, while
 +starting the other register at 17 and incrementing:​
 +
 + ldy #15
 + ldx #17
 + lda table,y
 + sta table,x
 + inx
 + dey
 +
 +The problem here is piggybacking into a loop where .Y is initialized to
 +ldy #$7F instead of ldy #15.  The solution is to initialize .X carefully
 +
 + [ldy #$7f]
 + ldx #$90
 +
 +and use the following:
 +
 +        lda sin0,y
 +        sta sin0+17,x
 + inx
 + [dey]
 +
 +The idea is that when .Y=15, .X will equal 0 and start incrementing. ​ This
 +copies a lot of junk into the table that is never used but doesn'​t overwrite
 +the existing 0..16 table values, and saves several bytes.
 +
 +The rotation tables are tables of r*sin(theta),​ one table for every allowed
 +value of theta=0..64 and r=-128..127. ​ The idea is that every 256-byte page
 +contains r*sin(theta) for one value of theta, so that a table lookup amounts
 +to
 +
 + lda theta
 + ora #>high byte of sin tables
 + sta zp+1
 + ldy r
 + lda (zp),y
 +
 +The table values are computed using a standard shift-and-add multiply routine,
 +with a little extra code to handle negative values of r.  What about
 +negative values of sin(theta), though?
 +
 +Recall that theta=0..63 corresponds to 0..2*pi, but the init code only
 +extends the sine table from 0..pi (i.e. theta=0..32). ​ The table values
 +for pi..2*pi are computed simply by taking the negative of the values
 +for 0..pi:
 +
 +         sta (p4000),y
 +         eor #$ff
 +         clc
 +         adc #1
 +         STA (p6000),y
 +
 +(The clc adc #1 code was painful, byte-wise, but necessary). ​ The pointers
 +p4000 and p6000 are initialized to $4000 and $6000 by default -- or at
 +least are supposed to be.  Unfortunately,​ VICE and my 128D had different
 +ideas about this, so a little extra code was necessary to initialize them.
 +
 +A big reason for putting a table at $6000 is that it ends at $8000, to
 +provide an easy exit condition for the loop:
 +
 +         inc p6000+1
 +         bpl :​mult8 ​      ;to $8000
 +
 +Screen setup
 +------------
 +
 +A couple of things need to be done here: clear the screen, set the color RAM
 +to the background color, and put a 16x16 block of characters in the middle
 +of the screen (and do so in as few bytes as possible!).
 +
 +Clearing the screen is a piece of cake:
 +
 +         sta $d021
 +         sta $0286        ;clear color
 +         jsr $e544        ;clr scr
 +
 +The $0286 STA is to clear the color RAM to black, and work with different
 +kernals. ​ (With both color RAM and background the same, pixels won't
 +appear where they aren't wanted).
 +
 +Storing a block of chars isn't quite so straightforward;​ rather, the
 +straightforward methods use up a fair number of bytes, especially since both
 +screen and color RAM have to be set up.  Here's the method I finally
 +came up with:
 +
 +:l2a
 +         jsr $e8ea        ;scroll up
 +                          ;.A = ($AC)
 +                          ;.X=00
 +         ​clc ​             ;necessary?
 +:l2                       ;set one line
 +         sta $0720+12,x
 +         inc $db20+12,x
 +         inx
 +         adc #16
 +         bcc :l2
 +                          ;         adc #$00
 +         inc $ac
 +
 +         cmp #15
 +         bne :l2a
 +
 +Kernal routine $e8ea is the routine to scroll the entire screen up one line,
 +and disassembly shows that when it exits .A is equal to the contents of $AC
 +and .X is 0.  So the routine stores one line of chars to the middle of the
 +screen, increments the corresponding color RAM (to white), and moves the
 +screen up (chars and color RAM).  By incrementing $AC before moving the
 +screen up ($AC starts at zero normally), .A is re-initialized to its previous
 +value+1, so this prints the next line of chars to the screen (that is, the
 +previous line of chars + 1).  By doing this 16 times a 16x16 block of chars
 +is produced, with color RAM set to white inside the block, and set to black
 +everywhere else.
 +
 +And that brings us to the main program loop.
 +
 +
 +---------
 +Main Loop
 +---------
 +
 +The main loop is pretty straightforward:​ clear buffer, get input, update
 +positions and angles, render the buffer, and swap buffers.
 +
 +The only thing special in this process is swapping buffers via the kernal
 +routine at $eb59 -- this is the routine that is called when C= and shift
 +are held down.
 +
 +There is also a lame dot that pulses to the screen when you hold the fire
 +button down (how cool some laser lines would have been). ​ Alas...
 +
 +But that's pretty much it.  So now let's talk a little about some of the
 +optimizations that can be made for a 3D program, followed by a discussion of
 +the line routine and the rotation routine.
 +
 +--------
 +3D Stuff
 +--------
 +
 +For detailed information on how to "​do"​ 3D worlds I suggest reading issue #16
 +of C=Hacking, but the basics of rendering a 3D world are:
 +
 +1) figuring out where an object is, relative to you,
 +2) if the object is visible, computing what it looks like from your
 +   ​perspective,​ and 
 +3) rendering the object.
 +
 +To define and operate on an "​object",​ we need to know the vertices of the
 +object, and how to connect them together with lines, i.e. to draw the edges.
 +The simplest object of all is a tetrahedron -- four points, and every point
 +is connected to every other point. ​ This turns out to be pretty important,
 +as a lot of memory is saved by using only tetrahedrons. ​ To see why, first
 +consider what the code to draw edges from a list of vertices must look like:
 +
 + rotate points
 +
 + ldx #​number_of_edges
 +:loop stx temp
 + ldy edge_vertex1,​x ;​index into some list of vertices
 + lda edge_vertex2,​x
 + tax
 + jsr drawline
 + ldx temp
 + dex
 + bpl :loop
 +
 +Now consider a cube; in my original code idea, I was going to use cubes
 +and tetrahedrons (pyramids). ​ A cube has eight vertices and 12 edges.
 +Each vertex has an x, y, and z coordinate, and from the code above each edge
 +needs two bytes, one for each vertex of the edge.  48 bytes, just to store a
 +cube -- almost 1/10 of the entire 512-byte code!  (Using code to generate the
 +vertices is possible, but still takes a fair number of bytes.)
 +
 +Compare this to the humble tetrahedron,​ which only requires twelve bytes:
 +three for each vertex. ​ What about the edges? ​ Remember that every point
 +is connected to every other point in a tetrahedron. ​ This means that we do
 +not have to store the edge connections,​ but can compute them manually, and
 +draw the whole thing using the following code:
 +
 + [count = 3]
 +                          ; Plot
 +:​ploop ​  ldy count
 +:l2      ldx count
 +         dey
 +         tya
 +         pha
 +         jsr DrawLine
 +         pla
 +         tay
 +         bne :l2
 +         dec count
 +         bne :ploop
 +
 +This code simply draws lines between all vertices: 3-2, 3-1, 3-0, and so on.
 +It uses pretty much the same number of bytes as the earlier code, but
 +doesn'​t require any storage of the edge connections (12 bytes to store
 +the connections).
 +
 +Next consider the vertices of a tetrahedron. ​ It turns out that by carefully
 +selecting the vertices, default zero-page values can be used, something
 +that isn't possible with e.g. a cube.  Consider the following tetrahedron
 +vertices:
 +
 +Px = 0, -4, 16, -4
 +Py = 16, 0,  8,  0
 +Pz = 0, 16,  0, -16
 +
 +(just the vertices of a more or less regular tetrahedron which I drew on
 +a piece of paper). ​ The trick is to arrange these coordinates in the right
 +way.
 +
 +Location $ae/$af is used by the load routine (among others) to store to
 +memory while loading; at the end of a load, this location contains the
 +end address+1. ​ And, it is surrounded by zeros:
 +
 +>​C:​00ab ​ 00 00 00 00  00 00 00 3c  03 00 00 00  00 00 00 00   ​.......<​........
 +
 +which means that by engineering the end address right we can store two
 +coordinates here.  In this case, I chose the end address+1 to be $10f0, which
 +stores $f0 and $10 in $ae/$af -- -16 and 16, two of the Pz values, the other
 +two being zero.
 +
 +This was my original plan anyways, until I re-discovered that the screen
 +setup routine altered the value of $ac (the kernal routine to move the screen
 +up exits with .A=$ac, so incrementing it re-initializes .A in every loop
 +iteration.) ​ On the other hand, it increments it 16 times, so that after the
 +load and the init the zp values are
 +
 +>C:00ab 00 10 00 f0  10 ..
 +
 +The values 00 10 00 f0 are exactly the Pz values above: 0, 16, 0, -16.
 +Careful study of default zero-page values then shows
 +
 +>​C:​0040 ​ 00 00 08 00  00 00 00 24  00 00 00 00  00 00 00 00   ​.......$........
 +
 +This is pretty much Py, except for the first value -- so all the code has
 +to do is store a 16 at location $40, and it turns out the screen init loop
 +ends with .X=$10. ​ So, one STX does the job, saving yet more bytes.
 +Moreover, by locating the vertices in zero page, zero-page instructions
 +can be used to access the point lists, saving even more bytes.
 +
 +Now, the above are the vertices, but what about the _centers_, i.e. the
 +object locations? ​ Once again, to save space I just used some default
 +zero-page values, with the x-coordinates at $14 and the z-coordinates at $7d.
 +The reason these were chosen is pretty straightforwards. ​ Since objects
 +just move '​downstream'​ with increasing z-coordinate,​ the z-coords should
 +be somewhat spread out; the values at $7d are:
 +
 +>​C:​007d ​ 3a b0 0a c9  20 f0 ef 38  e9 30 38 e9  d0 60 80 4f
 +
 +which are somewhat spread-out. ​ The x-coords, on the other hand, need to
 +be clustered around 0.  The camera (you) is located at x=0, and if objects
 +have a large value for the center x-coordinate they are way off to the side
 +(you'​ve probably noticed a few objects like this). ​ Location $14 has some
 +zeros and some nonzero numbers that are near zero, i.e. that cluster around
 +zero:
 +
 +>​C:​0014 ​ 00 00 19 16  00 0a 76 a3  00 00 00 00  00 00 76 a3
 +
 +Not great values, but good enough -- beggars can't be choosers.
 +
 +Of course, poking different values into locations 20 and above will put the
 +tetrahedrons in different places.
 +
 +
 +-----------------------
 +Rotation and Projection
 +-----------------------
 +
 +Rotations are done very simply, using tables. ​ As outlined earlier, there
 +is a complete set of tables of r*sin(theta). ​ At first blush, the '​natural'​
 +way of setting up these tables would be to have a table of 1*sin(theta) in
 +one page, a table of 2*sin(theta) in the next page, and so on, maybe fitting
 +a table of cos(theta) in the same page.  The location of r*cos(theta) would
 +then be given by
 +
 + page = offset+256*r (e.g. $40+r for tables starting at $4000)
 + value = page+theta
 +
 +It makes a lot more sense, though, to instead let _theta_ be the page index,
 +and let each page contain r*sin(theta) for different values of _r_; the lookup
 +is then
 +
 + page = offset+256*theta (e.g. $5300 for theta=$13)
 + value = page+r
 +
 +For one thing, the pointer setup is very easy -- there'​s only one value
 +of theta to set up, instead of different values of r (for different x,y
 +coordinates). ​ For another, the entire range r=-128..127 can be used,
 +which means the same tables can be used to rotate object _centers_ in
 +addition to object _vertices_.
 +
 +(If you recall programs like lib3d, the object centers are 16-bit signed
 +values and have their own rotation routine, whereas the object vertices
 +are limited to -96..95 and have a separate, optimized rotation routine; in
 +this case, both the vertices and centers are eight-bit signed numbers.)
 +
 +This all means that rotations (e.g. y*cos(theta) - x*sin(theta)) amount to
 +
 + LDY Py,X
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 +
 +to rotate a point.
 +
 +Now, the usual 3D world calculations look like:
 +
 + rotate object centers
 + rotate vertices, add to rotated center, and project
 +
 +Consider rotating the z-coordinate;​ the code will look like
 +
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 + CLC
 +RotM1 ADC CZ
 +
 +that is: rotate vertex (z*cos(theta) - x*sin(theta)) and add to center
 +(ADC CZ).  Here's the great part: if we instead want to rotate the _center_
 +of the object, we want to _store_ the rotated point in CZ; the code would
 +look like
 +
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 + STA CZ
 +
 +This is exactly the same code as above, except the ADC is changed to a STA.
 +Therefore only one routine is needed, using self-modifying code to decide
 +whether to STA the center coordinate or ADC it.  In the code, the appropriate
 +instruction is passed into the routine in the .Y register.
 +
 +There'​s one other thing to be mentioned about the rotation routine. ​ The
 +rotation tables are of r*sin(theta),​ and r*cos(theta) is of course done
 +by adding pi/2 to the theta-coordinate (theta+16 in the program). ​ There
 +are two ways to do this -- add 16 to the theta coordinate and correct
 +any overflow (AND #63, for example), or extend the tables by an extra pi/2.
 +
 +Byte-wise, the first method is the best, but check this out: instead of
 +adding pi/2, we can also subtract 3*pi/​2. ​ The setup code then becomes:
 +
 +RotProj ​
 +         AND #$3F
 +         ORA #>SINTAB
 +         STA sin+1
 +         SBC #$2F         ;​-3*pi/​4
 +                          ;         ADC #$10
 +                          ;         AND #$3F
 +         ORA #>SINTAB
 +         STA cos+1
 +
 +The trick works here because of the ORA #>​SINTAB,​ which is $4000 -- this
 +allows negative angles to work, by borrowing higher bits.  Consider:
 +
 +.A ORA #>​SINTAB SBC #$2F ORA #>SINTAB
 +-- ------------ -------- ------------
 +$12 $52 - sin tab at $5200 $22 $62 - cos tab at $6200
 +$22 $62 - sin tab at $6200 $32 $72 - cos tab at $7200
 +$32 $72 - sin tab at $7200 $42 $42 - cos tab at $4200
 +
 +(C is clear in the SBC above). ​ By using an SBC, angles automatically
 +wrap around from $8000 to $4000. ​ This saves two bytes over the ADC #$10
 +AND #$3F method (commented out above). ​ Every byte counts!!!
 +
 +Projections
 +-----------
 +
 +The easiest way to do projections,​ once again, is via tables of f(x,z)=x/z,
 +much like the tables of f(r,theta) = r*sin(theta). ​ Alas, the setup code
 +to generate these tables took up too much space, so the projects are
 +computed '​manually',​ using a shift-and-add division routine.
 +
 +Now, it turns out that the routine really computes 64*x/z -- the 64 is the
 +magnification factor. ​ The nice thing about a shift-and-add routine is that
 +multiplying by 64 is possible just by changing the number of shifts. ​ So
 +the routine adds an extra 6 shifts into the division loop -- a little
 +inefficient cycle-wise, but pretty thrifty byte-wise.
 +
 +
 +-------------
 +Drawing lines
 +-------------
 +
 +Obviously, to render an object we need a line drawing routine. ​ The line
 +routine in the program is pretty standard, and described in several earlier
 +issues of C=Hacking, so I won't go into too much detail. ​ Most of the
 +paragraphs below are just random tidbits, so feel free to skip any (there
 +won't be a test).
 +
 +The simplest and most efficient routines draw to a charmap -- you can read
 +about this in detail in C=Hacking #8 and beyond -- since the .Y register maps
 +directly to the y-coordinate,​ and the column-offsets are easy to compute.
 +The biggest charmap available is 128x128, so that's what I used.
 +
 +Overall, the line routine is pretty small, while being reasonably fast.
 +The plot routine is '​dumb',​ and recalculated for every pixel, but the
 +recalculation itself is fairly efficient, with a few table lookups. ​ It
 +draws into a 128x128 buffer and can handle off-screen coordinates,​ and
 +the buffer can be anywhere (that it, it supports a double-buffer display).
 +So, I think it turned out OK, all things considered.
 +
 +With that background, here's the main line drawing loop:
 +
 +plot
 +         pha
 +         tya
 +         bmi calcline
 +         lda bitphi,x
 +         ora base
 +         sta point+1
 +         lda bitplo,x
 +         sta point
 +         lda bitp,x
 +         beq calcline
 +         ora (point),y
 +         sta (point),y
 +calcline
 +         pla
 +mod2     sbc #dx
 +         bcs mod5
 +mod3     adc #dy
 +mod4     inx
 +mod5     iny
 +         dec temp
 +         bne plot
 +
 +
 +The basic line routine iteration goes like
 +
 + plot x,y
 + y=y+1
 + a = a - dx
 + if a<0 then a=a+dy, x=x+1
 +
 +for slope>​1. ​ The code for slope<1 is exactly the same, except for
 +swapping x and y, and dx and dy.  Therefore, I just used one routine and
 +self-modifying code to set INX and INY etc. in the right places (mod2, mod3,
 +mod4, and mod5 above). ​ In retrospect this was not smart, and I think it
 +would have saved 5 or 6 bytes to use two routines, and jsr to the plot
 +routine -- the routines to set up the self-modifying lines take more space
 +than the basic iteration.
 +
 +The x and y coordinates are stored directly in the .x and .y registers.
 +This makes the plotting really easy and relatively fast, and works well
 +with the 128x128 charmap.
 +
 +Off-screen points are a must, so the allowed coordinates are -64..192,
 +giving 64 pixels to either side of the 128x128 visible area.  Checking for
 +these is easy, since the high bit will be set for coords <0 and >127.
 +These are checked in the plot routine. ​ The y-coordinate is checked directly,
 +using tya; the x-coordinate is checked for range by use of the bitp table --
 +this table contains the bit pattern for the x-coordinate (%1000000 %01000000
 +etc.), and all of the values for x>127 are set to zero.  Once again, in
 +retrospect it would have been more memory-efficient to do this with txa (two
 +more bytes), but darn it all, SOMETHING has to be at least a little bit cycle
 +efficient here!  Old habits die hard!
 +
 +
 +The line routine always draws from left to right, which means the
 +initialization routine calculates dx = x2-x1, and if dx<0 then it swaps the
 +two endpoints. ​ This creates a problem when coordinates are -64..192 though;
 +after subtracting x2 and x1 do you check the negative bit, or the carry bit?
 +Cases like x1=-1, x2=1 should work, so the carry bit is useless; on the other
 +hand, long lines, say x1=1, x2=130, should also work, so the high bit is also
 +useless. ​ The solution I used was to offset coordinates by +64 (and once
 +again, in retrospect...).
 +
 +The point coordinates are stored in two lists, one of x-coords and one of
 +y-coords. ​ This means that the endpoints are simply indexes into those
 +point lists, i.e.:
 +
 + xlist contains list of x-coords; ylist list of y-coords
 + xlist,x and ylist,x contain left endpoint
 + xlist,y and ylist,y contain right endpoint
 +
 +That simplified a lot of things -- the lists can be in zero-page, swapping
 +coordinates is very simple, and so on.
 +
 +The routine is double-buffered -- flickery single-buffers are just too
 +painful. ​ Since $2000 and $2800 are the locations of the two charset buffers,
 +the current buffer is stored in zero-page variable $ce ("​base"​ in the
 +plot code above), which starts out initialized to #$20 (location $81 would
 +also work) -- one more variable to not initialize.
 +
 +So, in summary: a pretty standard line routine, with a few memory optimizations
 +thrown in.
 +
 +
 +-------
 +Wrap-up
 +-------
 +
 +Well, that's about it.  As you can see, the code really runs on the edge --
 +zero-page has to be initialized correctly, little bits and pieces are missing
 +from certain routines, things have to line up just right... but it works!
 +There are undoubtably other places where bytes can be saved, of course,
 +but I think it does a pretty good job of shaving bytes overall.
 +
 +See you at next year's contest!
 +
 +*      ​
 +* tetrattack!
 +*
 +* Mini-3D rendering package
 +*
 +* - Motion in the x-z plane => rotations about one axis only
 +* - Plots to charmap
 +*
 +* SLJ 9/01
 +*
 +
 +                          ;         org $1000
 +         org $0ef0
 +
 +NUMOBJS ​ = 9
 +
 +temp     = $02
 +dx       = $03
 +dy       = $04
 +point    = $05
 +
 +AUX      = $03
 +ACC      = $04
 +
 +CX       = $07
 +CZ       = $08
 +rottemp ​ = $09            ;0 initially
 +
 +zpoint ​  = $0a            ;low byte already set to zero
 +zpoint2 ​ = $0c
 +
 +count    = $0e
 +angle    = $0f
 +
 +cenx     = $14
 +
 +Py       = $40
 +
 +XCoord ​  = $50
 +YCoord ​  = $60
 +
 +theta    = $70
 +
 +cenz     = $7d
 +
 +                          ; $ac taken by screen setup routine
 +                          ; final value = $10
 +
 +Pz       = $ab
 +
 +
 +base     = $ce            ;$20 initially
 +                          ;alternate: ($81)
 +
 +p4000    = $8f            ;$4000 initially
 +p6000    = $89            ;$60xx initially
 +p8000    = $9c            ;$8000
 +
 +tagged ​  = $e0
 +
 +sin      = $f7            ;0 low byte
 +cos      = $f9
 +
 +curobj ​  = $ff
 +
 +
 +bitp     = $1c00
 +bitplo ​  = $1d00
 +bitphi ​  = $1e00
 +bitphi2 ​ = $1f00
 +SINTAB ​  = $4000
 +PROJTAB ​ = $8000
 +
 +
 +DEY      = $88
 +INY      = $c8
 +INX      = $e8
 +ADC      = $65
 +STA      = $85
 +
 +start    ​
 +         sta $8f          ;set up p4000
 +
 +*
 +* Set up VIC
 +*
 +         sta $d021
 +         sta $0286        ;clear color
 +         jsr $e544        ;clr scr
 +         lda #$18
 +         sta $d018
 +
 +
 +*
 +* Line drawing tables
 +*
 +
 +         ldx #$90
 +         ldy #$7f
 +         lda #$01
 +:l1      sta bitp,y
 +         cmp #$80
 +         rol
 +
 +         pha
 +         tya
 +         lsr
 +         lsr
 +         lsr
 +         lsr
 +         sta bitphi,y
 +         lda #00
 +         sta bitp+128,​y ​  ;0 = dont draw
 +         ror
 +         sta bitplo,y
 +
 +*
 +* Extend sin table to 0..31
 +*
 +         lda sin0,y
 +         sta sin0+17,x
 +                          ;sta sin0+32,x
 +         inx
 +
 +         pla
 +
 +         dey
 +         bpl :l1
 +                          ;.y=$ff
 +                          ;.x=$10
 +
 +*
 +* Set up sin tables
 +*
 +InitRot  ​
 +
 +         iny
 +         sty p6000        ;rats :(
 +
 +                          ;         ldy #00      ;rats :(
 +                          ;
 +                          ;         sty temp
 +                          ;:l3      ldx rottemp
 +                          ;         lda sin0,x
 +                          ;         sta aux      ;sin(theta)
 +                          ;
 +                          ;:l4
 +                          ;
 +                          ;         clc
 +                          ;         jsr MULT8    ;.Y * AUX
 +
 +                          ;
 +                          ; AUX*.Y -> ACC,.A (low,hi) 16-bit result
 +                          ; AUX, .Y unaffected
 +                          ; .Y can be negative
 +                          ;
 +
 +:​Mult8 ​  
 +         sty temp
 +
 +                          ;dumb multiply
 +                          ;         sty acc
 +                          ;         lda #00
 +                          ;         tay
 +                          ;:​mloop ​  clc
 +                          ;         adc sin0-$10,x
 +                          ;         bcc :skip
 +                          ;         iny
 +                          ;:​skip ​   dec acc
 +                          ;         bne :mloop
 +                          ;         tya
 +
 +         sty acc
 +         lda #00
 +         ldy #9
 +         clc
 +:​mloop ​  
 +         ror
 +         ror acc
 +         bcc :mul2
 +                          ;         clc
 +                          ;dec sin0 table by 1 instead
 +         adc sin0-$10,x
 +:mul2    dey
 +         bne :mloop
 +
 +         ldy temp
 +         bpl :pos
 +         sec
 +         sbc sin0-$10,x
 +:pos     
 +
 +         sta (p4000),y
 +         eor #$ff
 +         clc
 +         adc #1
 +
 +         STA (p6000),y
 +         iny
 +         bne :mult8
 +         inx
 +         inc p4000+1
 +         inc p6000+1
 +         bpl :​mult8 ​      ;to $8000
 +
 +                          ;.x=$20
 +                          ;.y=0
 +
 +*
 +* Set up Screen
 +*
 +
 +
 +                          ;         lda #00      ;bah!
 +                          ;         tya
 +:l2a     
 +                          ;         pha
 +         jsr $e8ea        ;scroll up
 +                          ;         pla
 +                          ;.A = ($AC)
 +                          ;.X=00
 +
 +         ​clc ​             ;necessary?
 +:l2                       ;set one line
 +         sta $0720+12,x
 +         inc $db20+12,x
 +         inx
 +         adc #16
 +         bcc :l2
 +                          ;         adc #$00
 +         inc $ac
 +
 +         cmp #15
 +         bne :l2a
 +:done                     ;​.X=$10
 +
 +         stx $40          ;rats :(
 +                          ;init Py
 +
 +*
 +* Main program loop:
 +*   - get input
 +*   - update positions/​angles
 +*   - render
 +*   - swap buffers
 +*
 +
 +MainLoop ​
 +
 +                          ; Clear buffer and swap
 +
 +         lda base
 +         eor #$08
 +         sta base
 +
 +         sta zpoint+1
 +         ldy #00
 +         tya
 +         ldx #8
 +:l0      sta (zpoint),y
 +         iny
 +         bne :l0
 +         inc zpoint+1
 +         dex
 +         bne :l0
 +
 +                          ; Get Input
 +                          ; .X=0
 +
 +         lda $dc00
 +         lsr
 +         lsr
 +         lsr
 +         bcs :c1
 +         inc theta
 +:c1      lsr
 +         bcs :c2
 +         dec theta
 +:c2      and #$01
 +         sta tagged
 +         eor #$01
 +         ora $23c0+6
 +         sta $23c0+6 ​     ;shot fired
 +
 +
 +
 +                          ;         lda $cb
 +                          ;         and #3
 +                          ;         beq :skip
 +                          ;         lsr
 +                          ;         bcc :inc
 +                          ;         dec theta
 +                          ;         lsr
 +                          ;        bcc :skip
 +                          ;         jsr Forwards
 +                          ;:inc     inc theta
 +                          ;:skip
 +
 +ObjLoop  ​
 +
 +                          ; Compute relative center
 +
 +         inx
 +         stx curobj
 +
 +         dec cenz,​x ​      ;​Update pos
 +
 +         lda cenx,x
 +                          ;         sec
 +                          ;         sbc cenx
 +         sta Px+4
 +         lda cenz,x
 +                          ;         sbc cenz
 +         sta Pz+4
 +
 +
 +                          ; Rotate
 +Rot      ​
 +
 +         lda theta,x
 +         adc theta
 +         sta angle
 +
 +         lda theta
 +         ldx #4
 +         ldy #STA
 +         jsr RotProj
 +         lda cz
 +         sbc #14
 +                          ;         bmi :skip
 +         cmp #125
 +         bcs NoDraw
 +         cmp cx
 +         bmi NoDraw
 +         adc cx
 +         bmi NoDraw
 +
 +                          ; If in view, rotate project & plot
 +
 +         dex
 +         stx count
 +:loop    lda angle
 +         ldy #ADC
 +         jsr RotProj
 +         sta YCoord,X
 +         dex
 +         bpl :loop
 +
 +                          ; Plot
 +
 +:​ploop ​  ldy count
 +:l2      ldx count
 +         dey
 +         tya
 +         pha
 +         jsr DrawLine
 +         pla
 +         tay
 +         bne :l2
 +         dec count
 +         bne :ploop
 +                          ;.y=0
 +NoDraw ​  
 +         ldx curobj
 +
 +                          ; Update
 +
 +         lda tagged,x
 +         beq :zip
 +         lda cx
 +         ora tagged
 +         sta tagged,x
 +         dfb $2c
 +:zip     inc theta,x
 +
 +
 +         cpx #NUMOBJS
 +         bcc ObjLoop
 +
 +         jsr $eb59        ;Swap buffer
 +                          ;.Y=1 .A=$7F
 +
 +         jmp MainLoop
 +
 +
 +*
 +* Line routine
 +*  - Draws from L to R
 +*  - Forces dx>0 (x1 < x2)
 +*  - Numbers are +128 offset, to allow
 +*  - DX and DY to be 0..255 for long lines
 +*  - Uses same core routine for stepinx
 +*    and stepiny by changing variables
 +*
 +* On input:
 +*  .X = index into point list, point 1
 +*  .Y = index of point 2
 +*
 +
 +Pswap    ​
 +         sty temp
 +         txa
 +         tay
 +         ldx temp
 +DrawLine ​
 +         lda xcoord,y
 +         sec
 +         sbc xcoord,x
 +         bcc Pswap        ;x2 >= x1
 +         sta dx
 +         lda ycoord,y
 +         sbc ycoord,​x ​    ;dy = y2-y1
 +         bcs :posy
 +         ldy #DEY
 +         eor #$ff
 +         adc #1
 +         dfb $2c
 +:posy    ldy #INY
 +         sta dy
 +         cmp dx
 +         lda #INX
 +                          ;         bcs stepiny
 +         bcs :noswap
 +         tya
 +         ldy #INX
 +:​noswap ​ sty mod5
 +         sta mod4
 +stepinx  ​
 +                          ;         sta mod5
 +                          ;         sty mod4     ;​INY/​DEY
 +         lda dy
 +         ldy dx
 +         bcc mod
 +stepiny  ​
 +                          ;         sta mod4
 +                          ;         sty mod5     ;​iny/​dey
 +         lda dx
 +         ldy dy
 +mod      ​
 +                          ;         sty mod1+1
 +         sty temp
 +         sty mod3+1
 +         sta mod2+1
 +
 +Draw     
 +         lda ycoord,x
 +         sec
 +         sbc #64
 +         tay
 +         lda xcoord,x
 +         sec
 +         sbc #64
 +         tax
 +                          ;mod1     lda #dy
 +         lda temp
 +         beq done         ;no steps!
 +                          ;         sta temp
 +         lsr
 +plot     
 +         pha
 +         tya
 +         bmi calcline
 +         lda bitphi,x
 +         ora base
 +         sta point+1
 +         lda bitplo,x
 +         sta point
 +         lda bitp,x
 +         beq calcline
 +         ora (point),y
 +         sta (point),y
 +calcline ​
 +         pla
 +mod2     sbc #dx
 +         bcs mod5
 +mod3     adc #dy
 +mod4     inx
 +mod5     iny
 +         dec temp
 +         bne plot
 +done     
 +         rts
 +
 +*
 +* Rotate and project point
 +*
 +* On entry:
 +*  .X = index into point list
 +*  .A = rotation angle
 +*  .Y = ADC/STA for points/​centers
 +*  CX,CZ = location of object
 +*
 +* On exit:
 +*  Points stored in xcoord,x ycoord,x
 +*
 +RotProj  ​
 +         AND #$3F
 +         ORA #>SINTAB
 +         STA sin+1
 +         SBC #$2F         ;​-3*pi/​4
 +                          ;         ADC #$10
 +                          ;         AND #$3F
 +         ORA #>SINTAB
 +         STA cos+1
 +         STY RotM1
 +         STY RotM2
 +RotProj2 ​
 +         LDY Pz,X
 +         LDA (sin),Y
 +         PHA
 +         LDA (cos),Y
 +         LDY Px,X
 +         SEC
 +         SBC (sin),Y
 +         CLC
 +RotM1    ADC CZ
 +         STA aux
 +
 +                          ;         LDY Px,X
 +                          ;         LDA (cos),Y
 +                          ;         LDY Pz,X
 +                          ;         CLC
 +                          ;         ADC (sin),Y
 +         PLA
 +         CLC
 +         ADC (cos),Y
 +         CLC
 +RotM2    ADC CX
 +         JSR Div8
 +         STA XCoord,X
 +
 +         LDA Py,X
 +                          ;         JSR Div8
 +                          ;         STA YCoord,X
 +                          ;         RTS
 +
 +*
 +* Signed division routine
 +*
 +* 64*.A/AUX -> .A
 +* Only valid for .Y/AUX < 1
 +* .A pos or neg, AUX > 0
 +* AUX, .X unaffected
 +*
 +
 +DIV8     
 +                          ;         STY ACC
 +                          ;         tya
 +         php
 +         bpl :pos         ;.A just loaded
 +         eor #$ff
 +:pos     sta acc
 +         LDA #0
 +         LDY #14
 +:​DLOOP ​  ASL ACC
 +
 +         ROL
 +         CMP AUX
 +         BCC :DIV2
 +         SBC AUX
 +         INC ACC
 +:DIV2    ​
 +         DEY
 +         BNE :DLOOP
 +                          ;RTS
 +
 +         lda acc
 +         plp
 +         bpl :pos2
 +         eor #$ff
 +:pos2    ​
 +         eor #$80         ;+128 offset
 +         rts
 +
 +*
 +* Point list
 +* Last point is for relative cx,cz
 +*
 +
 +                          ;Pz       dfb 0,​16,​0,​-16,​0
 +                          ;Py       dfb 16,0,8,0
 +Px       dfb 0,-4,16,-4
 +
 +                          ;note: px+4 intrudes on table below
 +
 +*
 +* 128*sin(t), t=0..15
 +*
 +* Table must be at end of code!
 +*
 +
 +                          ;sin0     dfb 0,​13,​25,​37,​49,​60,​71,​81
 +                          ;         dfb 91,​99,​106,​113
 +                          ;         dfb 118,​122,​126,​127,​128
 +
 +                          ;sin0     dfb 0,​25,​50,​74,​98,​120,​142,​162
 +                          ;         dfb 180,​197,​212,​225,​236,​244,​250,​254,​255
 +sin0     dfb 0,​24,​49,​73,​97,​119,​141,​161
 +         dfb 179,​196,​211,​224,​235,​243,​249,​253,​255
 +
 +
 +end      ​
 +
 +.......
 +....
 +..
 +.                                    - fin -
 +</​code>​
magazines/chacking21.txt · Last modified: 2015-04-17 04:34 (external edit)