User Tools

Site Tools


base:sprite_multiplexing
no way to compare when less than two revisions

Differences

This shows you the differences between two versions of the page.


base:sprite_multiplexing [2015-04-17 04:34] (current) – created - external edit 127.0.0.1
Line 1: Line 1:
 +====== Sprite multiplexing by Cadaver ======
 +
 +(This rant is mirrored from [[http://cadaver.homeftp.net/|Cadavers site]])
 +
 +This is about displaying more than 8 sprites on the screen (re-using, and thus
 +"multiplexing" the sprites)
 +
 +====== 0. Background ======
 +
 +Sprite multiplexing in its full extent (as seen in games like Green Beret or
 +Ghosts'n Goblins, which allow free usage of over 8 sprites anywhere on the
 +screen, any colors and frames) is a subject seldom discussed in magazines and
 +such. Often they just discuss re-using the sprites or the "zone split"
 +technique.
 +
 +When I started C64 development again, with crossdevelopment, I decided that if
 +I was going to do a game, it would definitely have to display more than 8
 +sprites. I tried to search the net for information but basically came up with
 +nothing, except some programmer diaries, for example Andrew Braybrook's
 +Morpheus diary, which was referring to some of the issues involved quite
 +cryptically.
 +
 +What I had to do was to start reverse-engineering the sprite sorting/
 +multiplexing routines of games like Green Beret, Midnight Resistance etc.
 +to understand deeper what was going on. My first sortroutine was of course
 +the bubblesort but that was slow...
 +
 +====== 1. "Zone split" technique ======
 +
 +Let's begin with this to introduce the hardware features involved within a
 +simpler approach. Consider the following situation in Turrican where the player
 +is battling against the first boss enemy, the giant "fist": There's the player
 +character composed of 3 sprites (2 multicolors sprites and a singlecolor white
 +sprite Y-expanded on top of that) and the "fist", composed of 5 sprites per
 +row and 4 rows (20 sprites total) Here's a diagram of the sprite numbers in
 +use on the screen, with the 3rd (the Y-expanded sprite) omitted.
 +<code>
 +               4 5 6 7 8
 +               4 5 6 7 8
 +               4 5 6 7 8
 +             4 5 6 7 8
 +          2
 + ______________________________
 +</code>
 +
 +So, the idea seems to be as follows: a raster interrupt happens before the
 +start of each sprite row for the boss enemy, and in it the Y-coordinates,
 +sprite frames, and colors for sprites 4-8 are rewritten, to re-use the sprites.
 +X-coordinates do not change between rows so they can be set beforehand.
 +
 +The most important hardware feature to take into account: The Y-coordinate for
 +a sprite must have been set before the raster line reaches that Y-position, or
 +the re-used sprite won't display at all!
 +
 +This is not as bad as it sounds. The Y-coordinate can actually be set well
 +beforehand, right after the previous sprite row has started displaying
 +(modifying the Y-coord in the middle of a sprite display won't "cut" it or
 +anything)
 +
 +In fact, more trouble will come from the frames and colors. It will take
 +at least one rasterline to rewrite the frame & color registers, so it's likely
 +that the last pixel line of the "previous" sprite row will already be
 +displaying the "new" sprite row's last pixel line graphics and colors, at least
 +for some sprites (we're too early) or the first pixel line of the "new" sprite
 +row still has the "previous" sprite row's graphics data & colors (we're too
 +late). So it's a matter of very careful timing adjustments, and possibly
 +designing the sprites so that it doesn't matter if frame & color register
 +writes are late (using only multicolors, not the sprite color, in the first
 +pixel line of sprites)
 +
 +An example code for handling the rewriting in above situation could look
 +something like this:
 +<code>
 + lda $d007 ;Take the Y-coord of the 4th sprite,
 + clc ;advance it 21 pixels lower and
 + adc #21 ;write to Y-coords of sprites 4-8.
 + sta $d007 ;This must happen before the raster
 + sta $d009 ;line reaches the new Y-coordinate,
 + sta $d00b ;or sprites won't be displayed.
 + sta $d00d
 + sta $d00f
 + ldx spriteindex
 + lda frametable,x ;Here, X has been loaded with an index
 + sta $c3fb ;into the boss enemy's sprite frame &
 + lda frametable+1,x ;color tables. Screen memory is at
 + sta $c3fc               ;$c000
 + lda frametable+2,x
 + sta $c3fd
 + lda frametable+3,x
 + sta $c3fe
 + lda frametable+4,x
 + sta $c3ff
 + lda colortable,x
 + sta $d02a
 + lda colortable+1,x
 + sta $d02b
 + lda colortable+2,x
 + sta $d02c
 + lda colortable+3,x
 + sta $d02d
 + lda colortable+4,x
 + sta $d02e
 + txa ;Increase the sprite index with 5
 + clc ;for the next sprite row.
 + adc #5
 + sta spriteindex
 +</code>
 +In fact, I've never written a this kind of multiplexer and probably never will.
 +Of course it allows impressively big enemy sprites but the freedom to do
 +whatever I want with the sprites is lost, and therefore I introduce next...
 +
 +====== 2. True (general) sprite multiplexing ======
 +
 +This methods allows to freely use sprites as if there were more than 8
 +available (Green Beret, Ghosts'n Goblins, Midnight Resistance, Turricans when
 +*not* in a bossfight and many more.) Of course, the hardware limits come into 
 +play: it is simply impossible to display more than 8 sprites on the same raster 
 +line, and because rewriting the sprite registers takes some time, ugly graphical 
 +effects can be displayed if sprites are packed tight and registers are actually 
 +being rewritten while the sprites are being displayed.
 +
 +The key to general multiplexing is to have the sprites in a list sorted from
 +top to bottom (using the Y-coordinate), so that the raster interrupts to handle
 +sprite re-use will always run from top to bottom in a controlled fashion.
 +
 +It is a bit more complicated than the zone-split multiplexing, but the
 +complexity will be hidden within the sprite sorting routine and the raster
 +interrupts that actually display the sprites: to the rest of the program it will
 +be as easy to access the sprites as if they were hardware sprites (or in fact
 +easier, because the sprite sorting routine can also take care of bit
 +manipulation like the X-coord most significant bits or sprite enabled bits)
 +
 +So, what needs to be done is to:
 +
 +  * Sort the sprites by increasing Y-coordinate, this means the sprites at the top of the screen will come first in the sorted order. (done in the sprite sorting routine)
 +
 +  * Map the "virtual" sprites onto the C64's 8 physical sprites (done in the sprite sorting routine or in the raster interrupts)
 +
 +  * Write the correct values to the sprite hardware registers so that the "virtual" sprites can be displayed as correctly as possible (done in the raster interrupts)
 +
 +  * When done displaying the sprites, reset their positions to the lowest possible position (255). This is to ensure that any sprites aren't displayed too early on the next frame, if the sprites are moving downwards. If writing the first 8 sprites well before the sprite displaying begins, this can be left out!
 +
 +Optionally, we can:
 +
 +  * Reject sprites that are outside visible X- or Y-coordinate range (this could also be responsibility of the main program)
 +
 +  * Reject sprites if there is more than 8 on a row (done in the sprite sorting routine or in the raster interrupts.) If this isn't done, some definitely ugly graphics will appear when more than 8 sprites try to coexist on a single rasterline.
 +
 +  * Figure out beforehand all the raster interrupts' positions and sprite numbers that will be (re)written in them.
 +
 +  * Precalculate values of the $d010 register to execute the raster interrupts quicker.
 +
 +  * Implement doublebuffering for the "sorted sprite table" used by raster interrupts. This allows to sort the sprites for the next frame to display while the previous sprites are still being displayed.
 +
 +===== 2.1 Sorting the sprites =====
 +
 +Please note: the sort algorithm of chapter 2.1.5 ("Ocean" algorithm) seems to 
 +be superior to all the others in this rant. I use it in MW4 and have been very
 +satisfied. Please consider using it for your actual C64 projects and think of
 +other algorithms being presented here only for educational purposes.
 +
 +I'll be mainly giving pseudocode (C-like) examples here. Yes, in some way this
 +is "cheating" in a C64 programming rant but I'm sure it gives a better
 +understanding of the algorithms than trying to follow an already optimized &
 +cryptic ASM implementation (and you get to see a lot of that in those game
 +source codes, links are at the end of this rant)
 +
 +So, what we start with is probably a random order of Y-coordinates in the
 +"virtual sprite" Y-coordinate array (whatever the main program is coming up
 +with). What we want as an end result is a nicely Y-sorted list of sprites to 
 +work with.
 +
 +Now it's also time to explain the array names used in these examples (they're
 +actually the same I use in my C64 source code.)
 +
 +The "virtual" sprite arrays (unsorted):
 +<code>
 +sprx - Unsorted X coordinates
 +spry - Unsorted Y coordinates
 +sprc - Unsorted colors
 +sprf - Unsorted sprite frame numbers
 +</code>
 +
 +The sorted sprite arrays:
 +<code>
 +sortsprx  - Sorted X coordinates
 +sortspry  - Sorted Y coordinates
 +sortsprc  - Sorted colors
 +sortsprf  - Sorted sprite frame numbers
 +sortorder - Sorted list of indexes to unsorted table (not used in every example)
 +</code>
 +
 +==== 2.1.1 The bubblesort ====
 +
 +This involves swapping the Y-coordinates until the order is right. It's fast
 +when there's only a small amount of sprites but when the amount grows, the time
 +taken grows proportional to the power of two (N^2) of the number of sprites N.
 +
 +Of course, it's not enough to swap the Y-coordinates. We must know what sprite
 +they're associated with, or they're worthless. Therefore we must have an index
 +array (indexes to the unsorted array) - an index entry travels with the
 +Y-coordinate during sorting.
 +
 +So we'd do like this
 +<code>
 +;Initialize arrays for sorting
 +
 +for (sprite=0; sprite < maximum_sprites; sprite++)
 +{
 +  sortspry[sprite] = spry[sprite];
 +  sortorder[sprite] = sprite;
 +}
 +
 +;Do the actual sorting
 +
 +for (outer = 0; outer < maximum_sprites-1; outer++)
 +{
 +  for (inner = outer+1; inner < maximum_sprites; inner++)
 +  {
 +    if (sortspry[inner] < sortspry[outer])
 +    {
 +      swap(sortspry[inner], sortspry[outer]);
 +      swap(sortorder[inner], sortorder[outer]);
 +    }
 +  }
 +}
 +</code>
 +
 +Another downside of this algorithm is that there is no clear point where to
 +insert more-than-8-sprites-on-a-row rejection; removing a sprite from the
 +sorted array would involve actually moving memory blocks around (slow).
 +
 +It could also be implemented so that there was no sorted Y coords table at this
 +point; instead accesses to the unsorted table would happen via the index table
 +at all times. This would remove the swapping of the Y-coords but make the
 +Y-coord compares slower.
 +
 +==== 2.1.2 Finding the next lowest Y-coordinate ====
 +
 +This doesn't involve swapping and does have a clear point where sprite
 +rejection can be inserted. It also lends itself well to completely unrolling
 +the inner compare loop for speed purposes. However, the execution time still
 +grows proportional to N^2.
 +
 +Here a "temporary" Y-coord array might need to be used, because values from
 +that table are actually being "destroyed". If all sprite coords are calculated
 +again for each next frame it doesn't matter. Anyway, I'm showing it with the
 +temp array involved. (it can be on the zeropage for speed)
 +
 +<code>
 +;Initialize temp array
 +
 +for (sprite = 0; sprite < maximum_sprites; sprite++)
 +{
 +  temp_y[sprite] = spry[sprite];
 +}
 +
 +;Do the sorting
 +
 +sorted_sprites = 0;   ;Counter of sprites that passed the sorting
 +
 +while (true)   ;Indefinite loop
 +{
 +  lowest_found_y = 255;      ;Maximum Y-coord value found so far
 +  lowest_spritenumber = 255; ;This is an illegal value used to
 +                             ;terminate the loop when no more
 +         ;sprites are found
 +
 +  for (sprite = 0; sprite < maximum_sprites; sprite++)
 +  {
 +    if (temp_y[sprite] < lowest_found_y)
 +    {
 +      lowest_found_y = temp_y[sprite];
 +      lowest_spritenumber = sprite;
 +    }
 +  }
 +
 +  if (lowest_spritenumber > maximum_sprites) break; ;Sorting finished,
 +     ;break out of the loop
 +  temp_y[lowest_spritenumber] = 255; ;Now make this sprite unavailable for the
 +                                    ;next round of sorting
 +
 +  if (sprite_not_rejected)      ;Here we can do sprite rejection if wanted
 +  {
 +    ;Next we can either copy the sprite variables to the sorted table,
 +    ;or just use an index
 +
 +    ;Copying to sorted table
 +    sortspry[sorted_sprites] = spry[lowest_spritenumber];
 +    sortsprx[sorted_sprites] = sprx[lowest_spritenumber];
 +    sortsprf[sorted_sprites] = sprf[lowest_spritenumber];
 +    sortsprc[sorted_sprites] = sprc[lowest_spritenumber];
 +    sorted_sprites++;
 +
 +    ;Using an index
 +    sortorder[sorted_sprites] = lowest_spritenumber;
 +    sorted_sprites++;
 +  }
 +}
 +</code>
 +
 +==== 2.1.3 Sorting according to Y divided by 8 (incorrect) ====
 +
 +This sort method doesn't guarantee correct order of sprites but is fast
 +(execution time proportional to N.) It's what I used in MW1-3, but won't be
 +using any longer - it's like taking only the latter step of a radix-sort.
 +
 +This is based on the idea of "buckets" where sprites are stored. Handling those
 +buckets with C64 ASM involves some creativity, but you can look at the sources
 +to see how I did it. Basically, my solution is to not use any two-dimensional
 +arrays, but finding out the amount of sprites in each bucket before actually
 +starting to put sprites into buckets; therefore each bucket's contents can be
 +next to each other in memory and the sorted sprite order emerges then directly
 +from the bucket memory area, without having to actually walk through each
 +separate bucket like is done in this example.
 +
 +If this sounds complicated, don't worry, this isn't a good solution. We'll come
 +to the best solution (in my opinion) last...
 +
 +<code>
 +;Reset the amount of sprites in each bucket to 0. There has to be as many
 +;buckets as there are character rows in the visible sprite range.
 +
 +for (bucket = 0; bucket < maximum_buckets; bucket++)
 +{
 +  amount_in_bucket[bucket] = 0;
 +}
 +
 +;Store sprites to buckets, according to their Y coordinates. The
 +;buckets are actually a two-dimensional array.
 +
 +for (sprite = 0; sprite <  maximum_sprites; sprite++)
 +{
 +  bucket_number = spry[sprite] / 8;
 +  bucket[bucket_number][amount_in_bucket[bucket_number]] = sprite;
 +  amount_in_bucket[bucket_number]++;
 +}
 +
 +;Walk through all buckets to get the sprite order. In this example,
 +;the sprites are directly copied to the sorted table.
 +
 +sorted_sprites = 0;
 +
 +for (bucket = 0; bucket < maximum_buckets; bucket++)
 +{
 +  for (index = 0; index < amount_in_bucket[bucket]; index++)
 +  {
 +    sprite = bucket[bucket][index];
 +    sortspry[sorted_sprites] = spry[sprite];
 +    sortsprx[sorted_sprites] = sprx[sprite];
 +    sortsprf[sorted_sprites] = sprf[sprite];
 +    sortsprc[sorted_sprites] = sprc[sprite];
 +    sorted_sprites++;
 +  }
 +}
 +</code>
 +
 +==== 2.1.4 Radix sorting ====
 +
 +This ensures correct sprite order, but is complicated. Execution time, like in
 +previous example, is only proportional to N but the code itself can be
 +relatively slow. It follows the "bucket" idea but in two passes. First a
 +sorting is done according to the remainder of Y coordinate divided by 16, then
 +according to the actual result of the division.
 +
 +<code>
 +; Clear buckets for both passes.
 +
 +for (bucket = 0; bucket < maximum_buckets; bucket++)
 +{
 +  amount_in_bucket1[bucket] = 0;
 +  amount_in_bucket2[bucket] = 0;
 +}
 +
 +;Store sprites to the first pass buckets, according to remainder of
 +;Y divided by 16.
 +
 +for (sprite = 0; sprite < maximum_buckets; sprite++)
 +{
 +  bucket1_number = spry[sprite] & 15;
 +  bucket1[bucket1_number][amount_in_bucket1[bucket1_number]] = sprite;
 +  amount_in_bucket1[bucket1_number]++;
 +}
 +
 +;Walk through all the first pass buckets to get the first pass sorting order.
 +;At the same time, put sprites to the second pass buckets according to Y
 +;divided by 16.
 +
 +for (bucket = 0; bucket < maximum_buckets; bucket++)
 +{
 +  for (index = 0; index <  amount_in_bucket1[bucket]; index++)
 +  {
 +    sprite = bucket[bucket][index];
 +    bucket2_number = spry[sprite] / 16;
 +    bucket2[bucket2_number][amount_in_bucket2[bucket2_number]] = sprite;
 +    amount_in_bucket2[bucket2_number]++;
 +  }
 +}
 +
 +;Walk through the second pass buckets to get the final sprite order.
 +;In this example, the sprites are directly copied to the sorted table.
 +
 +sorted_sprites = 0
 +
 +for (bucket = 0; bucket < maximum_buckets; bucket++)
 +{
 +  for (index = 0; index < amount_in_bucket2[bucket]; index++)
 +  {
 +    sprite = bucket2[bucket][index];
 +    sortspry[sorted_sprites] = spry[sprite];
 +    sortsprx[sorted_sprites] = sprx[sprite];
 +    sortsprf[sorted_sprites] = sprf[sprite];
 +    sortsprc[sorted_sprites] = sprc[sprite];
 +    sorted_sprites++;
 +  }
 +}
 +</code>
 +
 +As you can see, this is complicated and involves quite a lot of memory,
 +as well as at least three loops to get everything done.
 +
 +
 +==== 2.1.5 The "Ocean" sorting ====
 +
 +Named this way because it can be found from many Ocean/Imagine games, like 
 +Green Beret or Midnight Resistance. A similar algorithm is also in Dragon
 +Breed.
 +
 +This sort routine is different from all previously mentioned. It uses an order-
 +array that is not resetted each time, therefore each sorting operation is a
 +continuation of the previous and if not much changes have happened (sprites not
 +moving past each other in Y-direction much) it will be very fast as it doesn't
 +have to do almost anything. However, there's the possibility that on some frame
 +it'll eat a lot of time, when there's a lot of changes in the sprite order.
 +
 +First, performed only once in the beginning of the program, we need to set an
 +initial state for the sort order array. One choice is all the sprite numbers
 +going from 0 to maximum:
 +
 +<code>
 +for (sprite = 0; sprite < maximum_sprites; sprite++)
 +{
 +  sortorder[sprite] = sprite;
 +}
 +</code>
 +
 +Then the sorting routine. Note one curious thing: What if the amount of sprites
 +onscreen changes, how can the sortroutine deal with it? Obviously, it can't,
 +directly. Therefore we must use for example the maximum Y-coordinate value 255
 +to mark unused sprites; these will fall to the bottom of the sorted list when
 +sorted and cause no trouble (the actual sprite display code can then easily
 +notice the first unused sprite and exit)
 +
 +<code>
 +sprite1 = 0;
 +
 +while (true)
 +{
 +  if (spry[sortorder[sprite1+1]] < spry[sortorder[sprite1]])
 +  {
 +    sprite2 = sprite1;
 +
 +    while (true)
 +    {
 +      swap(sortorder[sprite2], sortorder[sprite2+1]);
 +      if (sprite2 == 0) break;
 +      sprite2--;
 +      if (spry[sortorder[sprite2+1]] >= spry[sortorder[sprite2]]) break;
 +    }
 +  }
 +  sprite1++;
 +  if (sprite1 == maximum_sprites - 1) break;
 +}
 +</code>
 +
 +Here's the ASM implementation:
 +
 +<code>
 + ldx #$00
 +sortloop:  ldy sortorder+1,x
 +         lda spry,y
 +                ldy sortorder,x
 +                cmp spry,y
 +                bcs sortskip
 +                stx sortreload+1
 +sortswap:  lda sortorder+1,x
 +                sta sortorder,x
 +                sty sortorder+1,x
 +                cpx #$00
 +                beq sortreload
 +                dex
 +                ldy sortorder+1,x
 +                lda spry,y
 +                ldy sortorder,x
 +                cmp spry,y
 +                bcc sortswap
 +sortreload: ldx #$00
 +sortskip:  inx
 +                cpx #MAXSPR-1
 +                bcc sortloop
 +</code>
 +
 +So, the abovementioned sorting loop produces a sorted index array of the
 +sprites. This array can now be walked through and sprites be copied to the
 +sortsprx, sortspry etc. arrays and rejected if necessary. I really feel this is
 +the overall best algorithm (for game use, at least) I've got to know so far.
 +I'm definitely going to use this in the future.
 +
 +===== 2.2 Mapping the "virtual" sprites to physical sprites =====
 +
 +There really isn't much of a choice, the idea is to go round and round with the
 +physical sprites until all "virtual" sprites have been displayed. So, the
 +1st-8th virtual sprites (in sorted order) will go to physical sprites 1-8, then
 +virtual sprite 9 will again be mapped onto physical sprite 1, virtual sprite 10
 +to physical sprite 2 and so on.
 +
 +Based on the priority you want for the sprites you can also reverse the
 +physical sprite order, so 1st virtual goes to 8th physical, 2nd virtual to 7th
 +physical etc.
 +
 +This is an example of sprite mapping, something that could be happening in
 +Green Beret (first level.) There's the 2-sprite soldiers running around, the
 +player jumping in the air and a flamethrower left hanging in the air (1 sprite)
 +<code>
 +                       1
 +             4               3
 +           __5_____         ________
 +          /       6\       /     7
 +    _____/________8_\_____/______1__
 +   /                      3
 + /____________4____________5_______
 +</code>
 +BOFH:Servers Under Siege has a priority-based sprite mapping instead. This
 +is more complicated: if the sprite's priority class is LOW (bodies,
 +items) the sprite mapping loop tries to find a "free" physical sprite in this
 +order: 8,7,6,5,4,3,2,1. For MEDIUM priority (player, enemies) the order is
 +5,4,6,3,7,2,8,1 and for HIGH priority (fire, explosions) the order is
 +1,2,3,4,5,6,7,8. "Free" physical sprite means in this case one whose previous
 +virtual sprite is sufficiently far away from the new virtual sprite, in Y-
 +direction (practically, at least 21 pixels)
 +
 +===== 2.3 Rejecting sprites that try to be the 9th sprite on a row. =====
 +
 +This is quite easy. Let's assume we're walking through the order array of
 +sprites, like the one produced by the algorithm presented just above. The
 +virtual sprites are to be mapped to physical sprites by the round-and-round
 +method, next_sprite is the next sprite-index in sorted order, and there's a
 +variable sorted_sprites (like before) that indicates how many sprites have
 +been "accepted" so far. The accepted sprites so far have been copied to the 
 +sorted arrays (sortsprx, sortspry etc.)
 +
 +The test is:
 +<code>
 +if (spry[next_sprite] - sortspry[sorted_sprites - 8] < 21) then reject();
 +</code>
 +
 +You'll see that this test cannot be done for the first 8 sprites because the
 +sorted array index would turn negative. That is logical, the test is
 +unnecessary for the first 8 sprites and they can always be accepted. But the
 +idea for further sprites is to compare the Y-coordinate of the previous
 +"virtual" sprite mapped to that physical sprite; if the difference is less
 +than 21 pixels then the sprite would be too close and couldn't be displayed by
 +the hardware.
 +
 +===== 2.4 Determining what physical sprites to write, and when. =====
 +
 +This can be done either "on the fly" in the raster interrupt or be 
 +precalculated. There are two basic methods, see which one fits you better.
 +
 +==== 2.4.1 Just before the new sprite ====
 +
 +The idea is to wait until we are close enough to the starting Y-coordinate of a 
 +sprite, and then write the sprite registers. If the following sprite is close 
 +to it, it can also be written in the same raster interrupt.
 +
 +What is "close enough"? In fact, it is the amount of rastertime taken to write
 +all 8 sprites under a worst-case condition. If we come closer to a sprite's
 +Y-coordinate before writing the registers, there could be the possibility that
 +we don't have time to write all 8 sprites, if all of them would happen to be
 +at the same Y-coordinate. But the closer we can come, the neater tight-packed
 +sprite formations will look like (less graphical errors.)
 +
 +If the raster interrupts (when they happen, from what sprite to start, how many
 +sprites to write) are precalculated, there are two benefits:
 +
 +  * The "close enough"-factor (adjustment to raster interrupt position) can be decided based on the number of sprites that will be written in the next raster interrupt. For example, if there's only one sprite, the interrupt can happen just for example 2 lines before its Y-coordinate, but if there's all sprites, the interrupt must happen about 10-12 lines before.
 +
 +  * Y-coordinates are critical, they must be ready when the sprites start to display, everything else is cosmetic. So we can write the Y-coordinates first in the raster interrupt, because we know what sprites to write, and only after that the rest (X-coords, frame, color)
 +
 +==== 2.4.2 Just after the old sprite ====
 +
 +For this method, the first 8 sprites have to be written at the top of the
 +frame. Then, we wait for the rasterline just below the first sprite, and write
 +the new sprite register values to re-use it. Repeat for the second sprite and
 +on until the last virtual sprite's register values have been written.
 +
 +If we write the Y-coordinate last, we will actually ensure that no glitches
 +are possible: in the case the new sprite is too close to the old or writes are
 +otherwise delayed, the new sprite simply won't be displayed at all!
 +
 +This method is simpler & cleaner, but in games where close sprite packing is
 +inevitable (like Green Beret), missing sprites to avoid glitches might not be
 +acceptable.
 +
 +===== 2.5 Doublebuffering =====
 +
 +This too isn't too complicated. Reserve twice as much space for the sorted
 +sprite arrays (for example, if you use 24 sprites, reserve space for 48). When
 +the raster interrupts are displaying sprites 0-23, you can modify sprites 24-47
 +and vice versa. This makes the code a bit more complicated but it isn't a
 +great pain overall.
 +
 +To keep the main program simple, the unsorted sprite arrays shouldn't be
 +doublebuffered.
 +
 +===== 2.6 The raster interrupt code =====
 +
 +The aim is simple: to write the needed sprite register values as fast as
 +possible. Many commercial games are quite inefficient in this, consider the
 +following, which tries to be quite much like the typical code in Ocean
 +games: (here, we assume Y contains the "virtual" spritenumber)
 +
 +<code>
 + tya
 + and #$07 ;Get physical spritenum
 + tax ;to X
 + lda sortsprc,y
 + sta $d027,x
 + lda sortsprf,y
 + sta $c3f8,x ;Write frame to both screens of the
 + sta $c7f8,x ;doublebuffered screen (unnecessary!)
 + txa
 + asl
 + tax
 + lda sortspry,y
 + sta $d001,x
 + lda sortsprx,y ;Multiply sprite X-coord by 2; Ocean
 + asl ;games are really fond of this, I
 + sta $d000,x ;prefer 1-pixel accuracy
 + bcc msb_low
 +msb_high: lda $d010
 + ora ortbl,x
 + sta $d010
 + jmp nextsprite
 +msb_low: lda $d010
 + and andtbl,x
 + sta $d010
 +nextsprite: iny
 + ...
 +
 +ortbl: dc.b 1
 +andtbl: dc.b 255-1
 + dc.b 2
 + dc.b 255-2
 + dc.b 4
 + dc.b 255-4
 + dc.b 8
 + dc.b 255-8
 + dc.b 16
 + dc.b 255-16
 + dc.b 32
 + dc.b 255-32
 + dc.b 64
 + dc.b 255-64
 + dc.b 128
 + dc.b 255-128
 +</code>
 +What can be done to optimize this process?
 +
 +  * Write a separate piece of code for each physical sprite
 +  * Precalculate $d010 values (of course, time doing this is lost elsewhere, but faster raster interrupt code gives less significant graphical errors with tight sprite formations)
 +  * Write frame to only one screen at a time; the raster interrupt code can be modified according to the screen currently in use
 +
 +Now consider this second example (again, virtual spritenumber is in Y)
 +
 +<code>
 +sprite0: lda sortspry,y
 + sta $d001
 + lda sortsprx,y
 + sta $d000
 + lda sortsprd010,y
 + sta $d010
 + lda sortsprf,y
 +frame0: sta $c3f8
 + lda sortsprc,y
 + sta $d027
 + iny
 + cpy endspr
 + bcs done
 +
 +sprite1: lda sortspry,y
 + sta $d003
 + lda sortsprx,y
 + sta $d002
 + lda sortsprd00,y
 + sta $d010
 + lda sortsprf,y
 +frame1: sta $c3f9
 + lda sortsprc,y
 + sta $d028
 + iny
 + cpy endspr
 + bcs done
 + ...
 +</code>
 +
 +Looks a bit faster, doesn't it? We would modify the highbyte of the frame-
 +write STA instructions according to what side of the doublebuffered screen is
 +in use. Of course we also have to jump to the correct place in this code
 +according to what sprite will be written first, so using this approach isn't
 +that simple...
 +
 +===== 2.7 Eliminating flicker in raster interrupts =====
 + 
 +Sprite multiplexing in games is quite an unpredictable operation, because the
 +sprites can be anywhere. If one just blindly follows the sorted Y-coordinates
 +and fires up raster interrupts according to them, flicker & slowdown (caused
 +by the new raster interrupt being higher or at same position than the current
 +$d012 value) may occur in at least these cases:
 +
 +  * The next raster interrupt is so close to the current that we are already late from it, according to $d012
 +
 +  * The bottom scorepanel raster interrupt (ok, this is not in all programs, but in quite many) has been missed because of sprite multiplexing.
 +
 +The first thing to do is to make sure that sprites outside the visible screen
 +aren't displayed. Also it's a good idea to remove a couple extra lines from the
 +bottom of the screen: if scorepanel begins at line 200 then don't allow for
 +example sprite Y-coordinates 198 & 199 to be displayed.
 +
 +The second is to simply introduce this checking code at the end of each raster
 +interrupt, whose location/duration we're unsure of. In this example the
 +accumulator contains the position of the next raster interrupt.
 +
 +<code>
 +                sta $d012
 +                sec
 +                sbc #$03
 +                cmp $d012                       ;Late from next IRQ?
 +                bcc go_to_irq_directly
 +</code>
 +Note that the go_to_irq_directly should skip all register saving/restoring, as
 +well as interrupt acknowledging (obvious.) 3 lines feel like paranoidically
 +much "safety" for the interrupt but in fact I got flicker in some rare cases
 +if I was subtracting only 2.
 +
 +These two things won't prevent game slowdown when the CPU simply has too much
 +to do and they won't prevent graphical errors when there's simply too many
 +sprites in one place, but at least they prevent the dreaded flickering and
 +music slowdown. :)
 +
 +====== 3. Conclusion ======
 +
 +To see actual whole multiplexers written in ASM, I recommend the following:
 +
 +[[http://covertbitops.c64.org/rants/multi.zip|An example general multiplexer]]
 +("Ocean" sorting, now with $d010 handling)
 +
 +In these following games, the source code files concerning sprite multiplexing
 +are sprite.s and raster.s.
 +
 +[[http://covertbitops.c64.org/games/mw3src.zip|Metal Warrior 3 source code]]
 +(doublebuffering, inaccurate "Y divided by 8" sorting, precalculation of raster
 +interrupts)
 +
 +[[http://covertbitops.c64.org/rants/freedir.zip|The Freedirectional Scrolling test program]]
 +(doublebuffering, radix sorting of Y-coords)
 +
 +[[http://covertbitops.c64.org/games/bofhsrc.zip|BOFH source code]]
 +(doublebuffering, radix sorting of Y-coords, priority mechanism to make
 +standing characters display over, not beneath, bodies & items for example)
 +
 +Also, you can try reverse-engineering your favorite game that you know to use
 +more than 8 sprites! In an ML monitor, you can search for bytes $01 $d0
 +(writes to the first sprite's Y coordinates), or to other sprite registers,
 +or just look around the code at the IRQ address, usually there's multiplexing
 +code nearby.
 +
 +When you look at the indexed accesses made in the sprite display interrupt code,
 +you'll usually notice one memory access whose result doesn't go into any sprite
 +I/O register, but is used as an index (either in X or Y register) for accesses
 +into other sprite arrays. This is likely the sprite order array, and by looking
 +for memory references to it you'll find the sortroutine.
 +
 +Good luck in your multiplexer experiments! Remember, when you get your
 +multiplexer running, stress-test it with abnormal amounts of sprites...
 +
 +Lasse Öörni - loorni@student.oulu.fi
 +
  
base/sprite_multiplexing.txt · Last modified: 2015-04-17 04:34 by 127.0.0.1