base:rant11
— | base:rant11 [2015-04-17 04:33] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
+ | ====== Significant tricks & techniques in MW4 by Cadaver ====== | ||
+ | (This rant is mirrored from [[http:// | ||
+ | |||
+ | This rant details some of the more out of the ordinary techniques used in | ||
+ | the finished version of MW4 that make it possible to be what it is. None of | ||
+ | them is anything special, but they' | ||
+ | games, so I thought I'd write something about them. | ||
+ | |||
+ | Please refer to the MW4 source code to see what I'm talking about, | ||
+ | [[http:// | ||
+ | |||
+ | ====== 0. Free/ | ||
+ | |||
+ | (Also discussed in Rant #4) This is like the block scrolling in any of my | ||
+ | games, in that all action still happens in 2 routines: | ||
+ | |||
+ | * SCROLLLOGIC, | ||
+ | |||
+ | * SCROLLWORK, hard work of scrolling (screen memory/ | ||
+ | |||
+ | SCROLLLOGIC is still called at the beginning of the frame, so that the sprite | ||
+ | subtract is correctly updated and we know what the scrolling values are going | ||
+ | to be for the rest of the frame. | ||
+ | |||
+ | SCROLLWORK, in turn, is called at the end of the frame. | ||
+ | |||
+ | The difference to " | ||
+ | Usually scrollers pick one of the 8 directions, and advance the scrolling for a | ||
+ | certain amount of frames until a distance of 1 char has been scrolled (Turrican | ||
+ | for example). | ||
+ | |||
+ | But here the algorithm doesn' | ||
+ | the current finescroll values (scrollx, scrolly) and the current scrolling | ||
+ | speed (scrollsx, scrollsy). Btw. those values have 3 bits of subpixel accuracy, | ||
+ | so they go from 0-63 instead of 0-7. The scrolling speed is a signed number, so | ||
+ | it goes from +32 to -32 (speeds higher than 4 pixels/ | ||
+ | screen shifting can happen on each second frame at most). | ||
+ | |||
+ | The first step is to subtract the speed from the finescroll values, separately | ||
+ | for the X and Y axes. If the finescroll values overflow, we don't care about it | ||
+ | yet; instead we just clamp them at the ends of the finescroll range (0 or 63). | ||
+ | This is because we are not yet ready to perform a complete screen shift in the | ||
+ | space of one frame! | ||
+ | |||
+ | Now we have the finescroll values for the next frame! | ||
+ | |||
+ | The next step is to subtract the speed from finescroll yet again. This time we | ||
+ | check for overflow in either the positive or negative direction, and by doing | ||
+ | this for both axes we know where the screen needs to be shifted. If the | ||
+ | finescroll *does not* overflow, we can discard the values from this second | ||
+ | subtraction, as no shifting is necessary, and the process starts over on the next frame. | ||
+ | |||
+ | But if we get an overflow, this tells us that the screen has to be shifted. Now | ||
+ | the rest of the process goes like this: | ||
+ | |||
+ | * A flag is set so that next time SCROLLLOGIC is executed, it just takes into use the precalculated finescrollvalues that we got from the second subtraction, | ||
+ | |||
+ | * At the end of the frame, SCROLLWORK will shift the screen memory. As double buffering is used, it's always a copy from the currently visible screen to the currently hidden screen, with an offset that is determined by the shifting direction. After the shift, new data is drawn to the screen edges. | ||
+ | |||
+ | * At the end of the next frame, SCROLLWORK will shift the color memory. This requires checking that the rasterbeam ($d012) is at a suitable position (at the scorescreen, | ||
+ | the screen. | ||
+ | |||
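The two subtractions described above can be sketched for a single axis like this. It is a Python illustration, not the 6502 code from MW4: the name `scroll_axis`, its return convention and the wrap-around of the precalculated value (modulo 64, i.e. one char) are assumptions made for the example.

```python
FINE_MIN, FINE_MAX = 0, 63   # 8 pixels with 3 bits of subpixel accuracy


def scroll_axis(fine, speed):
    """One axis of the scroll logic for a single frame (hypothetical helper).

    Returns (next_fine, shift, precalc):
      next_fine - clamped finescroll value to use on the next frame
      shift     - -1, 0 or +1: direction in which the second subtraction overflowed
      precalc   - wrapped finescroll value from the second subtraction, or None
    """
    # First subtraction: clamp instead of wrapping, because a complete
    # screen shift cannot be done in the space of one frame.
    next_fine = max(FINE_MIN, min(FINE_MAX, fine - speed))

    # Second subtraction: only tells us whether a shift is needed.
    second = next_fine - speed
    if second < FINE_MIN:
        return next_fine, -1, second & FINE_MAX
    if second > FINE_MAX:
        return next_fine, 1, second & FINE_MAX
    return next_fine, 0, None    # no overflow: the result is discarded


if __name__ == "__main__":
    print(scroll_axis(40, 8))    # no shift needed
    print(scroll_axis(10, 16))   # overflow in the negative direction
```

Running the same function for X and Y gives the shifting direction; when either axis reports a shift, the precalculated values would be taken into use on the following frame instead of being recomputed.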
+ | The file " | ||
+ | |||
+ | ====== 1. Realtime depacked sprites ====== | ||
+ | |||
+ | This is also something I've written about before, but how it's used in the " | ||
+ | MW4 engine is a bit different. | ||
+ | |||
+ | All main characters and enemies (about 160 frames loaded at once) still reside | ||
+ | in the videobank memory: these are not unpacked in real time, as it would be | ||
+ | quite impossible to achieve a 50Hz frame rate while doing that. | ||
+ | |||
+ | But the weapon carried by each character/ | ||
+ | items lying on the ground, and " | ||
+ | packed sprites. There' | ||
+ | also the max. number of actors active at once, so each actor can utilize 1 | ||
+ | packed sprite. | ||
+ | |||
+ | Packed sprites are divided into 6 " | ||
+ | (each string of four numbers represents four multicolor pixels - one byte) | ||
+ | <code> | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | 1111 2222 3333 | ||
+ | |||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | 4444 5555 6666 | ||
+ | </code> | ||
+ | |||
+ | As you see, the lowest 7 rows of a packed sprite aren't used at all, to make | ||
+ | the depacking process quicker (the lowest 7 rows have been cleared beforehand, | ||
+ | when the program starts - this needs to be done only once). All packed sprites | ||
+ | are so small that they fit into the 6 slices. | ||
+ | |||
+ | For each slice, one bit determines whether it's empty (0) or whether it | ||
+ | contains 7 bytes of data (1). Furthermore, | ||
+ | facing right. If they need to be displayed flipped, the flipping is also | ||
+ | performed realtime in the program, | ||
+ | |||
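To make the slice format concrete, here is a minimal Python sketch of the depacking, without flipping. It is not the MW4 routine: the function name `depack_sprite`, the order of the slice bits (bit 0 = slice 1) and the way the bits and data are passed in are assumptions for illustration only.

```python
SPRITE_BYTES = 64   # 21 rows x 3 bytes, plus one unused byte
SLICE_ROWS = 7      # each slice is one byte wide and 7 rows high


def depack_sprite(slice_bits, data):
    """Expand a packed sprite into a 64-byte sprite buffer.

    slice_bits - six flag bits, bit 0 = slice 1 (assumed bit order)
    data       - 7 bytes for every slice whose flag bit is 1, in slice order
    """
    sprite = bytearray(SPRITE_BYTES)      # the lowest 7 rows simply stay zero
    pos = 0
    for s in range(6):
        column = s % 3                    # slices 1-3 and 4-6 run left to right
        first_row = (s // 3) * SLICE_ROWS
        full = (slice_bits >> s) & 1
        for r in range(SLICE_ROWS):
            offset = (first_row + r) * 3 + column
            if full:
                sprite[offset] = data[pos]
                pos += 1
            else:
                sprite[offset] = 0        # empty slice: just clear the bytes
    return sprite


if __name__ == "__main__":
    # Only slice 1 contains data; everything else is empty.
    packed = depack_sprite(0b000001, bytes([1, 2, 3, 4, 5, 6, 7]))
    print(packed.hex())
```

An empty slice costs one flag bit instead of 7 bytes, which is where the space saving comes from.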
+ | One more thing, before I present the actual depack routines from " | ||
+ | The depacked sprites are not doublebuffered, | ||
+ | the raster beam is at the scorescreen, | ||
+ | color-RAM scrolling " | ||
+ | into trouble, but experience indicates that during a single frame, not so many | ||
+ | packed sprite frames change (naturally, we don't want to waste time depacking | ||
+ | the same sprite again), and therefore not so many have to be depacked, so | ||
+ | there' | ||
+ | |||
+ | The unflipped sprite depacking is quite straightforward. This code depacks one | ||
+ | slice. The destination address uses X as index; the highbyte of the address has | ||
+ | to be modified into the code. For the source address, variables temp3-temp4 + Y | ||
+ | register are used for zeropage indirect addressing. | ||
+ | |||
+ | <code> | ||
+ | dspr_slice: | ||
+ | bcc dspr_emptyslice ;Slice bit 0 = empty slice | ||
+ | dspr_fullslice: | ||
+ | dspr_fullsta1: | ||
+ | iny ;Next source byte | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta2: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta3: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta4: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta5: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta6: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | dspr_fullsta7: | ||
+ | iny | ||
+ | rts | ||
+ | |||
+ | dspr_emptyslice: | ||
+ | dspr_emptysta1: | ||
+ | dspr_emptysta2: | ||
+ | dspr_emptysta3: | ||
+ | dspr_emptysta4: | ||
+ | dspr_emptysta5: | ||
+ | dspr_emptysta6: | ||
+ | dspr_emptysta7: | ||
+ | rts | ||
+ | </code> | ||
+ | |||
+ | When also flipping the sprite, things become harder. I use a 256-byte lookup | ||
+ | table to flip the bit-pairs in each byte, but the problem is that there's no | ||
+ | free index register to access it, so we have to modify the code to access the | ||
+ | fliptable instead (a bit slow). Fortunately depacking an empty slice is still | ||
+ | easy & fast. | ||
+ | |||
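For reference, the contents of such a 256-byte table are easy to generate. The sketch below is Python rather than 6502, and the helper name `flip_pairs` is made up for the example; it reverses the order of the four multicolor bit-pairs in a byte while keeping the two bits inside each pair in place.

```python
def flip_pairs(b):
    """Reverse the order of the four 2-bit multicolor pixels in one byte."""
    return (((b & 0x03) << 6) |
            ((b & 0x0C) << 2) |
            ((b & 0x30) >> 2) |
            ((b & 0xC0) >> 6))


# One entry for every possible byte value, like the table used by the game code
fliptable = bytes(flip_pairs(b) for b in range(256))

if __name__ == "__main__":
    # Pixels 00 01 10 11 become 11 10 01 00
    print(format(fliptable[0b00011011], "08b"))
```

Flipping twice gives the original byte back, which is a handy sanity check for a generated table.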
+ | <code> | ||
+ | dsprl_slice: | ||
+ | bcc dsprl_emptyslice ;Slice bit 0 = empty slice | ||
+ | dsprl_fullslice: | ||
+ | sta dsprl_fulllda1+1 ;Patch the source byte into the operand of the load below | ||
+ | dsprl_fulllda1: | ||
+ | dsprl_fullsta1: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda2+1 | ||
+ | dsprl_fulllda2: | ||
+ | dsprl_fullsta2: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda3+1 | ||
+ | dsprl_fulllda3: | ||
+ | dsprl_fullsta3: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda4+1 | ||
+ | dsprl_fulllda4: | ||
+ | dsprl_fullsta4: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda5+1 | ||
+ | dsprl_fulllda5: | ||
+ | dsprl_fullsta5: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda6+1 | ||
+ | dsprl_fulllda6: | ||
+ | dsprl_fullsta6: | ||
+ | iny | ||
+ | lda (temp3),y | ||
+ | sta dsprl_fulllda7+1 | ||
+ | dsprl_fulllda7: | ||
+ | dsprl_fullsta7: | ||
+ | iny | ||
+ | rts | ||
+ | |||
+ | dsprl_emptyslice: | ||
+ | dsprl_emptysta1: | ||
+ | dsprl_emptysta2: | ||
+ | dsprl_emptysta3: | ||
+ | dsprl_emptysta4: | ||
+ | dsprl_emptysta5: | ||
+ | dsprl_emptysta6: | ||
+ | dsprl_emptysta7: | ||
+ | rts | ||
+ | </code> | ||
+ | |||
+ | MW4 uses a total of 114 packed spriteframes, | ||
+ | that many of them are also shown flipped, the number rises to about 170. So, | ||
+ | the game would clearly have been impossible to implement without packed sprites! | ||
+ | |||
+ | ====== 2. Flipping of sprites while loading them ====== | ||
+ | |||
+ | Near the end of MW4 development, | ||
+ | nice to have at least 3 save game slots, but they'd take 48 blocks off the disk. | ||
+ | To somewhat help this," | ||
+ | instead there' | ||
+ | and a " | ||
+ | flip duplicates at load time. | ||
+ | |||
+ | This routine is also in " | ||
+ | also reside under the I/O area, so it needs to be switched off/ | ||
+ | disabled. Of course, this can't happen for a prolonged time, or the raster | ||
+ | interrupts would freak out. | ||
+ | |||
+ | The " | ||
+ | for one sprite: temp2 is the row counter, alo-ahi are the zeropage source | ||
+ | pointer, and tempadrlo-tempadrhi are the destination pointer. The routine first | ||
+ | takes the rightmost source byte of a sprite row, flips the bitpairs, and puts | ||
+ | it into the leftmost byte of the destination sprite, then flips the middle byte, | ||
+ | and finally the leftmost source byte is flipped and put into the rightmost | ||
+ | destination byte. This process is repeated for all 21 rows of the sprite. | ||
+ | |||
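The row-by-row flip described above looks like this in a high-level language. It is a Python sketch under assumed names (`flip_byte`, `flip_sprite`), working on an ordinary 64-byte sprite in a byte buffer; the I/O switching and interrupt handling of the real loader are left out.

```python
ROWS = 21           # rows in a sprite
BYTES_PER_ROW = 3   # 12 multicolor pixels per row


def flip_byte(b):
    """Reverse the order of the four bit-pairs in a byte (stand-in for the fliptable)."""
    out = 0
    for _ in range(4):
        out = (out << 2) | (b & 0x03)
        b >>= 2
    return out


def flip_sprite(src):
    """Return a horizontally flipped copy of a 64-byte multicolor sprite."""
    dest = bytearray(64)
    for row in range(ROWS):
        base = row * BYTES_PER_ROW
        dest[base] = flip_byte(src[base + 2])      # rightmost source -> leftmost dest
        dest[base + 1] = flip_byte(src[base + 1])  # middle byte stays in the middle
        dest[base + 2] = flip_byte(src[base])      # leftmost source -> rightmost dest
    return dest


if __name__ == "__main__":
    sprite = bytearray(64)
    sprite[0] = 0b00011011        # top-left byte of the sprite
    flipped = flip_sprite(sprite)
    print(format(flipped[2], "08b"))   # ends up in the top-right byte, pairs reversed
```

Both the byte order within a row and the bit-pair order within each byte have to be reversed; doing only one of them would scramble the image.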
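+ | <code> | ||
+ | lda #21 ;Row counter | ||
+ | sta temp2 | ||
+ | clc ;Carry is only cleared once, before the loop | ||
+ | loadspr_rowloop: ldy #2 ;Rightmost source byte first (LDY #2 inferred from the text above) | ||
+ | sei ;No interrupts while the I/O area is switched off | ||
+ | dec $01 | ||
+ | lda (alo),y | ||
+ | ldy #0 | ||
+ | tax | ||
+ | lda fliptable,x ;Flip the bitpairs | ||
+ | sta (tempadrlo),y ;Leftmost destination byte | ||
+ | iny | ||
+ | lda (alo),y ;Middle byte | ||
+ | tax | ||
+ | lda fliptable,x | ||
+ | sta (tempadrlo),y | ||
+ | dey | ||
+ | lda (alo),y ;Leftmost source byte | ||
+ | ldy #2 | ||
+ | tax | ||
+ | lda fliptable,x | ||
+ | sta (tempadrlo),y ;Rightmost destination byte | ||
+ | inc $01 ;Switch the I/O area back on | ||
+ | cli | ||
+ | lda tempadrlo ;Advance both pointers by one sprite row | ||
+ | adc #$03 | ||
+ | sta tempadrlo | ||
+ | lda alo | ||
+ | adc #$03 | ||
+ | sta alo | ||
+ | dec temp2 | ||
+ | bne loadspr_rowloop | ||
+ | </code> | ||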
+ | |||
+ | ====== 3. Scripting system ====== | ||
+ | |||
+ | The scripting system as implemented in the " | ||
+ | interpreters, | ||
+ | in 2KB chunks and executing it. All code corresponding to this is in the file | ||
+ | " | ||
+ | |||
+ | Scripting performs various activities of the game such as the title screen, | ||
+ | parts of the game menu system, starting a new game, conversations and such. | ||
+ | There are 2 ways to call a script: | ||
+ | |||
+ | * " | ||
+ | |||
+ | * " | ||
+ | |||
+ | A script routine has no "state information", | ||
+ | variables (defined in " | ||
+ | beginning, when Ian gets hit by the alien craft, the appearances/ | ||
+ | other characters are timed by a simple delay counter. | ||
+ | |||
+ | To ease the pain of scripting, I defined some macros, these are in the file | ||
+ | " | ||
+ | |||
+ | An important concept of the scripting system is to have access to all | ||
+ | subroutines and variables of the game main program. Therefore, all its symbols | ||
+ | are dumped in the makefile for use by the script routines. A script routine can | ||
+ | naturally crash the game if it wants to, so care had to be exercised when | ||
+ | writing them, just like writing the main engine code. | ||
+ | |||
+ | Care must also be taken over which routines a script routine calls: if | ||
+ | it JSRs off to a subroutine that also calls EXECSCRIPT, a different script file | ||
+ | might be in memory upon return and a crash would be inevitable. Therefore, | ||
+ | whenever this is suspected, the script routine uses a JMP instead, or sets a | ||
+ | latent script to be executed on the next frame. | ||
+ | |||
+ | Because each script routine is identified by a 16bit number (highbyte = file | ||
+ | number, lowbyte = entrypoint), | ||
+ | (switches, computers and such) in the levels was easy using the level editor | ||
+ | (AOMEDIT2). The bytecode-based scripting system of the preview wouldn't have | ||
+ | allowed that as easily. | ||
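The 16bit script numbering and the "latent script" idea can be modelled roughly as below. This is a Python sketch, not the MW4 code: the class `ScriptRunner`, its method names, the `load_file` callback and the rule of reloading only when the file number changes are all assumptions for illustration.

```python
class ScriptRunner:
    """Toy model of script IDs where highbyte = file number, lowbyte = entrypoint."""

    def __init__(self, load_file):
        self.load_file = load_file   # hypothetical callback: file number -> list of routines
        self.current_file = None     # number of the script file currently "in memory"
        self.entries = None
        self.latent = None           # script to run on the next frame, if any

    def exec_script(self, script_id):
        file_no = script_id >> 8
        entry = script_id & 0xFF
        if file_no != self.current_file:
            # Loading replaces whatever script file was in memory before
            self.entries = self.load_file(file_no)
            self.current_file = file_no
        return self.entries[entry](self)

    def set_latent(self, script_id):
        """Defer a script to the next frame instead of calling it right away."""
        self.latent = script_id

    def run_latent(self):
        """Call once per frame: runs the deferred script, if one was set."""
        if self.latent is not None:
            script_id, self.latent = self.latent, None
            self.exec_script(script_id)


if __name__ == "__main__":
    files = {1: [lambda r: print("file 1, entry 0"),
                 lambda r: r.set_latent(0x0200)],
             2: [lambda r: print("file 2, entry 0")]}
    runner = ScriptRunner(lambda n: files[n])
    runner.exec_script(0x0101)   # sets a latent script instead of nesting a call
    runner.run_latent()          # next frame: file 2 is loaded and executed
```

Deferring through the latent slot means the calling routine has already returned before another script file overwrites it in memory, which is the hazard described above.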
base/rant11.txt · Last modified: 2015-04-17 04:33 by 127.0.0.1