base:rant9
— | base:rant9 [2015-04-17 04:33] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Frameskipping, | ||
+ | (This rant is mirrored from [[http:// | ||
+ | |||
+ | This is a theoretical rant about what you can possibly do when it seems you run | ||
+ | out of rastertime and your program starts to slow down. Of course, the | ||
+ | obvious solution is to optimize code or leave routines out entirely, but this | ||
+ | is not about that... | ||
+ | |||
+ | |||
+ | ====== 0. Running out of time ====== | ||
+ | |||
+ | Traditionally games & demos use frame-based movement; for example 50 times a | ||
+ | second the screen & sprites move a little bit to create the illusion of | ||
+ | motion. But for the C64, trying to do all the movement/ | ||
+ | as the actual graphics code (scrolling the screen, raster IRQs etc.) on | ||
+ | each frame can simply be too much. | ||
+ | |||
+ | If a program has been designed well, going " | ||
+ | ugly graphical effects like flickering/ | ||
+ | as an overall slowdown. Practically, | ||
+ | design involves making sure that no matter what happens, raster interrupts | ||
+ | have enough time to do their screen-update task; in fact their task should be | ||
+ | just the immediate setting of VIC registers (like screen-splits and sprite | ||
+ | multiplexing) and playing music/ | ||
+ | |||
+ | All the time-consuming things like movement, AI & scrolling are done in | ||
+ | the main program; if it is too " | ||
+ | correctly. However, this rant challenges that tried-and-true approach with | ||
+ | a new, more complex but more powerful approach. | ||
+ | |||
+ | The basic question is, what can we do once we see that the C64 simply can't | ||
+ | handle the load? Of course, there' | ||
+ | give you more clock cycles, but there are ways to sort of " | ||
+ | keeping the internal logic of the program uncompromised. | ||
+ | |||
+ | |||
+ | ====== 1. Movement/ | ||
+ | |||
+ | |||
+ | To know what we can do about slowing down we have to see if two distinct | ||
+ | areas can be identified from the code: | ||
+ | |||
+ | Movement/ | ||
+ | |||
+ | * For advancing the internal state of the program. This means moving the characters, processing AI, doing collision detection, and executing virtual machine bytecode :) | ||
+ | |||
+ | Graphics update | ||
+ | |||
+ | * For rendering the internal state onscreen. On C64, this means things like scrolling (shifting the screen memory if no VIC tricks are used), sorting the sprites for multiplexing, | ||
+ | |||
+ | Usually in games these are easy to separate, while in demo effects this is | ||
+ | not always the case. If they can't be separated in your project, the rest of | ||
+ | this rant will not be of much use to you. | ||
+ | |||
+ | ====== 2. Frameskipping ====== | ||
+ | |||
+ | Let's first look at an approach many know from PC games/ | ||
+ | PCs are usually powerful enough to handle the AI & movement on each frame but | ||
+ | not necessarily to render complex graphics scenes. | ||
+ | |||
+ | The idea is to keep track of the time a frame was last rendered onscreen; if | ||
+ | for example the previous rendering was 3 frames away, call the movement/ | ||
+ | code 3 times before the next rendering call. Now the motion becomes more | ||
+ | jerky, as not each frame is drawn, but the apparent speed remains the same. | ||
+ | |||
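The catch-up loop described above could be sketched roughly as follows. This is only an illustration: the names `frames_elapsed`, `steps`, `move_logic` and `render`, as well as the zeropage addresses, are hypothetical, and it assumes a raster IRQ (not shown) increments `frames_elapsed` once per displayed frame.

```asm
frames_elapsed  = $02           ;hypothetical: incremented once per frame by a raster IRQ
steps           = $03           ;hypothetical: movement steps still to run

mainloop:
        sei                     ;read and clear the counter with no IRQ in between
        ldy frames_elapsed
        lda #$00
        sta frames_elapsed
        cli
        cpy #$00
        beq mainloop            ;no new frame has passed yet, keep waiting
        sty steps
logicloop:
        jsr move_logic          ;advance the internal state by one frame
        dec steps
        bne logicloop
        jsr render              ;draw only the latest state
        jmp mainloop

move_logic:
        rts                     ;placeholder for the movement code
render:
        rts                     ;placeholder for the graphics update
```

If rendering took 3 frames, the movement routine runs 3 times before the next render, so the apparent speed stays constant while the displayed motion gets coarser.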
+ | Frameskipping can be very useful in 3D games or isometric 2D bitmap games | ||
+ | even on C64. Of course, if also the movement/ | ||
+ | will not help much. | ||
+ | |||
+ | Also in games like Last Ninja, where the sprites have to be masked against | ||
+ | the background (very time-consuming!), | ||
+ | I'm not sure if Last Ninjas actually frameskip. I remember seeing the | ||
+ | characters move with bigger steps in some CPU-intense scenes, like the | ||
+ | maintenance areas in LN2:Office level, so likely at least LN2 frameskipped. | ||
+ | |||
+ | ====== 3. Interpolation ====== | ||
+ | |||
+ | But what if we want to still keep running at 50Hz, scrolling almost the whole | ||
+ | screen with lots of sprites flying around? And what if we have some really | ||
+ | complex AI routines that take a lot of time? Perhaps we want to depack sprites | ||
+ | in realtime, too? | ||
+ | |||
+ | All of this is (within some limits) possible on a stock C64. The key is not to | ||
+ | run your full movement/ | ||
+ | 4th frame, and you'll have quite a bit of time left over. In the in-between | ||
+ | frames, interpolate the movement of sprites with a line equation (linear | ||
+ | interpolation). | ||
+ | |||
+ | I'm not sure whether anyone has used this in a C64 game before. The realization came | ||
+ | to me at the end of 2001, and after that I started developing the idea. I know | ||
+ | that some games like Gauntlet 3 or Myth scroll at 50Hz while updating the | ||
+ | movement of sprites at a lower rate; however, they don't show the in-between | ||
+ | frames for sprites. | ||
+ | |||
+ | There are different ways to do this. The easier but not-so-powerful approach | ||
+ | is this: The main program handles everything, raster interrupts just show the | ||
+ | screen & sprites in the way the main program wishes. The update loop could then | ||
+ | be something like: | ||
+ | |||
+ | 1st frame: | ||
+ | * Do full movement/ | ||
+ | * Scroll screen if necessary | ||
+ | * Sort sprites | ||
+ | * Instruct raster IRQs to update screen on next frame | ||
+ | |||
+ | 2nd frame: | ||
+ | * Interpolate sprite positions between old & new (much faster than performing the full movement) | ||
+ | * Scroll screen if necessary | ||
+ | * Sort sprites | ||
+ | * Instruct raster IRQs to update screen on next frame | ||
+ | |||
+ | 3rd frame: | ||
+ | * Start the cycle again from the beginning | ||
+ | |||
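For the 2-frame case, the interpolation step on the in-between frame can be as cheap as averaging the old and new positions. The sketch below is an assumption-laden illustration, not the code of any actual game: the table names, their addresses at $c000 and the sprite count are hypothetical, and coordinates are assumed to be unsigned 8-bit values (a full 9-bit sprite X coordinate would need extra handling).

```asm
NUMSPR  = 8                     ;hypothetical number of sprites
old_x   = $c000                 ;positions before the last full movement step
old_y   = $c008
new_x   = $c010                 ;positions after the last full movement step
new_y   = $c018
show_x  = $c020                 ;positions handed over to the raster IRQs
show_y  = $c028

interpolate:
        ldx #NUMSPR-1
interploop:
        lda old_x,x
        clc
        adc new_x,x             ;9-bit sum, bit 8 ends up in the carry
        ror                     ;rotate the carry back in: A = (old+new)/2
        sta show_x,x
        lda old_y,x
        clc
        adc new_y,x
        ror
        sta show_y,x
        dex
        bpl interploop
        rts
```

On the frame after a full movement step, the new positions would be shown as they are. A 4-frame variant would instead step a quarter of the difference each frame.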
+ | The key to getting good performance is to always calculate as much as you can; | ||
+ | don't stop to wait! If you have a double-buffered screen, you don't have to | ||
+ | care about the raster beam position when you scroll the screen-RAM (except if | ||
+ | you use character-sprites like Turrican etc.). Of course, you *do* have to | ||
+ | care about it when you scroll the color-RAM :) | ||
+ | |||
+ | However, the bottleneck is still the full movement/ | ||
+ | the interpolation on each second frame we win a little time but still, if | ||
+ | the movement/ | ||
+ | on time and the whole action slows down. | ||
+ | |||
+ | ====== 4. Advanced interpolation & re-entrant IRQs ====== | ||
+ | |||
+ | This is a bit like multitasking. The concept of this was quite complex & | ||
+ | disgusting to me at first but then I realized it can be quite a nice way | ||
+ | to do things. Now the movement/ | ||
+ | CPU has free time and frame updates will still come in time. | ||
+ | |||
+ | We'll have 3 areas of code: | ||
+ | |||
+ | * Movement/ | ||
+ | |||
+ | * Code for handling the next frame update and interpolating sprite movement. Scrolling goes here also. This will be executed from IRQs, however, whenever needed, it will be interrupted by | ||
+ | |||
+ | * The actual low-level raster IRQ code. Screen splits, setting multiplexed sprites onscreen and playing music/ | ||
+ | |||
+ | The idea is that once the movement/ | ||
+ | that a frame counter maintained by the frame update IRQ has reached its end | ||
+ | value (2 or 4). Now it will reset this counter, and the frame update IRQ will | ||
+ | interpolate the next 2 or 4 frames. After this, it stops. The catch is that | ||
+ | if the movement/ | ||
+ | from the other areas, a visible " | ||
+ | |||
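The handshake between the main program and the frame update code might look roughly like this. It is a sketch under assumptions: `frame_count`, `FRAMES_PER_STEP`, `move_logic` and `frameupdate_step` are hypothetical names, and the swapping of old/new sprite data between the two sides is left out.

```asm
FRAMES_PER_STEP = 2             ;2 or 4, as described above
frame_count     = $04           ;hypothetical: maintained by the frame update code

mainloop:
        jsr move_logic          ;full movement step, may be interrupted many times
waitframes:
        lda frame_count
        cmp #FRAMES_PER_STEP
        bcc waitframes          ;in-between frames have not all been shown yet
        lda #$00
        sta frame_count         ;lets the frame update code start a new cycle
        jmp mainloop

frameupdate_step:               ;called from the frame update IRQ code
        lda frame_count
        cmp #FRAMES_PER_STEP
        bcs fu_done             ;cycle finished: stop until the counter is reset
        ;... interpolate and prepare one frame here ...
        inc frame_count
fu_done:
        rts

move_logic:
        rts                     ;placeholder for the movement code
```

If `move_logic` takes longer than `FRAMES_PER_STEP` frames, the frame update side simply has nothing new to show until the counter is reset.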
+ | Another thing to consider (especially on NTSC machines) is that if a big | ||
+ | portion of the screen is scrolled, with color-RAM, and with lots of sprites to | ||
+ | be interpolated & sorted, the frame update IRQ might also run too slow. In that | ||
+ | case it should just wait for the next frame. | ||
+ | |||
+ | Now the big question is: how do we invoke the frame update IRQ while also | ||
+ | letting the low-level IRQs run? I don't want to use multiple interrupt sources, | ||
+ | so I did the following: | ||
+ | |||
+ | For starters, all IRQs must save CPU registers on the stack. Using fixed | ||
+ | zeropage addresses instead would corrupt the regs in case of nested IRQs | ||
+ | (that will occur with this approach!) | ||
+ | |||
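A stack-based entry and exit sequence could look like the sketch below. It assumes the handler is reached directly through the hardware vector at $fffe with the KERNAL ROM switched out; when going through the KERNAL vector at $0314 instead, the ROM has already pushed A, X and Y in this same order. The label `irq` is hypothetical.

```asm
irq:    pha                     ;save A on the stack
        txa
        pha                     ;save X
        tya
        pha                     ;save Y
        ;... raster IRQ work here ...
        pla                     ;restore in reverse order
        tay
        pla
        tax
        pla
        rti
```

Because each nested interrupt gets its own three bytes of stack, an inner IRQ cannot overwrite the registers saved by the one it interrupted, which is what would happen with something like `sta $02` / `stx $03` / `sty $04`.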
+ | I usually have one raster interrupt at the top of the screen, to set up the | ||
+ | gamescreen display and to start firing up the sprite multiplexing interrupts, | ||
+ | and then another at the bottom of the screen for scorepanel display & playing | ||
+ | music. | ||
+ | |||
+ | Now, whenever either the top or bottom interrupt has completed processing, | ||
+ | we do, before exiting it: | ||
+ | < | ||
+ | dec $d019 ; | ||
+ | cli ; | ||
+ | jmp frameupdate | ||
+ | </ | ||
+ | The frame update IRQ code itself must not be re-entered, so the first thing | ||
+ | it has to do is to maintain an execution counter like: | ||
+ | < | ||
+ | frameupdate: | ||
+ | inc exec_count | ||
+ | lda exec_count | ||
+ | cmp #$02 ;code | ||
+ | bcs skip | ||
+ | <actual frame update code here> | ||
+ | skip: dec exec_count | ||
+ | pla ;Exit the interrupt | ||
+ | tay | ||
+ | pla | ||
+ | tax | ||
+ | pla | ||
+ | rti | ||
+ | </ | ||
+ | We see that the frame update code can be invoked from either the top or the | ||
+ | bottom interrupt. So it must maintain some kind of internal state about what | ||
+ | it is going to do next. And whenever it can't do anything useful for the | ||
+ | time being, it exits. For example a color-RAM update is only sensible to do | ||
+ | after the bottom interrupt (must not be visible!) This complicates things | ||
+ | a bit... | ||
+ | |||
+ | A problem with this " | ||
+ | must maintain their own set of sprite information, | ||
+ | on each other' | ||
+ | You might already be familiar with this if you've done doublebuffered sprite- | ||
+ | multiplexing; | ||
+ | |||
+ | Doing movement/ | ||
+ | The actual motion happens in big steps. I do this in MW4's second preview that | ||
+ | runs at 50Hz, unlike the first preview. I was personally worried that control | ||
+ | might feel too lagged with this low update rate, so I got myself some guinea | ||
+ | pigs (thanks CreaMD and Pixman); after they confirmed that it didn't | ||
+ | feel too lagged, I carried on with my plans. | ||
+ | |||
+ | ====== 5. Conclusion ====== | ||
+ | |||
+ | I hope these ideas were interesting. Any example code would be as big as an | ||
+ | entire game engine itself so I'll leave that out :), but you can study | ||
+ | the source code of the second MW4 preview to see how it was implemented | ||
+ | in practice. | ||
+ | |||
+ | Note that the full MW4 game uses only the simpler interpolation method described | ||
+ | in chapter 3 of this rant, as it didn't need anything more, and the memory | ||
+ | use of the chapter 4-style interpolation would have been too heavy. Also, | ||
+ | 4-frame interpolation does feel quite lagged; just compare the preview | ||
+ | to the full game :) | ||
+ | |||
+ | Lasse Öörni - loorni@student.oulu.fi |
base/rant9.txt · Last modified: 2015-04-17 04:33 by 127.0.0.1