====== D.Y.S.P. using a cycle table ======

==== Introduction ====

This article describes a way to do a D.Y.S.P. (different Y sprite positions) with eight sprites, using flexible cycle-wasting in the loop. This method differs from the $d017 approach in that we can use the full 21 lines of a sprite. Unfortunately, this makes our loop a little more complicated, because the number of cycles available on a given raster line now depends on the number of sprites on that line.

=== Getting and assembling the code ===

The code is hosted at [[https://github.com/Compyx/dysp-pretimed|GitHub]]. It requires [[https://sourceforge.net/projects/tass64/|64tass]]. Once you've cloned the repo, a simple ''make'' assembles the code into a runnable .prg file.

{{:base:dysp-cycle-timed.png|}}

==== Opening the border with flexible timing ====

The raster routine itself is fairly simple. We manipulate $d011 so we don't get any bad lines, which in turn allows us to display all eight sprites in the side border:

<code>
dysp
        ldy #8
        ldx #0
-       lda d021_table,x
        dec $d016
        sta $d021
        sty $d016
        lda d011_table,x
        sta $d011
        ; this is the interesting bit: for each raster line
        ; we alter the branch so it wastes the correct number
        ; of cycles for the next iteration of the loop
        lda timing,x
        sta _delay + 1
_delay  bpl * + 2
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        bit $ea
        inx
        cpx #DYSP_HEIGHT
        bne -
        rts
</code>

By altering the branch instruction so it skips a variable number of bytes of code, we control how many cycles the loop wastes on each raster line. Generally speaking, the more sprites there are on a line, the more cycles the VIC steals. That means we have to branch further to skip more cycles. The "cpx #$e0 ... bit $ea" sequence wastes cycles with single-cycle accuracy. I use the same trick to time VSPs (with a few more "cpx #$e0" instructions).
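The one-cycle-per-byte behaviour can be checked over the whole chain used in the loop with a small Python model. It is not C64 code. It decodes the bytes of the delay chain starting at a given branch offset and adds up the cycles. It only knows the three opcodes that can occur in the chain, and it deliberately leaves out the taken branch itself. That branch costs 3 cycles, or 4 if it crosses a page, which is presumably why the listing page-aligns the routine. The names `DELAY_CHAIN`, `OPCODES` and `cycles_after_branch` are made up for this sketch.

```python
# Byte image of the delay chain that follows "bpl * + 2" in the DYSP loop:
# eight "cpx #$e0" (E0 E0) followed by "bit $ea" (24 EA).
DELAY_CHAIN = bytes([0xE0] * 16 + [0x24, 0xEA])

# (length in bytes, cycles) for the only opcodes that can be reached:
# $E0 = CPX immediate, $24 = BIT zero page, $EA = NOP.
OPCODES = {
    0xE0: (2, 2),
    0x24: (2, 3),
    0xEA: (1, 2),
}


def cycles_after_branch(skip: int) -> int:
    """Cycles spent in the chain when the branch skips `skip` bytes.

    The taken branch itself (3 cycles, 4 on a page crossing) is not counted.
    """
    if not 0 <= skip < len(DELAY_CHAIN):
        raise ValueError("skip must be between 0 and 17")
    pc = skip
    total = 0
    while pc < len(DELAY_CHAIN):
        length, cycles = OPCODES[DELAY_CHAIN[pc]]
        total += cycles
        pc += length
    return total


if __name__ == "__main__":
    for skip in range(len(DELAY_CHAIN)):
        print(f"skip {skip:2d} bytes -> {cycles_after_branch(skip):2d} cycles")
```

An even offset keeps the CPX instructions aligned and ends on BIT $EA. An odd offset shifts execution into the operands and ends with CPX #$24 followed by NOP. Either way, each extra byte skipped removes exactly one cycle: 19 cycles at offset 0, down to 2 cycles at offset 17.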
Here's how it works (simplified):

<code>
; here we waste 2 + 2 + 3 = 7 cycles:
.1000   bpl $1002
.1002   cpx #$e0        ; 2
.1004   cpx #$e0        ; 2
.1006   bit $ea         ; 3

; adjusting the branch, we can waste 6 cycles:
.1000   bpl $1003       ; we branch into the argument of CPX #$e0 at $1003
.1002   cpx #$e0        ; which means we execute CPX #$e0 at $1003,
.1004   cpx #$e0        ; then CPX #$24 at $1005
.1006   bit $ea         ; and finally NOP at $1007

; so, for the above code, the CPU executes this:
.1000   bpl $1003       ; .1002 gets skipped
.1003   cpx #$e0        ; 2
.1005   cpx #$24        ; 2
.1007   nop             ; 2
</code>

So each additional byte we branch over removes exactly one cycle. Depending on whether the offset is even or odd, execution ends at BIT $EA (3 cycles) or at NOP (2 cycles). With the eight CPX instructions in the real loop, the branch can skip anywhere between 0 cycles (no sprites) and 17 cycles (all eight sprites) of the delay code.

==== Calculating how many cycles we use ====

=== Cycle table ===

To determine how many cycles we need to waste, we need to know how many cycles each combination of sprites uses. That's where the 'cycles' table comes in. For each 'sprite enable' ($d015) value, it gives the number of cycles to skip in the delay code. I determined those values by brute force with a little tool: for each combination I adjusted the cycle delay until I got the proper value. The tool is also on [[https://github.com/Compyx/dysp-timer|GitHub]], but be warned, the code is a little messy.

=== Sprite enable table ===

To determine which values to pick from the 'cycles' table and store in the 'timing' table, we need yet another table, which I call the 'sprite enable' table. This table is cleared each frame and then populated sprite by sprite. Starting at each sprite's Y-position, we ORA 21 entries of the table with that sprite's bit value: $01 for sprite 0, $02 for sprite 1, up to $80 for sprite 7.
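The clear-then-ORA procedure can be modelled in a few lines of Python, together with the 'cycles' lookup that turns the result into 'timing' values. This is a sketch for checking the tables on a PC, not the 6502 code. `build_sprite_enable` and `build_timing` are hypothetical names. The bounds check has no counterpart in the listing. The two-entry offset that the listing applies when storing into 'timing' is left out.

```python
SPRITE_HEIGHT = 21  # raster lines covered by an unexpanded sprite


def build_sprite_enable(y_offsets, height):
    """Per-line 'sprite enable' bitmask for sprites 0..len(y_offsets)-1.

    y_offsets[n] is the first DYSP line of sprite n (0 = top of the DYSP).
    """
    table = [0] * height                      # cleared every frame
    for sprite, y in enumerate(y_offsets):
        bit = 1 << sprite                     # $01 for sprite 0 ... $80 for sprite 7
        for row in range(SPRITE_HEIGHT):
            if y + row < height:              # bounds check added for the sketch
                table[y + row] |= bit         # the ORA step
    return table


def build_timing(sprite_enable, cycles):
    """Translate each bitmask into a branch skip value via the 'cycles' table."""
    return [cycles[mask] for mask in sprite_enable]


if __name__ == "__main__":
    # Sprites 0, 1 and 2 at Y-offsets 0, 3 and 5.
    enable = build_sprite_enable([0, 3, 5], 128)
    print([f"{v:02x}" for v in enable[:7]])  # ['01', '01', '01', '03', '03', '07', '07']
    # First eight entries of the 'cycles' table from the listing.
    print(build_timing(enable[:7], [0, 3, 5, 5, 5, 7, 7, 7]))  # [3, 3, 3, 5, 5, 7, 7]
```

Because each sprite only ever sets its own bit, the order in which sprites are processed doesn't matter. That property lets the 6502 version handle the sprites one after another in unrolled blocks.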
An illustration might help. Suppose we have sprite 0 at Y-offset 0, sprite 1 at Y-offset 3 and sprite 2 at Y-offset 5:

^ Y-offset ^ ORA spr0 ^ ORA spr1 ^ ORA spr2 ^ Sprite enable result ^
| 00       | 01       |          |          | 01                   |
| 01       | 01       |          |          | 01                   |
| 02       | 01       |          |          | 01                   |
| 03       | 01       | 02       |          | 03                   |
| 04       | 01       | 02       |          | 03                   |
| 05       | 01       | 02       | 04       | 07                   |
| 06       | 01       | 02       | 04       | 07                   |

=== Loop timing table ===

We can then use the 'sprite enable' values as indices into the 'cycles' table to get the proper values for the 'timing' table used in the border loop:

<code>
        ldx #0
-       ldy sprite_enable,x
        lda cycles,y
        sta timing,x
        inx
        cpx #DYSP_HEIGHT
        bne -
        rts
</code>

==== Done ====

And there you have it: a D.Y.S.P. with flexible Y-positions. Naturally, all this calculating eats cycles, which is why my code uses a lot of unrolled loops.

===== The Code =====

Finally, the code. It also contains a little user interface that lets you change the DYSP's movements with a joystick in port 2.

<code>
; vim: set et ts=8 sw=8 sts=8 syntax=64tass :

; D.Y.S.P. using pre-calculated cycle table
;
; 2016-04-01

music_sid = "Blitter.sid"
music_init = $1000
music_play = $1003

DYSP_HEIGHT = 128

JOY_UP = $01
JOY_DOWN = $02
JOY_LEFT = $04
JOY_RIGHT = $08
JOY_FIRE = $10

zp = $10

; BASIC SYS line
        * = $0801

        .word (+), 2016
        .null $9e, ^start
+       .word 0

; Initialization code
;
; Set up sprites, initialize VIC, set up IRQ handlers
start
        jsr $fda3
        jsr $fd15
        jsr $ff5b
        sei

        ; set sprite colors
        clc
        ldx #0
-       txa
        adc #1
        sta $d027,x
        inx
        cpx #8
        bne -

        ; create an example sprite in the tape buffer
        ldx #$3f
        lda #$ff
-       sta $0340,x
        dex
        bpl -

        lda #($340 / 64)
        ldx #7
-       sta $07f8,x
        dex
        bpl -

        ; set up interface text
        ldx #0
-       lda iface_text,x
        sta $0400,x
        lda #$0b
        sta $d800,x
        inx
        bne -
-       lda iface_text + 256,x
        sta $0500,x
        lda #$0b
        sta $d900,x
        inx
        cpx #$40
        bne -

        lda #0
        jsr music_init

        lda #$35
        sta $01
        lda #$7f
        sta $dc0d
        sta $dd0d
        ldx #0
        stx $dc0e
        stx $dd0e
        stx $3fff
        lda #$01
        sta $d01a
        lda #$1b
        sta $d011
        lda #$29
        ldx #<irq1
        ldy #>irq1
        sta $d012
        stx $fffe
        sty $ffff
        ldx #<break
        ldy #>break
        stx $fffa
        sty $fffb
        stx $fffc
        sty $fffd
        bit $dc0d
        bit $dd0d
        inc $d019
        cli
        jmp *

;----------------------------------------------------------------------------;
; IRQ handlers: $d020 changes are used to show the raster time used by the   ;
; various routines.                                                          ;
;----------------------------------------------------------------------------;

        ; keep timing-critical loops from crossing page boundaries
        .align 256

irq1
        ; 'Double IRQ' method to stabilize raster
        pha
        txa
        pha
        tya
        pha
        lda #$2a
        ldx #<irq2
        ldy #>irq2
        sta $d012
        stx $fffe
        sty $ffff
        lda #1
        inc $d019
        tsx
        cli
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
irq2
        txs
        ldx #8
-       dex
        bne -
        bit $ea
        lda $d012
        cmp $d012
        beq +
+       ldx #$10
-       lda sprite_positions,x
        sta $d000,x
        dex
        bpl -
        lda #$ff
        sta $d015
        ldx #$14
-       dex
        bne -
        nop
        jsr dysp        ; the actual DYSP loop
        lda #0
        sta $d021
        sta $d015
        dec $d020
        ; respond to user input
        jsr joystick2
        jsr update_iface
        dec $d020
        jsr param_highlight
        dec $d020
        jsr music_play
        lda #0
        sta $d020

        lda #$f9
        ldx #<irq3
        ldy #>irq3
        sta $d012
        stx $fffe
        sty $ffff
        lda #1
        sta $d019
        pla
        tay
        pla
        tax
        pla
break   rti

irq3
        pha
        txa
        pha
        tya
        pha

        ldx #7
-       dex
        bne -
        stx $d011
        ldx #30
-       dex
        bne -
        lda #$1b
        sta $d011

        dec $d020
        ; calculate X and Y movement of the DYSP
        jsr dysp_x_sinus
        jsr dysp_y_sinus
        dec $d020
        ; clear the 'sprite enable' table
        jsr dysp_clear_timing_fast
        dec $d020
        ; calculate the 'sprite enable' values for each raster line of the DYSP
        jsr dysp_calc_timing_fast
        lda #0
        sta $d020

        lda #$29
        ldx #<irq1
        ldy #>irq1
        sta $d012
        stx $fffe
        sty $ffff
        lda #1
        sta $d019
        pla
        tay
        pla
        tax
        pla
        rti

; sprite positions: values for $d000-$d010
sprite_positions
        .byte $00, $a0
        .byte $18, $a0
        .byte $30, $a0
        .byte $48, $a0
        .byte $60, $a0
        .byte $78, $a0
        .byte $90, $a0
        .byte $a8, $a0
        .byte $00

;----------------------------------------------------------------------------;
; User interface/DYSP control code                                           ;
;----------------------------------------------------------------------------;

; DYSP sinus control parameters
dysp_x_idx1     .byte 0
dysp_x_idx2     .byte 0
dysp_y_idx1     .byte 0
dysp_y_idx2     .byte 0

params          ; offset 0 of the parameters, used by the joystick routine
dysp_x_adc1     .byte 12
dysp_x_adc2     .byte 9
dysp_x_spd1     .byte 2
dysp_x_spd2     .byte 3
dysp_y_adc1     .byte $0f
dysp_y_adc2     .byte $f6
dysp_y_spd1     .byte $fe
dysp_y_spd2     .byte $3

; colorram locations of the parameter values
param_colram    .word $d807, $d811, $d81c, $d826
                .word $d82f, $d839, $d844, $d84e

; index in the parameter list for the joystick routine
param_index     .byte 0

; Text for the user interface (8 lines of 40 characters)
iface_text
        .enc screen
        ;      0123456789abcdef0123456789abcdef01234567
        .text "xadc1: 00 xadc2: 00  xspd1: 00 xspd2: 00"
        .text "yadc1: 00 yadc2: 00  yspd1: 00 yspd2: 00"
        .text "                                        "
        .text "joystick in port 2:                     "
        .text "                                        "
        .text "  left/right  - select parameter        "
        .text "  up/down     - adjust parameter        "
        .text "  fire button - set parameter to 0      "
iface_text_end

; Translate A into hexadecimal digits in A (bit 7-4) and X (bit 3-0)
hex_digits
        pha
        and #$0f
        cmp #$0a
        bcs +
        adc #$3a
+       sbc #$09
        tax
        pla
        lsr
        lsr
        lsr
        lsr
        cmp #$0a
        bcs +
        adc #$3a
+       sbc #$09
        rts

; Update the interface's parameter display
update_iface
        lda dysp_x_adc1
        jsr hex_digits
        sta $0407
        stx $0408
        lda dysp_x_adc2
        jsr hex_digits
        sta $0411
        stx $0412
        lda dysp_x_spd1
        jsr hex_digits
        sta $041c
        stx $041d
        lda dysp_x_spd2
        jsr hex_digits
        sta $0426
        stx $0427
        lda dysp_y_adc1
        jsr hex_digits
        sta $042f
        stx $0430
        lda dysp_y_adc2
        jsr hex_digits
        sta $0439
        stx $043a
        lda dysp_y_spd1
        jsr hex_digits
        sta $0444
        stx $0445
        lda dysp_y_spd2
        jsr hex_digits
        sta $044e
        stx $044f
        rts

; highlight the currently adjustable parameter
param_highlight
        ; clear param highlighting
        ldx #0
-       lda param_colram,x
        sta zp
        lda param_colram + 1,x
        sta zp + 1
        ldy #0
        lda #$0f
        sta (zp),y
        iny
        sta (zp),y
        inx
        inx
        cpx #16
        bne -

        ; highlight current param
        lda param_index
        asl
        tax
        lda param_colram,x
        sta zp
        lda param_colram + 1,x
        sta zp + 1
        ldy #0
        lda #$01
        sta (zp),y
        iny
        sta (zp),y
        rts

; Check user input from joystick #2
joystick2
        lda #8
        beq +
        dec joystick2 + 1
        rts
+       lda $dc00
        sta zp
        and #%00011111
        eor #%00011111
        bne +
        rts
+       lda #8
        sta joystick2 + 1
        lda zp
        and #JOY_UP
        beq joy2_up
        lda zp
        and #JOY_DOWN
        beq joy2_down
        lda zp
        and #JOY_LEFT
        beq joy2_left
        lda zp
        and #JOY_RIGHT
        beq joy2_right
        lda zp
        and #JOY_FIRE
        beq joy2_fire
        rts

joy2_up
        ldx param_index
        inc params,x
        rts
joy2_down
        ldx param_index
        dec params,x
        rts
joy2_left
        lda param_index
        sec
        sbc #1
        and #7
        sta param_index
        rts
joy2_right
        lda param_index
        clc
        adc #1
        and #7
        sta param_index
        rts
joy2_fire
        ldx param_index
        lda #0
        sta params,x
        rts

; Calculate DYSP Y-movement
dysp_y_sinus
        ldx dysp_y_idx1
        ldy dysp_y_idx2
        ; unroll loop for speed:
        .for index = 0, index < 8, index = index + 1
        lda ysinus,x
        clc
        adc ysinus,y
        adc #$32
        sta sprite_positions + 1 + (index * 2)
        .if index < 7           ; only needed 7 times
        txa
        clc
        adc dysp_y_adc1
        tax
        tya
        clc
        adc dysp_y_adc2
        tay
        .endif
        .next
        lda dysp_y_idx1
        clc
        adc dysp_y_spd1
        sta dysp_y_idx1
        lda dysp_y_idx2
        clc
        adc dysp_y_spd2
        sta dysp_y_idx2
        rts

; temp storage for $d010 calculations
xmsb_tmp        .byte 0

; Calculate DYSP X-movement using two sinus tables added together
dysp_x_sinus
        lda #0
        sta xmsb_tmp
        ldx dysp_x_idx1
        ldy dysp_x_idx2
        ; once again unroll loop for speed
        .for index = 0, index < 8, index = index + 1
        lda xsinus_256,x
        clc
        adc xsinus_96,y
        sta sprite_positions + (index * 2)
        bcc +
        lda xmsb_tmp
        ora #(1 << index)
        sta xmsb_tmp
+
        .if index < 7           ; this section is only needed 7 times
        txa
        clc
        adc dysp_x_adc1
        tax
        tya
        clc
        adc dysp_x_adc2
        tay
        .endif
        .next
        ; store $d010 value in the IRQ handler
        lda xmsb_tmp
        sta sprite_positions + 16
        lda dysp_x_idx1
        clc
        adc dysp_x_spd1
        sta dysp_x_idx1
        lda dysp_x_idx2
        clc
        adc dysp_x_spd2
        sta dysp_x_idx2
        rts

        .align 256      ; avoid page boundary crossing in raster bars

; The actual DYSP routine:
;
; The access to the 'timing' table is what makes this possible. It contains,
; for each raster line, the number of cycles the sprites use. By storing that
; value in the BPL argument we can skip between 0 and 17 cycles inclusive.
;
; Unrolling this loop and altering the code which calculates the cycle waste
; values (storing them directly in the unrolled code, not in a table), we can
; easily add three raster splits.
dysp
        ldy #8
        ldx #0
-       lda d021_table,x
        dec $d016
        sta $d021
        sty $d016
        lda d011_table,x
        sta $d011
        lda timing,x
        sta _delay + 1
_delay  bpl * + 2
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        cpx #$e0
        bit $ea
        inx
        cpx #DYSP_HEIGHT
        bne -
        rts

        .cerror * > $0fff, "code section too large!"

        * = $2000
ysinus
        .byte ((DYSP_HEIGHT - 24) / 4) + 0.5 + ((DYSP_HEIGHT - 24) / 4) * sin(range(256) * rad(360.0/256))
xsinus_256
        .byte 127.5 + 128 * sin(range(256) * rad(360.0/256))
xsinus_96
        .byte 47.5 + 48 * sin(range(256) * rad(360.0/256))

        .align 256
; The 'sprite enable' table: this is where the mask of active sprites for each
; raster line is stored. The values in this table are used as an index into the
; 'cycles' table to get the proper number of cycles to skip in the DYSP loop
dysp_sprite_enable
        .fill DYSP_HEIGHT, 0

        .align 256
d011_table
        .for row = 0, row < DYSP_HEIGHT, row = row + 1
        .byte $18 + ((row + 3) & 7)
        .next

        .align 256
; Raster bar colors
d021_table
        .byte $06, $00, $06, $04, $00, $06, $04, $0e
        .byte $00, $06, $04, $0e, $0f, $00, $06, $04
        .byte $0e, $0f, $07, $00, $06, $04, $0e, $0f
        .byte $07, $01, $07, $0f, $0e, $04, $06, $00
        .byte $07, $0f, $0e, $04, $06, $00, $0f, $0e
        .byte $04, $06, $00, $0e, $04, $06, $00, $04
        .byte $06, $00, $06, $00, $09, $08, $0a, $0f
        .byte $07, $01, $07, $0f, $0a, $08, $09, $00
        .byte $06, $00, $06, $04, $00, $06, $04, $0e
        .byte $00, $06, $04, $0e, $0f, $00, $06, $04
        .byte $0e, $0f, $07, $00, $06, $04, $0e, $0f
        .byte $07, $01, $07, $0f, $0e, $04, $06, $00
        .byte $07, $0f, $0e, $04, $06, $00, $0f, $0e
        .byte $04, $06, $00, $0e, $04, $06, $00, $04
        .byte $06, $00, $06, $00, $09, $08, $0a, $0f
        .byte $07, $01, $07, $0f, $0a, $08, $09, $00

        .align 256
; cycle delay table
timing
        .fill 2, 0      ; don't touch this, raster code starts early
        .fill DYSP_HEIGHT - 2, 0

        .align 256
; number of cycles to skip in the branch
cycles
        ; skip cycles   $d015               sprite(s) active
        ; $00-$07
        .byte 0         ; $00 - %00000000   no sprites
        .byte 3         ; $01 - %00000001   0
        .byte 5         ; $02 - %00000010   1
        .byte 5         ; $03 - %00000011   1 0
        ; $04-$07
        .byte 5         ; $04 - %00000100   2
        .byte 7         ; $05 - %00000101   2 0
        .byte 7         ; $06 - %00000110   2 1
        .byte 7         ; $07 - %00000111   2 1 0
        ; $08-$0b
        .byte 5         ; $08 - %00001000   3
        .byte 8         ; $09 - %00001001   3 0
        .byte 9         ; $0a - %00001010   3 1
        .byte 9         ; $0b - %00001011   3 1 0
        ; $0c-$0f
        .byte 7         ; $0c - %00001100   3 2
        .byte 9         ; $0d - %00001101   3 2 0
        .byte 9         ; $0e - %00001110   3 2 1
        .byte 9         ; $0f - %00001111   3 2 1 0
        ; $10-$13
        .byte 5         ; $10 - %00010000   4
        .byte 7         ; $11 - %00010001   4 0
        .byte 10        ; $12 - %00010010   4 1
        .byte 10        ; $13 - %00010011   4 1 0
        ; $14-$17
        .byte 9         ; $14 - %00010100   4 2
        .byte 11        ; $15 - %00010101   4 2 0
        .byte 11        ; $16 - %00010110   4 2 1
        .byte 11        ; $17 - %00010111   4 2 1 0
        ; $18-$1b
        .byte 7         ; $18 - %00011000   4 3
        .byte 10        ; $19 - %00011001   4 3 0
        .byte 11        ; $1a - %00011010   4 3 1
        .byte 11        ; $1b - %00011011   4 3 1 0
        ; $1c-$1f
        .byte 9         ; $1c - %00011100   4 3 2
        .byte 11        ; $1d - %00011101   4 3 2 0
        .byte 11        ; $1e - %00011110   4 3 2 1
        .byte 11        ; $1f - %00011111   4 3 2 1 0
        ; $20-$2f
        .byte $05, $08, $09, $09
        .byte $09, $0c, $0c, $0c
        .byte $09, $0c, $0d, $0d
        .byte $0b, $0d, $0d, $0d
        ; $30-$3f
        .byte $07, $09, $0c, $0c
        .byte $0b, $0d, $0d, $0d
        .byte $09, $0c, $0d, $0d
        .byte $0b, $0d, $0d, $0d
        ; $40-$4f
        .byte $05, $07, $0a, $0a
        .byte $0a, $0b, $0b, $0b
        .byte $0a, $0d, $0e, $0e
        .byte $0b, $0e, $0e, $0e
        ; $50-$5f
        .byte $09, $0b, $0e, $0e
        .byte $0d, $0f, $0f, $0f
        .byte $0b, $0e, $0f, $0f
        .byte $0d, $0f, $0f, $0f
        ; $60-$6f
        .byte $07, $0a, $0b, $0b
        .byte $0b, $0e, $0e, $0e
        .byte $0b, $0e, $0f, $0f
        .byte $0d, $0f, $0f, $0f
        ; $70-$7f
        .byte $09, $0b, $0e, $0e
        .byte $0d, $0f, $0f, $0f
        .byte $0b, $0e, $0f, $0f
        .byte $0d, $0f, $0f, $0f
        ; $80-$8f
        .byte $05, $08, $09, $09
        .byte $09, $0c, $0c, $0c
        .byte $09, $0d, $0d, $0d
        .byte $0c, $0d, $0d, $0d
        ; $90-$9f
        .byte $09, $0c, $0f, $0f
        .byte $0d, $10, $10, $10
        .byte $0c, $0f, $10, $10
        .byte $0d, $10, $10, $10
        ; $a0-$af
        .byte $09, $0c, $0d, $0d
        .byte $0d, $10, $10, $10
        .byte $0d, $10, $11, $11
        .byte $0f, $11, $11, $11
        ; $b0-$bf
        .byte $0b, $0d, $10, $10
        .byte $0f, $11, $11, $11
        .byte $0d, $10, $11, $11
        .byte $0f, $11, $11, $11
        ; $c0-$cf
        .byte $07, $09, $0c, $0c
        .byte $0c, $0d, $0d, $0d
        .byte $0c, $0f, $10, $10
        .byte $0d, $10, $10, $10
        ; $d0-$df
        .byte $0b, $0d, $10, $10
        .byte $0f, $11, $11, $11
        .byte $0d, $10, $11, $11
        .byte $0f, $11, $11, $11
        ; $e0-$ef
        .byte $09, $0c, $0d, $0d
        .byte $0d, $10, $10, $10
        .byte $0d, $10, $11, $11
        .byte $0f, $11, $11, $11
        ; $f0-$ff
        .byte $0b, $0d, $10, $10
        .byte $0f, $11, $11, $11
        .byte $0d, $10, $11, $11
        .byte $0f, $11, $11, $11

; Clear the 'sprite enable' table, unrolled for speed
dysp_clear_timing_fast
        lda #0
        .for row = 0, row < DYSP_HEIGHT, row = row + 1
        sta dysp_sprite_enable + row
        .next
        rts

; Calculate the 'sprite enable' values for each raster line of the DYSP
;
; For each sprite, we ORA 21 bytes of the table with the bitmask for that
; particular sprite. The result of these calculations is used to look up the
; number of cycles to waste in the DYSP raster code
;
; Again unrolled for speed, but still takes a lot of raster time
dysp_calc_timing_fast
        lda sprite_positions + 1
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #1
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 3
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #2
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 5
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #4
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 7
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #8
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 9
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #16
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 11
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #32
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 13
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #64
        sta dysp_sprite_enable + row,x
        .next

        lda sprite_positions + 15
        sec
        sbc #$32
        tax
        .for row = 0, row < 21, row = row + 1
        lda dysp_sprite_enable + row,x
        ora #128
        sta dysp_sprite_enable + row,x
        .next

        ; update actual cycle skip table
        .for row = 0, row < DYSP_HEIGHT, row = row + 1
        ldy dysp_sprite_enable + row
        lda cycles,y
        sta timing + 2 + row
        .next
        rts

; Link music
        * = $1000
        .binary music_sid, $7e
</code>
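The `hex_digits` routine in the listing converts a nybble to a screen code without a lookup table. It relies on the carry flag left by `cmp #$0a`. For 0 to 9 the carry is clear, so `adc #$3a` followed by `sbc #$09` gives $30 to $39. The SBC subtracts one extra because the carry is still clear at that point. For $a to $f the carry is set, so the ADC is skipped and `sbc #$09` gives $01 to $06, the screen codes for A to F. The Python model below emulates only the two arithmetic instructions involved. It assumes the decimal flag is clear, and the helper names (`adc`, `sbc`, `nybble_to_screen_code`) are made up for this sketch.

```python
def adc(a, value, carry):
    """6502 ADC in binary mode: returns (result, carry_out)."""
    total = a + value + carry
    return total & 0xFF, int(total > 0xFF)


def sbc(a, value, carry):
    """6502 SBC in binary mode: a clear carry means 'borrow one more'."""
    total = a - value - (1 - carry)
    return total & 0xFF, int(total >= 0)


def nybble_to_screen_code(n):
    carry = int(n >= 0x0A)           # cmp #$0a sets carry when n >= 10
    a = n
    if not carry:                    # bcs + is not taken for 0-9
        a, carry = adc(a, 0x3A, carry)
    a, carry = sbc(a, 0x09, carry)   # + sbc #$09
    return a


def hex_digits(value):
    """Return (A, X): screen codes for the high and the low nybble."""
    x = nybble_to_screen_code(value & 0x0F)
    a = nybble_to_screen_code(value >> 4)
    return a, x


if __name__ == "__main__":
    print(hex_digits(0x3F))  # (51, 6), i.e. screen codes $33 ('3') and $06 ('F')
```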