====== Cycle Exact Measuring of Execution Times ======
In most cases you measure how long a subroutine takes to execute by changing the border color at its start and end. This is usually sufficient to see how many raster lines it costs, but sometimes you want to know the exact number of cycles spent, or the routine in question takes more than a frame to execute, so that the color changes overlap and it becomes difficult to see where execution starts and ends. For these cases the CIA timers come in handy:

<code>

; jsr measure
; jsr evaluate

;Note: max cycle count range is limited to about 65,500 cycles (=roughly 3 frames)


overhead
irqs_allowed
dma_off
sprites_off
printout
;(or some other location to look it up via ml-mon)


;* = $1000 ;uncomment to precompile to wanted address


        jmp measure

evaluate
        lda #0
        sta $dc0f       ;stop timer B
        lda vald011
        sta $d011       ;restore screen control register
        lda vald015
        sta $d015       ;restore sprite enable register
        cld
        sec
        lda #<
        sbc $dc06       ;subtract remaining timer B count, low byte
        sta locycles
        lda #>
        sbc $dc07       ;high byte of the result stays in A

.if !printout
        ldx locycles
        jsr $bdcd       ;BASIC ROM: print A (high) / X (low) as unsigned decimal
        lda #13
        jsr $ffd2       ;KERNAL CHROUT: print carriage return
        lda statusreg
        pha
        plp             ;restore the status register saved by measure
        rts
.else

        ldy locycles
        ldx #$30-1
        stx ten1000s
        stx ten1000s+1
        stx ten1000s+2
        stx ten1000s+3
        stx ten1000s+4

        sec
hploop  sta temp        ;keep current high byte (3 bytes, skipped via hploop+3)
        inc ten1000s-$30+1,x
        tya
        sbc lo,x
        tay
        lda temp
        sbc hi,x
        bcs hploop      ;no borrow: store new high byte and count the digit up

        tya
        adc lo,x        ;borrow: undo the low byte subtraction
        tay
        inx
        cpx #$34
        sec
        bne hploop+3    ;next digit, temp still holds the valid high byte

        ldx #4
print   lda ten1000s,x
        sta printout,x
        lda $d021
        eor #8
        sta (printout//
        dex
        bpl print

        lda statusreg
        pha
        plp             ;restore the status register saved by measure
        rts

temp    .byte 0 ;
ten1000s
lo = *-$30+1
        .byte <
hi = *-$30+1
        .byte >

.fi

locycles
vald015
vald011
statusreg

measure
        php             ;push the caller's status register
        sei
        pla
        sta statusreg   ;keep it until evaluate restores it
        lda $d011
        sta vald011
        lda $d015
        sta vald015
        ldx #$00
        stx $dc0f       ;stop timer b (not really necessary, but still)
.if dma_off
        stx $d011
.fi
.if sprites_off
        stx $d015
.fi

        dex
        cpx $d012
        bne *-3         ;wait for vblank area
        stx $dc06       ;set to $ffff
        stx $dc07
        lda #$19

.if irqs_allowed
        cli
.fi

        sta $dc0f       ;start timer B: one-shot, force load, count system cycles
        rts


</code>
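
The arithmetic in ''evaluate'' can be modelled off-target to sanity-check a reading. The following Python sketch is not part of the original routine. It assumes that the constant loaded by ''lda #<'' / ''lda #>'' is the timer start value $ffff minus ''overhead'', and that the ''lo''/''hi'' tables hold the powers of ten 10000, 1000, 100, 10 and 1. The function names are hypothetical.

```python
TIMER_START = 0xFFFF  # value written to $dc06/$dc07 by measure


def elapsed_cycles(timer_lo, timer_hi, overhead=0):
    """Cycles counted since the timer was started, minus the call overhead.

    Assumption: evaluate subtracts the remaining count from
    (TIMER_START - overhead). The real value of 'overhead' is whatever
    the listing's constant is set to, so it defaults to 0 here.
    """
    remaining = (timer_hi << 8) | timer_lo
    return (TIMER_START - overhead - remaining) & 0xFFFF


# Assumed contents of the lo/hi tables in the listing.
POWERS_OF_TEN = (10000, 1000, 100, 10, 1)


def decimal_digits(value):
    """Five decimal digits by repeated subtraction, like the hploop loop."""
    digits = []
    for power in POWERS_OF_TEN:
        digit = 0
        while value >= power:  # keep subtracting until a borrow would occur
            value -= power
            digit += 1
        digits.append(chr(0x30 + digit))  # $30 is the code for '0'
    return "".join(digits)


if __name__ == "__main__":
    cycles = elapsed_cycles(0xC6, 0xCF)  # timer B reads $cfc6
    print(cycles, decimal_digits(cycles))
```

With a timer reading of $cfc6 and no overhead correction this yields 12345 cycles, which the digit loop renders as ''12345''.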

So for example, to find out how many cycles your latest uberbrilliant sprite-sorting algo takes, you could do this:

<code>
        jsr initdata
        jsr measure
        jsr sortalgo
        jsr evaluate
</code>
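
To relate the number printed by ''evaluate'' back to the raster time you would see with border colors, the cycle count can be split into frames and raster lines. This is a small host-side sketch in Python, assuming a PAL machine with 63 cycles per raster line and 312 lines per frame. Other video standards use different values, which is why they are parameters. The name ''raster_time'' is hypothetical.

```python
PAL_CYCLES_PER_LINE = 63    # PAL C64: cycles per raster line
PAL_LINES_PER_FRAME = 312   # PAL C64: raster lines per frame


def raster_time(cycles,
                cycles_per_line=PAL_CYCLES_PER_LINE,
                lines_per_frame=PAL_LINES_PER_FRAME):
    """Split a cycle count into (frames, raster lines, leftover cycles)."""
    cycles_per_frame = cycles_per_line * lines_per_frame
    frames, rest = divmod(cycles, cycles_per_frame)
    lines, leftover = divmod(rest, cycles_per_line)
    return frames, lines, leftover


if __name__ == "__main__":
    for measured in (12345, 65535):
        print(measured, raster_time(measured))
```

A reading of 12345 cycles corresponds to 195 full raster lines plus 60 cycles, and the maximum 16-bit reading of 65535 is a little over three PAL frames.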

Note that the zen-timer above can't be used for really slow routines, as it can only count up to about 65,500 cycles. For those routines you should use the extended timer below, which chains both CIA1 timers together but thereby doesn'

<code>
;LNG-TIMER 64, 6502tass version. Original idea by M. Abrash. Extended
;version for extra-slow routine evaluation. Doesn'
;& output is in hex for simplicity'

; jsr measure
; jsr evaluate


overhead
dma_off
sprites_off
printout

;* = $1000 ;uncomment to precompile to wanted address


        jmp measure

evaluate
        lda #0
        sta $dc0e       ;stop timer A
        sta $dc0f       ;stop timer B
        lda vald011
        sta $d011       ;restore screen control register
        lda vald015
        sta $d015       ;restore sprite enable register
        cld
        sec
        lda #<
        sbc $dc04       ;timer A holds the low 16 bits of the count
        sta cycles
        lda #>
        sbc $dc05
        sta cycles+1
        lda #$ff
        sbc $dc06       ;timer B holds the high 16 bits of the count
        sta cycles+2
        lda #$ff
        sbc $dc07
        sta cycles+3
        ldx #3
        ldy #0
showresult
        lda cycles,x    ;most significant byte first, high nibble
        lsr
        lsr
        lsr
        lsr
        jsr toscreen
        lda cycles,x
        and #$0f        ;low nibble
        jsr toscreen
        dex
        bpl showresult

        lda statusreg
        pha
        plp             ;restore the status register saved by measure
        rts

toscreen
        sed             ;decimal mode: the adc below maps 0-9 to $30-$39, 10-15 to $41-$46
        cmp #$0a        ;courtesy of Frank Kontros
        adc #$30
        cld
        sta printout,y
        lda $d021
        eor #$08
        sta (printout//
        iny
        rts


cycles
vald015
vald011
statusreg

measure
        php             ;push the caller's status register
        sei
        pla
        sta statusreg   ;keep it until evaluate restores it
        lda $d011
        sta vald011
        lda $d015
        sta vald015
        ldx #$00
        stx $dc0e       ;stop timers
        stx $dc0f
.if dma_off
        stx $d011
.fi
.if sprites_off
        stx $d015
.fi

        dex
        cpx $d012
        bne *-3         ;wait for vblank area
        stx $dc04       ;set timers to $ffffffff
        stx $dc05
        stx $dc06
        stx $dc07
        lda #$59
        sta $dc0f       ;start timer B: one-shot, force load, count timer A underflows
        lda #$11
        sta $dc0e       ;start timer A: continuous, force load, count system cycles
        rts
</code>
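
How the chained result comes together can be checked on a host machine. The Python sketch below is an illustration, not part of the original routine. It assumes the 32-bit constant that ''evaluate'' subtracts from is $ffffffff minus ''overhead'', and it uses an approximate PAL system clock of 985248 Hz for the time estimate. The function names are hypothetical.

```python
PAL_CLOCK_HZ = 985248  # approximate PAL C64 system clock (assumption)


def chained_cycles(ta_lo, ta_hi, tb_lo, tb_hi, overhead=0):
    """Elapsed cycles from the four timer bytes $dc04-$dc07.

    Timer A supplies the low 16 bits and timer B, which counts timer A
    underflows, supplies the high 16 bits.
    Assumption: the count is taken relative to ($ffffffff - overhead).
    """
    remaining = ta_lo | (ta_hi << 8) | (tb_lo << 16) | (tb_hi << 24)
    return (0xFFFFFFFF - overhead - remaining) & 0xFFFFFFFF


def hex_digits(value):
    """Eight hex digits, most significant nibble first, like showresult."""
    out = []
    for byte_index in (3, 2, 1, 0):
        byte = (value >> (8 * byte_index)) & 0xFF
        for nibble in (byte >> 4, byte & 0x0F):
            out.append("0123456789ABCDEF"[nibble])
    return "".join(out)


def max_measurable_seconds(clock_hz=PAL_CLOCK_HZ):
    """Longest run the 32-bit counter can time before it wraps."""
    return 2 ** 32 / clock_hz


if __name__ == "__main__":
    cycles = chained_cycles(0xC0, 0xBD, 0xF0, 0xFF)
    print(cycles, hex_digits(cycles))
    print(round(max_measurable_seconds() / 60, 1), "minutes")
```

Under these assumptions the chained timers cover a little over 72 minutes of run time, compared with roughly three frames for the single 16-bit timer.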
base/cycle_exact_measuring_of_routine_execution_times.txt · Last modified: 2015-04-17 04:31 by 127.0.0.1