base:quicksort_16-bit_elements
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
base:quicksort_16-bit_elements [2016-08-13 10:51] – litwr2 | base:quicksort_16-bit_elements [2016-08-30 22:28] – a fix for an array address >= $8000 litwr2 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Quicksort (for 16-bit Elements) ====== | ====== Quicksort (for 16-bit Elements) ====== | ||
- | by Vladimir Lidovski aka litwr, 13 Aug 2016 | + | by Vladimir Lidovski aka litwr, 13 Aug 2016 (with help of BigEd) |
- | It is well known that the best, the fastest sort routine is Quicksort. | + | It is well known that the best, the fastest sort routine is Quicksort. |
The next Pascal code was translated to 6502 assembler. | The next Pascal code was translated to 6502 assembler. | ||
Line 61: | Line 61: | ||
tya | tya | ||
adc i2hi | adc i2hi | ||
- | | + | |
sta zp1hi | sta zp1hi | ||
ror zp1lo | ror zp1lo | ||
Line 71: | Line 71: | ||
lda (zp1lo),y | lda (zp1lo),y | ||
sta x_hi | sta x_hi | ||
- | qsloop1 | + | qsloop1 |
lda (i2lo),y | lda (i2lo),y | ||
cmp x_lo | cmp x_lo | ||
Line 87: | Line 87: | ||
bne qsloop1 ;=jmp | bne qsloop1 ;=jmp | ||
- | qs_l1 ldy #0 | + | qs_l1 ldy #0 ;compare array[j] and x |
lda x_lo | lda x_lo | ||
cmp (j2lo),y | cmp (j2lo),y | ||
Line 103: | Line 103: | ||
bne qs_l1 ;=jmp | bne qs_l1 ;=jmp | ||
- | qs_l3 lda j2lo | + | qs_l3 lda j2lo ; |
cmp i2lo | cmp i2lo | ||
lda j2hi | lda j2hi | ||
Line 109: | Line 109: | ||
bcc qs_l8 | bcc qs_l8 | ||
- | qs_l6 lda (j2lo),y | + | qs_l6 lda (j2lo), |
tax | tax | ||
lda (i2lo),y | lda (i2lo),y | ||
Line 208: | Line 208: | ||
//array// is the address of the array and //size2// is its size in bytes. | //array// is the address of the array and //size2// is its size in bytes. | ||
- | The main trick is the work with the stack. | + | The main trick is the work with the stack. |
Several time measurements | Several time measurements | ||
Line 214: | Line 214: | ||
| |Insertion |21.4 |0.14 |39.75 |0.16 | | | |Insertion |21.4 |0.14 |39.75 |0.16 | | ||
| |Shell | | |Shell | ||
- | | |Quick | + | | |Quick |
^4096 Integers^Sort Type ^Random ^Ordered ^Reversed ^Zeros ^ | ^4096 Integers^Sort Type ^Random ^Ordered ^Reversed ^Zeros ^ | ||
| |Insertion |317.98 |0.6 |635.13 |0.58 | | | |Insertion |317.98 |0.6 |635.13 |0.58 | | ||
| |Shell | | |Shell | ||
- | | |Quick | + | | |Quick |
^12288 Integers^Sort Type ^Random ^Ordered ^Reversed ^Zeros ^ | ^12288 Integers^Sort Type ^Random ^Ordered ^Reversed ^Zeros ^ | ||
| |Insertion |2877.48|1.78 |5714.08|1.75 | | | |Insertion |2877.48|1.78 |5714.08|1.75 | | ||
| |Shell | | |Shell | ||
- | | |Quick | + | | |Quick |
- | //Random//, // | + | //Random//, // |
So Quicksort is more than two times faster than Shell Sort. Its code occupies together with its call wrapping 259 bytes. | So Quicksort is more than two times faster than Shell Sort. Its code occupies together with its call wrapping 259 bytes. | ||
+ | |||
+ | This version of Quicksort requires almost all stack (about 240 bytes) to sort fast more than 32 KB of data. The required minimum is about 176 bytes but the stack with the only such minimum may slow down sorting dramatically. | ||
+ | |||
+ | It is possible to reduce the stack load by tail call elimination. | ||
+ | |||
+ | < | ||
+ | i2lo = zp0lo | ||
+ | i2hi = zp0hi | ||
+ | j2lo = zp2lo | ||
+ | j2hi = zp2hi | ||
+ | x_lo = m2lo | ||
+ | x_hi = m2hi | ||
+ | ublo = m3lo | ||
+ | ubhi = m3hi | ||
+ | lblo = m4lo | ||
+ | lbhi = m4hi | ||
+ | |||
+ | quicksort0 | ||
+ | tsx | ||
+ | cpx #16 ;stack limit | ||
+ | bcs qsok | ||
+ | |||
+ | qs_csp | ||
+ | dex | ||
+ | dex | ||
+ | txs | ||
+ | |||
+ | quicksort lda #> | ||
+ | sta ubhi | ||
+ | lda #< | ||
+ | sta ublo | ||
+ | lda #>array | ||
+ | sta lbhi | ||
+ | lda #<array | ||
+ | sta lblo | ||
+ | qsok lda lblo | ||
+ | sta i2lo | ||
+ | lda lbhi | ||
+ | sta i2hi | ||
+ | ldy ubhi | ||
+ | sty j2hi | ||
+ | lda ublo | ||
+ | sta j2lo | ||
+ | clc ;this code works only for the even align | ||
+ | adc i2lo | ||
+ | and #$fc | ||
+ | sta zp1lo | ||
+ | tya | ||
+ | adc i2hi | ||
+ | ror | ||
+ | sta zp1hi | ||
+ | ror zp1lo | ||
+ | |||
+ | ldy #0 | ||
+ | lda (zp1lo),y | ||
+ | sta x_lo | ||
+ | iny | ||
+ | lda (zp1lo),y | ||
+ | sta x_hi | ||
+ | qsloop1 | ||
+ | lda (i2lo),y | ||
+ | cmp x_lo | ||
+ | iny | ||
+ | lda (i2lo),y | ||
+ | sbc x_hi | ||
+ | bcs qs_l1 | ||
+ | |||
+ | lda #2 | ||
+ | adc i2lo | ||
+ | sta i2lo | ||
+ | bcc qsloop1 | ||
+ | |||
+ | inc i2hi | ||
+ | bne qsloop1 ;=jmp | ||
+ | |||
+ | qs_l1 ldy #0 ;compare array[j] and x | ||
+ | lda x_lo | ||
+ | cmp (j2lo),y | ||
+ | iny | ||
+ | lda x_hi | ||
+ | sbc (j2lo),y | ||
+ | bcs qs_l3 | ||
+ | |||
+ | lda j2lo | ||
+ | sbc #1 | ||
+ | sta j2lo | ||
+ | bcs qs_l1 | ||
+ | |||
+ | dec j2hi | ||
+ | bne qs_l1 ;=jmp | ||
+ | |||
+ | qs_l3 lda j2lo ;compare i and j | ||
+ | cmp i2lo | ||
+ | lda j2hi | ||
+ | sbc i2hi | ||
+ | bcc qs_l8 | ||
+ | |||
+ | qs_l6 lda (j2lo), | ||
+ | tax | ||
+ | lda (i2lo),y | ||
+ | sta (j2lo),y | ||
+ | txa | ||
+ | sta (i2lo),y | ||
+ | dey | ||
+ | bpl qs_l6 | ||
+ | |||
+ | ;sec | ||
+ | lda #1 ;CY=1 | ||
+ | adc i2lo | ||
+ | sta i2lo | ||
+ | bcc *+4 | ||
+ | inc i2hi | ||
+ | sec | ||
+ | lda j2lo | ||
+ | sbc #2 | ||
+ | sta j2lo | ||
+ | bcs *+4 | ||
+ | dec j2hi | ||
+ | ;lda j2lo | ||
+ | cmp i2lo | ||
+ | lda j2hi | ||
+ | sbc i2hi | ||
+ | ;bcc *+5 | ||
+ | ;jmp qsloop1 | ||
+ | bcs qsloop1 | ||
+ | |||
+ | qs_l8 lda lblo | ||
+ | cmp j2lo | ||
+ | lda lbhi | ||
+ | sbc j2hi | ||
+ | bcs qs_l5 | ||
+ | |||
+ | lda i2hi | ||
+ | pha | ||
+ | lda i2lo | ||
+ | pha | ||
+ | lda ubhi | ||
+ | pha | ||
+ | lda ublo | ||
+ | pha | ||
+ | lda j2hi | ||
+ | sta ubhi | ||
+ | lda j2lo | ||
+ | sta ublo | ||
+ | jsr quicksort0 | ||
+ | pla | ||
+ | sta ublo | ||
+ | pla | ||
+ | sta ubhi | ||
+ | pla | ||
+ | sta i2lo | ||
+ | pla | ||
+ | sta i2hi | ||
+ | qs_l5 lda i2lo | ||
+ | cmp ublo | ||
+ | lda i2hi | ||
+ | sbc ubhi | ||
+ | bcs qs_l7 | ||
+ | |||
+ | lda i2hi | ||
+ | sta lbhi | ||
+ | lda i2lo | ||
+ | sta lblo | ||
+ | jmp qsok | ||
+ | qs_l7 rts | ||
+ | </ | ||
+ | |||
+ | The locations //m3hi//, //m3lo//, //m4hi//, //m4lo// maybe situated anywhere in RAM. The invocation should be in the next form. | ||
+ | < | ||
+ | tsx | ||
+ | stx qs_csp+1 | ||
+ | jsr quicksort | ||
+ | </ | ||
+ | |||
+ | It is required to put the proper constants after // | ||
+ | |||
+ | The other published 6502 Quicksort is at [[http:// |
base/quicksort_16-bit_elements.txt · Last modified: 2020-10-20 18:53 by litwr2