User Tools

Site Tools


base:optimal_sort_16-bit_elements

Optimal Sort for any number of 16-bit elements

by Mats Rosengren

General Discussion

The extension of the “Optimal Sort” algorithm to “an unlimited number” (i.e. more then 255 elements) of 16 bit elements is straightforward using standard 6502 technique.

Consider first the loop using the Y register to sequentially access the elements starting with the second last and ending with the first in a sequence of up to 255 elements (note that the last access here is with 1 in the Y register, i.e. the “ZPADD” should point to the byte before the first element):

L1   DEY
     BEQ L3
     LDA (ZPADD),Y
     "some code"
     BRA L1
L3   

To access a longer sequence one could use nested loops with the X register for the outer and the Y register for the inner loop as follows:

L1   TYA
     BNE L2
     TXA
     BEQ L3
     DEX
     DEC ZPADD+1
L2   DEY
     LDA (ZPADD),Y
     "some code"
     BRA L1
L3  

If initially the registers X and Y are given the values x and y and the value y is added to high part of the pointer ZPADD pointing to the first element of the sequence to be sorted the loop will be run through y + 256 * x times starting with the second last and ending with the first in a sequence of 1 + y + 256 * x elements. And when the loop is finished the pointer ZPADD will again point to the first element of the sequence while the X and the Y registers will take the value zero. Note that as here the last access is with 0 in the Y register the “ZPADD” should indeed point to the first element, not to the byte before the first element

To work with 16 bit values one has to handle the 8 less significant bits and the 8 more significant bits separately. The comparison is then made such that first the 8 most significant bits are compared. The 8 less significant bits only need to be compared if the 8 more significant bits were equal. The complete routine is then as follows:

SORT CLC
     LDA ZPADL+1        ;INITIALISE WORK4L
     ADC NVALH
     STA WORK4L
     BCS LER            ;BAD INPUT VALUES (TOO FAR TO BRANCH DIRECTLY)
     LDA ZPADH+1        ;INITIALISE WORK4H
     ADC NVALH
     STA WORK4H
LER  BCS L3B            ;BAD INPUT VALUES
L0   LDY NVALL          ;NUMBER OF DECREMENTS, LOW
     LDX NVALH          ;NUMBER OF DECREMENTS, HIGH
     LDA WORK4L
     STA ZPADL+1
     LDA (ZPADL),Y      ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
     STA WORK3L         ;SAVE LEAST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
     LDA WORK4H
     STA ZPADH+1
     LDA (ZPADH),Y      ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
     STA WORK3H         ;SAVE MOST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
     BRA L2
L1   TYA                ;START INNER LOOP
     BNE L1A
     TXA
     BEQ L3             ;EXIT INNER LOOP
     DEX
     DEC ZPADL+1
     DEC ZPADH+1
L1A  DEY
     LDA (ZPADH),Y
     CMP WORK2H
     BNE L1B
     LDA (ZPADL),Y
     CMP WORK2L
L1B  BCC L1
L2   LDA (ZPADL),Y
     STA WORK2L         ;POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
     LDA (ZPADH),Y
     STA WORK2H         ;POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
     STY WORK1L         ;INDEX OF POTENTIALLY LARGEST VALUE, LOW
     LDA ZPADL+1
     STA WORK1LH        ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
     LDA ZPADH+1
     STA WORK1HH        ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
     BRA L1             ;END INNER LOOP
L3   LDY NVALL          ;WHERE THE LARGEST VALUE SHALL BE PUT
     LDA WORK4L
     STA ZPADL+1
     LDA WORK2L         ;THE LARGEST VALUE, LEAST SIGNIFICANT PART
     STA (ZPADL),Y
     LDA WORK4H
     STA ZPADH+1
     LDA WORK2H         ;THE LARGEST VALUE, MOST SIGNIFICANT PART
     STA (ZPADH),Y
     LDY WORK1L         ;INDEX LOW OF FREE SPACE
     LDA WORK1LH        ;INDEX HIGH OF FREE SPACE, LEAST SIGNIFICANT PART 
     STA ZPADL+1
     LDA WORK3L         ;THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART
     STA (ZPADL),Y      ;PUT THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART, IN THE FREE SPACE
     LDA WORK1HH        ;INDEX HIGH OF FREE SPACE, MOST SIGNIFICANT PART
     STA ZPADH+1
     LDA WORK3H         ;THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART
     STA (ZPADH),Y      ;PUT THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART, IN THE FREE SPACE
     LDA NVALL
     BNE L3A
     LDA NVALH
     BEQ L3B            ;EXIT OUTER LOOP
     DEC NVALH
     DEC WORK4L
     DEC WORK4H
L3A  DEC NVALL          ;END OF THE SHORTER SEQUENCE STILL LEFT
     BRA L0             ;START WORKING WITH THE SHORTER SEQUENCE
L3B  RTS

The calling program has to set the pointers ZPADL and ZPADH pointing to the start of the (separately stored) sequences of the 8 less and the 8 more significant bits of the values. The total number of values are 1 + y + 256 * x where the value x has to be given to variable NVALL and the value y has to be given to variable NVALH. At the end these variables will both be zero.

In addition the following 9 bytes are needed internally

WORK1L
WORK1LH
WORK1HH
WORK2L
WORK2H
WORK3L
WORK3H
WORK4L
WORK4H

The following is a template for a program calling the routine:

;
ZPADL   = $30           ;2 BYTE POINTER IN PAGE ZERO TO THE "LESS SIGNIFICANT" SEQUENCE. SET BY CALLING PROGRAM
ZPADH   = $32           ;2 BYTE POINTER IN PAGE ZERO TO THE "MORE SIGNIFICANT" SEQUENCE. SET BY CALLING PROGRAM
NVALL   = $34           ;< "START ADDRESS" - "END ADDRESS" OF SEQUENCE. SET BY CALLING PROGRAM
NVALH   = $35           ;> "START ADDRESS" - "END ADDRESS" OF SEQUENCE. SET BY CALLING PROGRAM
;
;9 BYTES USED AS WORKING AREA
;
WORK1L  = $36           ;LOW INDEX (Y) FOR POTENTIALLY LARGEST VALUE
WORK1LH = $37           ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
WORK1HH = $38           ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
WORK2L  = $39           ;POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
WORK2H  = $3A           ;POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
WORK3L  = $3B           ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
WORK3H  = $3C           ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
WORK4L  = $3D           ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
WORK4H  = $3E           ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
     *=$1000            ;CODE ANYWHERE IN RAM OR ROM
     LDA #< SEQL
     STA ZPADL
     LDA #> SEQL
     STA ZPADL+1
     LDA #< SEQH
     STA ZPADH
     LDA #> SEQH
     STA ZPADH+1
     LDA #$09
     STA NVALL
     LDA #$00
     STA NVALH
     JSR SORT
     BRK
SEQL .BYTE    $30
     .BYTE    $20
     .BYTE    $DF
     .BYTE    $5A
     .BYTE    $81
     .BYTE    $C7
     .BYTE    $62
     .BYTE    $6D
     .BYTE    $6A
     .BYTE    $76
SEQH .BYTE    $05
     .BYTE    $1F
     .BYTE    $16
     .BYTE    $0D
     .BYTE    $1E
     .BYTE    $17
     .BYTE    $18
     .BYTE    $10
     .BYTE    $26
     .BYTE    $10
"Insert the code for subroutine SORT here"

To test the efficiency (and correctness) of this algorithm I generated a totally random permutation of the numbers 0 - 9999 using a random number generator. This sequence was successfully sorted using subroutine SORT above. I measured the number of T1H counts of the W65C22S timer 1 during this sorting, it was 4324613. As T1H decrements every 256 PHI2 pulses and my W6502 runs at 8 megahertz this corresponds to 138.4 seconds.

base/optimal_sort_16-bit_elements.txt · Last modified: 2015-04-17 04:33 by 127.0.0.1