User Tools

Site Tools


base:optimal_sort_16-bit_elements

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

base:optimal_sort_16-bit_elements [2015-04-17 04:33] (current)
Line 1: Line 1:
 +====== Optimal Sort for any number of 16-bit elements ======
 +by Mats Rosengren
 +
 +====== General Discussion ======
 +The extension of the "​[[optimal_sort_8-bit_elements|Optimal Sort]]"​ algorithm to "an unlimited number"​ (i.e. more then 255 elements) of 16 bit elements is straightforward using standard 6502 technique.
 +
 +Consider first the loop using the Y register to sequentially access the elements starting with the second last and ending with the first in a sequence of up to 255 elements (note that the last access here is with 1 in the Y register, i.e. the "​ZPADD"​ should point to the byte before the first element):
 +<​code>​
 +L1   DEY
 +     BEQ L3
 +     LDA (ZPADD),Y
 +     "​some code"
 +     BRA L1
 +L3   
 +</​code>​
 +To access a longer sequence one could use nested loops with the X register for the outer and the Y register for the inner loop as follows:
 +<​code>​
 +L1   TYA
 +     BNE L2
 +     TXA
 +     BEQ L3
 +     DEX
 +     DEC ZPADD+1
 +L2   DEY
 +     LDA (ZPADD),Y
 +     "​some code"
 +     BRA L1
 +L3  ​
 +</​code>​
 +If initially the registers X and Y are given the values x and y and the value y is added to high part of the pointer ZPADD pointing to the first element of the sequence to be sorted the loop will be run through y + 256 * x times starting with the second last and ending with the first in a sequence of 1 + y + 256 * x elements. And when the loop is finished the pointer ZPADD will again point to the first element of the sequence while the X and the Y registers will take the value zero. Note that as here the last access is with 0 in the Y register the "​ZPADD"​ should indeed point to the first element, not to the byte before the first element
 +
 +To work with 16 bit values one has to handle the 8 less significant bits and the 8 more significant bits separately. The comparison is then made such that first the 8 most significant bits are compared. The 8 less significant bits only need to be compared if the 8 more significant bits were equal. The complete routine is then as follows:
 +
 +<​code>​
 +SORT CLC
 +     LDA ZPADL+1 ​       ;INITIALISE WORK4L
 +     ADC NVALH
 +     STA WORK4L
 +     BCS LER            ;BAD INPUT VALUES (TOO FAR TO BRANCH DIRECTLY)
 +     LDA ZPADH+1 ​       ;INITIALISE WORK4H
 +     ADC NVALH
 +     STA WORK4H
 +LER  BCS L3B            ;BAD INPUT VALUES
 +L0   LDY NVALL          ;NUMBER OF DECREMENTS, LOW
 +     LDX NVALH          ;NUMBER OF DECREMENTS, HIGH
 +     LDA WORK4L
 +     STA ZPADL+1
 +     LDA (ZPADL),​Y ​     ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
 +     STA WORK3L ​        ;SAVE LEAST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
 +     LDA WORK4H
 +     STA ZPADH+1
 +     LDA (ZPADH),​Y ​     ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
 +     STA WORK3H ​        ;SAVE MOST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
 +     BRA L2
 +L1   ​TYA ​               ;START INNER LOOP
 +     BNE L1A
 +     TXA
 +     BEQ L3             ;EXIT INNER LOOP
 +     DEX
 +     DEC ZPADL+1
 +     DEC ZPADH+1
 +L1A  DEY
 +     LDA (ZPADH),Y
 +     CMP WORK2H
 +     BNE L1B
 +     LDA (ZPADL),Y
 +     CMP WORK2L
 +L1B  BCC L1
 +L2   LDA (ZPADL),Y
 +     STA WORK2L ​        ;​POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
 +     LDA (ZPADH),Y
 +     STA WORK2H ​        ;​POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
 +     STY WORK1L ​        ;​INDEX OF POTENTIALLY LARGEST VALUE, LOW
 +     LDA ZPADL+1
 +     STA WORK1LH ​       ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
 +     LDA ZPADH+1
 +     STA WORK1HH ​       ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
 +     BRA L1             ;END INNER LOOP
 +L3   LDY NVALL          ;WHERE THE LARGEST VALUE SHALL BE PUT
 +     LDA WORK4L
 +     STA ZPADL+1
 +     LDA WORK2L ​        ;THE LARGEST VALUE, LEAST SIGNIFICANT PART
 +     STA (ZPADL),Y
 +     LDA WORK4H
 +     STA ZPADH+1
 +     LDA WORK2H ​        ;THE LARGEST VALUE, MOST SIGNIFICANT PART
 +     STA (ZPADH),Y
 +     LDY WORK1L ​        ;​INDEX LOW OF FREE SPACE
 +     LDA WORK1LH ​       ;INDEX HIGH OF FREE SPACE, LEAST SIGNIFICANT PART 
 +     STA ZPADL+1
 +     LDA WORK3L ​        ;THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART
 +     STA (ZPADL),​Y ​     ;PUT THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART, IN THE FREE SPACE
 +     LDA WORK1HH ​       ;INDEX HIGH OF FREE SPACE, MOST SIGNIFICANT PART
 +     STA ZPADH+1
 +     LDA WORK3H ​        ;THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART
 +     STA (ZPADH),​Y ​     ;PUT THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART, IN THE FREE SPACE
 +     LDA NVALL
 +     BNE L3A
 +     LDA NVALH
 +     BEQ L3B            ;EXIT OUTER LOOP
 +     DEC NVALH
 +     DEC WORK4L
 +     DEC WORK4H
 +L3A  DEC NVALL          ;END OF THE SHORTER SEQUENCE STILL LEFT
 +     BRA L0             ;​START WORKING WITH THE SHORTER SEQUENCE
 +L3B  RTS
 +</​code>​
 +The calling program has to set the pointers ZPADL and ZPADH pointing to the start of the (separately stored) sequences of the 8 less and the 8 more significant bits of the values. The total number of values are 1 + y + 256 * x where the value x has to be given to variable NVALL and the value y has to be given to variable NVALH. At the end these variables will both be zero.
 +
 +In addition the following 9 bytes are needed internally
 +<​code>​
 +WORK1L
 +WORK1LH
 +WORK1HH
 +WORK2L
 +WORK2H
 +WORK3L
 +WORK3H
 +WORK4L
 +WORK4H
 +</​code>​
 +The following is a template for a program calling the routine:
 +<​code>​
 +;
 +ZPADL   = $30           ;2 BYTE POINTER IN PAGE ZERO TO THE "LESS SIGNIFICANT"​ SEQUENCE. SET BY CALLING PROGRAM
 +ZPADH   = $32           ;2 BYTE POINTER IN PAGE ZERO TO THE "MORE SIGNIFICANT"​ SEQUENCE. SET BY CALLING PROGRAM
 +NVALL   = $34           ;< "START ADDRESS"​ - "END ADDRESS"​ OF SEQUENCE. SET BY CALLING PROGRAM
 +NVALH   = $35           ;> "START ADDRESS"​ - "END ADDRESS"​ OF SEQUENCE. SET BY CALLING PROGRAM
 +;
 +;9 BYTES USED AS WORKING AREA
 +;
 +WORK1L ​ = $36           ;LOW INDEX (Y) FOR POTENTIALLY LARGEST VALUE
 +WORK1LH = $37           ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
 +WORK1HH = $38           ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
 +WORK2L ​ = $39           ;​POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
 +WORK2H ​ = $3A           ;​POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
 +WORK3L ​ = $3B           ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
 +WORK3H ​ = $3C           ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
 +WORK4L ​ = $3D           ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
 +WORK4H ​ = $3E           ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
 +     ​*=$1000 ​           ;CODE ANYWHERE IN RAM OR ROM
 +     LDA #< SEQL
 +     STA ZPADL
 +     LDA #> SEQL
 +     STA ZPADL+1
 +     LDA #< SEQH
 +     STA ZPADH
 +     LDA #> SEQH
 +     STA ZPADH+1
 +     LDA #$09
 +     STA NVALL
 +     LDA #$00
 +     STA NVALH
 +     JSR SORT
 +     BRK
 +SEQL .BYTE    $30
 +     ​.BYTE ​   $20
 +     ​.BYTE ​   $DF
 +     ​.BYTE ​   $5A
 +     ​.BYTE ​   $81
 +     ​.BYTE ​   $C7
 +     ​.BYTE ​   $62
 +     ​.BYTE ​   $6D
 +     ​.BYTE ​   $6A
 +     ​.BYTE ​   $76
 +SEQH .BYTE    $05
 +     ​.BYTE ​   $1F
 +     ​.BYTE ​   $16
 +     ​.BYTE ​   $0D
 +     ​.BYTE ​   $1E
 +     ​.BYTE ​   $17
 +     ​.BYTE ​   $18
 +     ​.BYTE ​   $10
 +     ​.BYTE ​   $26
 +     ​.BYTE ​   $10
 +"​Insert the code for subroutine SORT here"
 +</​code>​
 +
 +To test the efficiency (and correctness) of this algorithm I generated a totally random permutation of the numbers 0 - 9999 using a random number generator. This sequence was successfully sorted using subroutine SORT above. I measured the number of T1H counts of the W65C22S timer 1 during this sorting, it was 4324613. As T1H decrements every 256 PHI2 pulses and my W6502 runs at 8 megahertz this corresponds to 138.4 seconds. ​
  
base/optimal_sort_16-bit_elements.txt ยท Last modified: 2015-04-17 04:33 (external edit)