magazines:chacking19
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
— | magazines:chacking19 [2015-04-17 04:34] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | < | ||
+ | ######## | ||
+ | ################## | ||
+ | ###### | ||
+ | ##### | ||
+ | ##### #### #### ## ##### #### | ||
+ | ##### ## ## #### ## ## | ||
+ | ##### | ||
+ | ##### ## ## ######## | ||
+ | ##### #### #### #### #### ##### #### | ||
+ | ##### ## | ||
+ | ###### | ||
+ | ################## | ||
+ | ######## | ||
+ | |||
+ | ............................................................................... | ||
+ | |||
+ | Seek, and ye shall find. | ||
+ | |||
+ | Ask, and it shall be given. | ||
+ | |||
+ | ............................................................................... | ||
+ | |||
+ | BSOUT | ||
+ | |||
+ | C' | ||
+ | |||
+ | Many of you, I am sure, have been wondering, "Is C=Hacking still | ||
+ | alive? | ||
+ | - BUT - | ||
+ | Although I have not lost interest in the 64, I have lost a lot of | ||
+ | free time I once had, and I am now able to pursue a lot of other interests! | ||
+ | So the total time allocated to the 64, and hence to C=Hacking, has | ||
+ | decreased considerably. | ||
+ | around August or September. | ||
+ | followed by work on Sirius, and I devoted my C64 time to them instead of | ||
+ | C=Hacking. | ||
+ | a plane, and... well, you get the idea. Poor issue #19 just got worked on | ||
+ | in little dribbles every few weeks. | ||
+ | The main reason I share this sad tale is that, the way I see it, | ||
+ | C=Hacking could use a little help, if it is to come out more frequently. | ||
+ | If nobody volunteers it will still come out, but in exactly the way it | ||
+ | does right now -- a little less frequently than it ought to. Some of the | ||
+ | more time-consuming tasks are: finding articles, reviewing (actually | ||
+ | refereeing) articles, and collecting the latest news and tips. Finding | ||
+ | articles means finding people who are doing some nifty Commodore project, | ||
+ | or talking someone into doing some nifty Commodore project. | ||
+ | article means reading the article carefully, making sure everything is | ||
+ | technically correct, making suggestions for improvement, | ||
+ | collecting news means being plugged into the system. | ||
+ | I have a few people I rely on for some of these things, but I | ||
+ | could use more, and if you'd like to help out (especially finding new | ||
+ | articles, or keeping up to date on the latest C64 news) please drop me | ||
+ | an email. | ||
+ | |||
+ | With that out of the way, brother Judd would like to preach on | ||
+ | a malaise that afflicts the C64 world and which has been getting worse: | ||
+ | Not Finishing The Job. I just think about all the promising projects | ||
+ | I've heard about over the last few years -- off the top of my head I remember | ||
+ | a SCPU game, a SCPU monitor, several demos, multiple utilities, a VDC code | ||
+ | library, several OSes... -- which were Almost Done. And where are they | ||
+ | now? Presumably, still Almost Done. So if you have a project which is | ||
+ | Almost Done, but has been sitting around for the last few months/ | ||
+ | please, please finish up that last 10% and release it. | ||
+ | We, the technical community, are a community. | ||
+ | from each other, we get ideas and motivation from each other, and we | ||
+ | push each other to do great things. | ||
+ | activity stimulates more activity, and decreased activity begets yet less | ||
+ | activity. | ||
+ | I'm not saying we're on the verge of a big programming renassaince, | ||
+ | but I am concerned that we are drying up. Maybe if people finish up those | ||
+ | programs lying around it will reverse the trend. | ||
+ | this finally finished-up issue want to make you go out and do cool stuff?) | ||
+ | |||
+ | In other news, The Wave seems to be testing out wonderfully and | ||
+ | is totally cool. In case you've been under a rock these past few months | ||
+ | The Wave is an integrated TCP/IP suite for Wheels -- telnet, graphical | ||
+ | web browser, PPP, the works. | ||
+ | for several months now and it is solid. | ||
+ | I was asked lo these many months ago to put in a plug for | ||
+ | |||
+ | http:// | ||
+ | |||
+ | which is run by Mike Naberezny (mnaberez@nyx.net). | ||
+ | comments, suggestions, | ||
+ | and tell him what you think. | ||
+ | The ever-resourceful Pasi Ojala has several new thingies on his | ||
+ | web site. This is probably ancient history by now but it's in my " | ||
+ | news" file, sooo... | ||
+ | |||
+ | 1) a voice-only copy of the Amiga Expo 1988 presentation by R.J.Mical | ||
+ | about the early years of Amiga is available in four parts as .mp3 | ||
+ | from http:// | ||
+ | (24kbit/s, 16kHz, mono, ~20MB total, over 100 minutes) | ||
+ | | ||
+ | the Amiga. The files may change location in the future but you | ||
+ | will find links to them from my page. Enjoy! | ||
+ | |||
+ | 2) Some VIC20 graphics are also available at | ||
+ | http:// | ||
+ | There is one picture which can be viewed with unexpanded VIC20 | ||
+ | (with 154x/7x or 1581 drive) and others for 8k-expanded | ||
+ | | ||
+ | There are also gif version of the pictures on the page. | ||
+ | |||
+ | Myke Carter (mykec@delphi.com) has developed a filter program | ||
+ | that allows C=Hacking to be converted to geoWrite format. | ||
+ | you'd like a geoWrite version of C=Hacking, send him some email! | ||
+ | |||
+ | Finally, this is memorial day here in the States, and I'd just like to | ||
+ | suggest folks take a little time to think about the purpose of this holiday | ||
+ | and why we have it. | ||
+ | |||
+ | Okay then, enough with the jabber, and on to hacking excellence. | ||
+ | |||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H 19 | ||
+ | |||
+ | ::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | BSOUT | ||
+ | o Voluminous ruminations from your unfettered editor. | ||
+ | |||
+ | |||
+ | Jiffies | ||
+ | o Things. | ||
+ | |||
+ | |||
+ | Side Hacking | ||
+ | |||
+ | o "Burst Fastloader for the C64", by Pasi Ojala < | ||
+ | The 128 can burst-load from devices such as the 1571 and 1581. | ||
+ | With a small hardware modification, | ||
+ | originally designed for. This article discusses the modification | ||
+ | along with example burstload code. | ||
+ | |||
+ | o " | ||
+ | < | ||
+ | on the PET 8000, including a demonstration BASIC program for | ||
+ | sending data to e.g. a centronics printer via the user port. | ||
+ | |||
+ | |||
+ | Main Articles | ||
+ | |||
+ | o "Sex, lies, and microkernal-based 65816 native OSes, part 1", | ||
+ | by Jolse Maginnis < | ||
+ | to learn about OS design and design philosophy. | ||
+ | starts with OS basics and ends with JOS innards. | ||
+ | you've been under a rock the past few months, is a rather cool | ||
+ | multitasking 65816 OS which can do some rather cool things). | ||
+ | |||
+ | o " | ||
+ | < | ||
+ | |||
+ | And on we go to article three in the series. | ||
+ | the investigation of the IRQ and NMI routines -- specifically, | ||
+ | the routines called by those routines (UDTIM, SCNKEY, etc.). | ||
+ | |||
+ | o "JPEG: Decoding and Rendering on a C64", by S. Judd < | ||
+ | and Adrian Gonzalez < | ||
+ | two articles: | ||
+ | |||
+ | " | ||
+ | JPEG encoding and decoding, with special attention to the IDCT, | ||
+ | and some related C64 issues. | ||
+ | |||
+ | " | ||
+ | Floyd-Steinberg dithering, and how the IFLI graphics in jpz are | ||
+ | rendered. | ||
+ | |||
+ | | ||
+ | .................................. Credits ................................... | ||
+ | |||
+ | Editor, The Big Kahuna, The Car' | ||
+ | C=Hacking logo by.......................... Mark Lawrence | ||
+ | |||
+ | Special thanks to the folks who have helped out with reviewing and such, | ||
+ | and to the article authors for being patient! | ||
+ | |||
+ | Legal disclaimer: | ||
+ | 1) If you screw it up it's your own fault! | ||
+ | 2) If you use someone' | ||
+ | |||
+ | About the authors: | ||
+ | |||
+ | Jolse Maginnis is a 20 year old programmer and web page designer, | ||
+ | currently taking a break from CS studies. | ||
+ | with the C64 at just five or six years of age, when his parents brought | ||
+ | home their " | ||
+ | to BASIC, and then on to ML. He always wanted to be a demo coder, and in | ||
+ | 1994 met up with a coder at a user's group meeting, and has since worked | ||
+ | on a variety of projects from NTSC fixing to writing demo pages and intros | ||
+ | and even a music collection. | ||
+ | is otherwise playing/ | ||
+ | movie or concert somewhere. | ||
+ | buy a SuperCPU, it's the way of the future" | ||
+ | one, anyone can! | ||
+ | |||
+ | Richard Cini is a 31 year old vice president of Congress Financial | ||
+ | Corporation, | ||
+ | his parents bought him a VIC-20 as a birthday present. | ||
+ | for general BASIC programming, | ||
+ | controlling the lawn sprinkler system, and for a text-to-speech synthesyzer. | ||
+ | All his CBM stuff is packed up right now, along with his other " | ||
+ | computers, including a PDP11/34 and a KIM-1. | ||
+ | old computers Richard enjoys gardening, golf, and recently has gotten | ||
+ | interested in robotics. | ||
+ | is unique in being fiercely loyal without being evangelical, | ||
+ | some other communities, | ||
+ | best use out of the 64. | ||
+ | |||
+ | Adrian Gonzalez is a 26 year old system/ | ||
+ | serving Laredo, TX and Neuvo Laredo, Mexico. | ||
+ | their parents to buy them a C64 in 1984, and whereas his brother moved on | ||
+ | to PCs he stuck with the 64 and later bought an Amiga. | ||
+ | programming in sixth grade and wrote a few BASIC programs for the family | ||
+ | business; since then Adrian has put several demos and utilities under his | ||
+ | belt. In addition to fancy graphics and music, Adrian has an interest | ||
+ | in copy protection schemes (and playing the occasional game, of course). | ||
+ | When he's not coding, he's either playing basketball, playing piano, | ||
+ | editing videos, or going out to movies/ | ||
+ | page at http:// | ||
+ | |||
+ | For information on the mailing list, ftp and web sites, send some email | ||
+ | to chacking-info@jbrain.com. | ||
+ | |||
+ | While http:// | ||
+ | C=Hacking is available many other places including | ||
+ | |||
+ | http:// | ||
+ | http:// | ||
+ | |||
+ | |||
+ | ................................... Jiffies .................................. | ||
+ | |||
+ | $FFC6 | ||
+ | |||
+ | I actually have a little Jiffy that I ' | ||
+ | those things that is so obvious and simple that it took me several tries | ||
+ | before I stumbled onto it. It also highlights a rather powerful feature | ||
+ | of the lowly C64 kernal. | ||
+ | |||
+ | Not long ago, I was asked to write a slideshow program for jpz. Ideally, | ||
+ | a slideshow program should be a " | ||
+ | load pictures from some list in a file. But I didn't see a decent way to do | ||
+ | this, especially for jpz which has maybe 200 bytes free total. | ||
+ | thunderclap finally occured. | ||
+ | |||
+ | Everyone has used CMD4 to redirect a file to the printer. | ||
+ | kernal can redirect _output_ to different devices, it can redirect the | ||
+ | _input_ to be from different devices, using CHKIN. | ||
+ | program has to do is open a list of filenames, redirect input to that file, | ||
+ | and execute the normal jpz. jpz just uses JSR CHRIN to get data -- normally | ||
+ | that data comes from the keyboard, but with CHKIN it comes from the file | ||
+ | instead, akin to "a.out < input" in unix. Since jpz doesn' | ||
+ | calling jpz repetitively will keep reading from the input file. | ||
+ | The result is a simple and effective slideshow program, and a trick | ||
+ | which ought to be useful in other situations. | ||
+ | code, located at $02ae to be autobooting. | ||
+ | |||
+ | * | ||
+ | * Simple slideshow -- slj 4/2000 | ||
+ | * | ||
+ | |||
+ | org $02ae | ||
+ | |||
+ | name txt ' | ||
+ | |||
+ | start | ||
+ | lda #start-name | ||
+ | ldx #<name | ||
+ | ldy #>name | ||
+ | jsr $ffbd | ||
+ | lda #3 | ||
+ | tay | ||
+ | ldx $ba | ||
+ | jsr $ffba | ||
+ | jsr $ffc0 | ||
+ | |||
+ | ldx #< | ||
+ | ldy #> | ||
+ | lda $10fb ; | ||
+ | cmp #$4c | ||
+ | bne :jpy | ||
+ | stx $10fc | ||
+ | sty $10fd | ||
+ | beq main | ||
+ | :jpy stx $10ed | ||
+ | sty $10ee | ||
+ | |||
+ | main | ||
+ | ldx #3 | ||
+ | jsr $ffc6 | ||
+ | jsr $ffe4 | ||
+ | lda $90 ;loop until EOF reached | ||
+ | and #$40 | ||
+ | bne :done | ||
+ | jmp $1000 ; | ||
+ | :done | ||
+ | lda #3 | ||
+ | jsr $ffc3 | ||
+ | jsr $ffcc | ||
+ | jmp $a474 | ||
+ | |||
+ | da start | ||
+ | da start | ||
+ | |||
+ | |||
+ | ................................ Side Hacking ................................ | ||
+ | |||
+ | Burst Fastloader for C64 by Pasi Ojala, albert@cs.tut.fi | ||
+ | ------------------------ | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | The synchronous serial port would' | ||
+ | both directions without processor intervention with the maximum speed | ||
+ | of one bit per two clock cycles. Without a bug-free synchronous serial | ||
+ | port the transfer had to be slowed down considerably so that the | ||
+ | | ||
+ | This became the dead slow software-driven Commodore serial protocol. | ||
+ | |||
+ | Syncronous Serial | ||
+ | | ||
+ | The complex interface adapter (6526 CIA) chips used in Commodore 64 | ||
+ | and later in Commodore 128 have bug-free synchronous serial | ||
+ | | ||
+ | mode, each time a rising edge is detected in the serial clock pin | ||
+ | | ||
+ | When 8 bits are received the accumulated bits are moved into the | ||
+ | | ||
+ | to reflect this. If the corresponding interrupt is enabled, an | ||
+ | | ||
+ | |||
+ | In output mode the serial clock line is controlled by Timer A. The | ||
+ | | ||
+ | is written to the serial data register, the value is clocked out | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | An obsolete signal in the peripheral serial bus (SRQ) was taken into | ||
+ | | ||
+ | | ||
+ | old serial clock line doubles as slow serial clock line and fast | ||
+ | | ||
+ | |||
+ | The fast serial protocol is basically very simple. The side sending | ||
+ | data configures its synchronous serial port into output mode, the | ||
+ | other side uses input mode. The old peripheral serial bus clock line | ||
+ | is controlled by the receiving side and is used as an acknowledge: | ||
+ | when the receiver is ready for data, it toggles the state of the clock | ||
+ | line. The actual data is transferred using the synchronous serial | ||
+ | | ||
+ | | ||
+ | for a byte to arrive into its serial data register. The actual | ||
+ | | ||
+ | |||
+ | Both the drive and the computer must detect whether the other side can | ||
+ | | ||
+ | using the synchronous serial port while doing handshaking. The drive | ||
+ | sends a fast serial byte when the computer sends a secondary address | ||
+ | | ||
+ | send the fast serial byte anytime after the drive is reset and before | ||
+ | the drive would send fast serial bytes. | ||
+ | |||
+ | Modification to c64 | ||
+ | | ||
+ | To use burst fastloader with C64 we need to connect the CIA | ||
+ | | ||
+ | | ||
+ | the serial bus data line to the syncronous serial port data line and | ||
+ | one to connect the serial bus SRQ (the obsolete line for service | ||
+ | | ||
+ | line. Select the right connections depending on whether you want to | ||
+ | use CIA1 or CIA2. | ||
+ | |||
+ | 1570/ | ||
+ | |||
+ | Pin1 SRQ Fast serial bus clk | ||
+ | Pin5 DATA Data - slow& | ||
+ | |||
+ | |||
+ | Top view - old c64, CIA1 | ||
+ | User port Cass port | ||
+ | |||
+ | |||||||||||| | ||
+ | |||||||||||| | ||
+ | | ||
+ | | ||
+ | | ||
+ | SP1 1 264 5 | ||
+ | |||
+ | |||
+ | Top view - old c64, CIA2 | ||
+ | User port Cass port | ||
+ | |||
+ | |||||||||||| | ||
+ | |||||||||||| | ||
+ | | ||
+ | | ||
+ | | ||
+ | SP2 1 264 5 | ||
+ | |||
+ | | ||
+ | port connector, but remember to leave the outer half of the connector | ||
+ | free so that you can still plug in your user port devices. | ||
+ | |||
+ | Then solder the other ends to the serial connector. Those left- and | ||
+ | | ||
+ | the soldering. You can also build a cable which connects those lines | ||
+ | | ||
+ | |||
+ | Software for C64 | ||
+ | | ||
+ | Of course the C64 only uses the standard slow serial routines and we | ||
+ | need a seperate fastloader routine to take advantage of the fast | ||
+ | | ||
+ | load routine is located in the unused area $2a7-$2ff and in the | ||
+ | | ||
+ | | ||
+ | | ||
+ | * a verify operation is requested | ||
+ | * a directory load operation is requested (filename starts with ' | ||
+ | * the filename starts with a colon (':' | ||
+ | |||
+ | So, it is possible to use the old load routine by prepending a colon | ||
+ | | ||
+ | slow serial devices at the same time. Unfortunately detecting | ||
+ | | ||
+ | would have to be duplicated and then the loader would become too | ||
+ | | ||
+ | | ||
+ | |||
+ | A fastloader version is available for both CIA1 (asm, exe) and CIA2 | ||
+ | (asm, exe) versions, uuencoded versions are attached to this article. | ||
+ | Only the CIA1 version is discussed here. | ||
+ | |||
+ | ; DASM V2.12.04 source | ||
+ | ; | ||
+ | ; Burst loader routine, minimal version to allow loading of programs upto 63k | ||
+ | ; in length ($400-$ffff). Directory is loaded with the normal load routine. | ||
+ | ; | ||
+ | ; (c)1987-98 Pasi Ojala, Use where you want, but please give me some credit | ||
+ | ; | ||
+ | ; This program needs SRQ to be connected to CNT1 and DATA to SP1 (CIA1). | ||
+ | ; Cassette drive won't work with those wires connected if the disk drive | ||
+ | ; is turned on. (SRQ is connected to cassette read line.) | ||
+ | ; | ||
+ | ; SRQ = Bidirectional fast clock line for fast serial bus | ||
+ | ; DATA= Slow/Fast serial data (software clocked in slow mode) | ||
+ | ; | ||
+ | ; In C128D (64-mode) you should use CIA2, because it has special hardware | ||
+ | ; which inhibits the use of CIA1 (or so I'm told). | ||
+ | ; | ||
+ | ; A short description of the burst protocol and commands can be found | ||
+ | ; from the "1581 Disk Drive User's Guide" | ||
+ | |||
+ | processor 6502 | ||
+ | |||
+ | ORG $0801 | ||
+ | DC.B $b,8,$ef,0 ; '239 SYS2061' | ||
+ | DC.B $9e, | ||
+ | DC.B 0,0,0 | ||
+ | |||
+ | install: | ||
+ | ; copy first block to $2a7..$2ff | ||
+ | ldx # | ||
+ | 0$ lda block1,x | ||
+ | sta _block1,x | ||
+ | dex | ||
+ | bpl 0$ | ||
+ | ; copy second block to $334..$3ff | ||
+ | ldx # | ||
+ | 1$ lda block2-1,x | ||
+ | sta _block2-1,x | ||
+ | dex | ||
+ | bne 1$ | ||
+ | |||
+ | lda $0330 ; load vector | ||
+ | ldx $0331 | ||
+ | cmp #MyLoad | ||
+ | beq 3$ | ||
+ | 2$ sta OldVrfy+1 | ||
+ | stx OldVrfy+2 | ||
+ | lda #MyLoad | ||
+ | sta $0331 | ||
+ | 3$ rts | ||
+ | |||
+ | block1 | ||
+ | #rorg $02a7 | ||
+ | _block1 | ||
+ | OldLoad lda #0 | ||
+ | OldVrfy jmp $f4a5 ; The ' | ||
+ | |||
+ | MyLoad: ;sta $93 | ||
+ | cmp #0 ; Is it a prg-load-operation ? | ||
+ | bne OldVrfy | ||
+ | stx $ae ; Store the load address | ||
+ | sty $af | ||
+ | tay ; ldy #0 | ||
+ | lda ($bb), | ||
+ | ldy $af | ||
+ | cmp #$24 ; Do we want a directory ($) ? | ||
+ | beq OldLoad | ||
+ | cmp #58 ; ':' | ||
+ | beq OldLoad | ||
+ | |||
+ | ; Activate Burst, the drive then knows we can handle it | ||
+ | sei ; We are polling the serial reg. intr. bit | ||
+ | ldy #1 ; Set the clock rate to the fastest possible | ||
+ | sty $dc04 | ||
+ | dey ; = ldy #0 | ||
+ | sty $dc05 | ||
+ | lda #$c1 | ||
+ | sta $dc0e ; Start TimerA, Serial Out, TOD 50Hz | ||
+ | bit $dc0d ; Clear interrupt register | ||
+ | lda #8 ; Data to be sent, and interrupt mask | ||
+ | sta $dc0c ; (actually we just wake up the other end, | ||
+ | 0$ bit $dc0d ; | ||
+ | ; burst transfers, data can be anything) | ||
+ | beq 0$ ; Then we poll the serial (data sent) | ||
+ | ; Clears the interrupt status | ||
+ | |||
+ | ; This program assumes you don't try to use it on a 1541 | ||
+ | ; If you try anyway, your machine will probably lock up.. | ||
+ | |||
+ | lda #$25 ; Set the normal (PAL) frequence to TimerA | ||
+ | sta $dc04 ; Change if you want to preserve NTSC-rate | ||
+ | lda #$40 | ||
+ | sta $dc05 | ||
+ | lda #$81 | ||
+ | jmp LoadFile | ||
+ | |||
+ | GetByte lda #8 ; Interrupt mask for Serial Port | ||
+ | 0$ bit $dc0d ; Wait for a byte | ||
+ | beq 0$ ; (Serial port int. bit changes, hopefully) | ||
+ | ;ldy $dc0c ; Get the byte from Serial Port Register | ||
+ | ToggleClk: | ||
+ | lda $dd00 ; Toggle the old serial clock (=send Ack) | ||
+ | eor #$10 ; so that the disk drive will start | ||
+ | sta $dd00 ; | ||
+ | ;tya ; return the value in Accumulator, | ||
+ | lda $dc0c ; Get the byte from Serial Port Register | ||
+ | rts | ||
+ | #rend | ||
+ | block1_end | ||
+ | |||
+ | |||
+ | block2 | ||
+ | #rorg $0334 | ||
+ | _block2 | ||
+ | |||
+ | LoadFile: | ||
+ | sta $dc0e ; Start TimerA, Serial IN, TOD 50Hz (PAL) | ||
+ | ;cli | ||
+ | |||
+ | jsr $f5af ; searching for .. | ||
+ | |||
+ | lda $b7 ; Preserve the filename length | ||
+ | pha | ||
+ | lda $b9 ; Do the same with secondary address | ||
+ | sta $a5 ; We store it to cassette sync countdown.. | ||
+ | ; No cassette routines are used anyway, as | ||
+ | lda #0 ; this prg is in cassette buffer.. | ||
+ | sta $b7 ; No filename for command channel | ||
+ | lda #15 | ||
+ | sta $b9 ; Secondary address 15 == command channel | ||
+ | lda #239 | ||
+ | sta $b8 ; Logical file number (15 might be in use?) | ||
+ | jsr $ffc0 ; OPEN | ||
+ | sta ErrNo+1 | ||
+ | pla | ||
+ | sta $b7 ; Restore filename length | ||
+ | bcs ErrNo ; " | ||
+ | ; "too many open files" or "file already open" | ||
+ | ; Send Burst command for Fastload | ||
+ | ldx #239 | ||
+ | jsr $ffc9 ; CHKOUT Set command channel as output | ||
+ | sta ErrNo+1 | ||
+ | bcs NoDev ; " | ||
+ | |||
+ | ; Bummer, the interrupt status register bit indicating fast serial | ||
+ | ; will be cleared when we get here.. | ||
+ | |||
+ | ldy #3 | ||
+ | 3$ lda BCMD-1, | ||
+ | jsr $ffd2 | ||
+ | dey | ||
+ | bne 3$ | ||
+ | ; ldy #0 | ||
+ | 1$ lda ($bb),y | ||
+ | jsr $ffd2 ; Send the filename byte by byte | ||
+ | iny | ||
+ | cpy $b7 ; Length of filename | ||
+ | bne 1$ | ||
+ | jsr $ffcc ; Clear channels | ||
+ | |||
+ | sei | ||
+ | jsr $ee85 ; Set serial clock on == clk line low | ||
+ | bit $dc0d ; Clear intr. register | ||
+ | jsr ToggleClk | ||
+ | |||
+ | jsr HandleStat | ||
+ | pha ; Store the Status | ||
+ | |||
+ | ;jsr $f5d2 ; loading/ | ||
+ | ; (uses CHROUT, which does CLI, so we can't use it) | ||
+ | |||
+ | ; We could add a check here.. | ||
+ | ; if we don't have at least two bytes, we cannot read load address.. | ||
+ | |||
+ | ; It seems that for files shorter than 252 bytes the 1581 does not count | ||
+ | ; the loading address into the block size. | ||
+ | |||
+ | jsr GetByte | ||
+ | ; that every file is at least 2 bytes long | ||
+ | tax | ||
+ | jsr GetByte | ||
+ | tay ; already in Y | ||
+ | lda $a5 ; The secondary address - do we use load | ||
+ | ; address in the file or the one given to | ||
+ | bne Our ; | ||
+ | stx $ae ; We use file's load addr. -> store it. | ||
+ | sty $af | ||
+ | Our ldx #252 ; We have 252 bytes left in this block | ||
+ | pla ; Restore the Status | ||
+ | bne Last ; If not OK, it has to be bytes left | ||
+ | Loop jsr GetAndStore ; Get X bytes and save them | ||
+ | jsr HandleStat | ||
+ | beq Loop ; If all was OK, loop.. | ||
+ | Last tax ; Otherwise it is bytes left. Do the last.. | ||
+ | jsr GetAndStore ; Get X number of bytes and save them | ||
+ | jsr $ee85 ; Serial clock on (the normal value) | ||
+ | lda #239 | ||
+ | jsr $ffc3 ; Close the command channel | ||
+ | clc ; carry clear -> no error indicator | ||
+ | bcc End | ||
+ | |||
+ | FileNotFound: | ||
+ | pla ; Pop the return address | ||
+ | pla | ||
+ | jsr $ee85 ; Serial clock on (the normal value) | ||
+ | lda #4 ; File not found | ||
+ | sta ErrNo+1 | ||
+ | NoDev lda #239 | ||
+ | jsr $ffc3 ; Close the command channel | ||
+ | ErrNo lda #5 ; Device not present | ||
+ | sec ; carry set -> error indicator | ||
+ | End ldx $ae ; Loader returns the end address, | ||
+ | ldy $af ; | ||
+ | cli | ||
+ | rts ; Return from the loader | ||
+ | |||
+ | HandleStat: | ||
+ | jsr GetByte | ||
+ | ; transfer for next byte) | ||
+ | cmp #$1f ; EOI ? | ||
+ | bne 0$ | ||
+ | jmp GetByte | ||
+ | 0$ cmp #2 ; File Not Found ? | ||
+ | bcs FileNotFound | ||
+ | ; code 0 or 1 -> OK | ||
+ | ldx #254 ; So, the whole block is coming | ||
+ | lda #0 ; No error -> Z set | ||
+ | rts | ||
+ | |||
+ | GetAndStore: | ||
+ | jsr GetByte | ||
+ | ;sta $d020 | ||
+ | ldy #$34 | ||
+ | sty 1 ; ROMs/IO off (hopefully no NMI:s occur..) | ||
+ | ldy #0 | ||
+ | sta ($ae), | ||
+ | ldy #$37 | ||
+ | sty 1 ; Restore ROMs/IO (Should preserve the | ||
+ | ; state, but here it doesn' | ||
+ | inc $ae ; Increase the address | ||
+ | bne 0$ | ||
+ | inc $af | ||
+ | 0$ dex ; X= number of bytes to receive | ||
+ | bne GetAndStore | ||
+ | rts | ||
+ | |||
+ | BCMD: dc.b $1f, $30, $55 ; ' | ||
+ | ; If $9F, Doesn' | ||
+ | #rend | ||
+ | block2_end | ||
+ | |||
+ | Now that was it. Now I just hold back and wait until someone | ||
+ | | ||
+ | to.. :-) | ||
+ | | ||
+ | begin 644 burster-cia1 | ||
+ | M`0@+" | ||
+ | M!.`" | ||
+ | M.O# | ||
+ | M>"" | ||
+ | MWP, | ||
+ | G[`+)`K#: | ||
+ | `` | ||
+ | end | ||
+ | size 354 | ||
+ | |||
+ | begin 644 burster-cia2 | ||
+ | M`0@+" | ||
+ | M!.`" | ||
+ | M.O# | ||
+ | M$(T`W: | ||
+ | M: | ||
+ | =8"# | ||
+ | `` | ||
+ | end | ||
+ | size 344 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | 8000's USER PORT & CENTRONICS PRINTERS | ||
+ | |||
+ | by Ken Ross | ||
+ | petlibrary@bigfoot.com | ||
+ | http:// | ||
+ | |||
+ | A recent query had me digging out an old item dealing with the user port on | ||
+ | the CBM/ | ||
+ | parallel printer with just the addition of a home brew cable (a Panasonic | ||
+ | Daisy Wheel printer salvaged before bin men got it!). The user port is | ||
+ | the edge connection tween the IEEE edge and the cassette# | ||
+ | is mostly diagnostic, the underside is the easy to use area. It's an I/O | ||
+ | (Input/ Output) system that you can control with a few PEEKs and POKEs. | ||
+ | Reading from left to right (as you look at the back of the beastie): | ||
+ | |||
+ | A _ ground | ||
+ | B _ input to 6522 VIA, CA1 | ||
+ | C D E F G H J K L _ are I/O lines ( 8 of them ) , PA0-7 [ data lines ] | ||
+ | M _ CB2 line from VIA can be I/O | ||
+ | N _ ground | ||
+ | |||
+ | A text file to be printed out can be read a character at a time with | ||
+ | MID$(etc) for this PRG to deal with and quite high speeds can be reached | ||
+ | even without having to compile it . | ||
+ | |||
+ | (This is actually a section of listing just printed out from my 8096 - | ||
+ | hence untidy numbers ) | ||
+ | |||
+ | 3010 POKE 59459, 255:REM make PA0-7 into outputs | ||
+ | 3020 POKE 59467, | ||
+ | 3022 RETURN :REM finished with this sub | ||
+ | [this enables the user port for this purpose] | ||
+ | 3023 REM this sub puts the data into output | ||
+ | 3024 if DATA <32 then goto 3080 :REM line does biz for LF & CR | ||
+ | 3026 if DATA =>65 and DATA<= 90 then DATA=DATA +32 : goto 3029 | ||
+ | | ||
+ | 3027 if DATA =>193 and DATA<= 218 then DATA=DATA -128 :goto 3029 | ||
+ | | ||
+ | ascii 65-90] | ||
+ | | ||
+ | 3029 REM line below sets strobe low to inform printer new data character on | ||
+ | way | ||
+ | 3031 POKE 59468, PEEK(59468) AND 31 OR 192 | ||
+ | 3035 REM below sets strobe high as data arrives | ||
+ | 3045 POKE 59468, | ||
+ | 3050 POKE 59471, DATA:REM at last data is POKE'd !!! | ||
+ | [the data numbers from above] | ||
+ | 3060 POKE 59468, | ||
+ | 3065 REM handshake sub | ||
+ | 3066 POKE 59467, PEEK(59467) OR 1 | ||
+ | 3067 WAIT 59469,2 | ||
+ | 3068 K=PEEK(59457) | ||
+ | 3069 REM end of handshake sub | ||
+ | [well it works for me!!] | ||
+ | 3070 RETURN :REM back to main area for next data | ||
+ | 3080 REM bit for LF & CR sub & return | ||
+ | [this depends on the printer and the same procedure for paper eject | ||
+ | if needed] | ||
+ | |||
+ | The cable connections are | ||
+ | CBM - CENTRONICS | ||
+ | CB2 - DATA STROBE | ||
+ | PA0~7 - DATA1-8 | ||
+ | CA1 - ACKNOWLEDGE | ||
+ | GND - grounds #14, 16, 24, 33, chassis gnd 17 | ||
+ | |||
+ | More modern printers will also need additional commands to enable things. | ||
+ | The commands needed for Epson printers ( with the exception list of | ||
+ | Epsons that don't use them !) are on my website at : | ||
+ | |||
+ | http:// | ||
+ | |||
+ | If any more info turns up it'll be there in time . | ||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H 19 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | Main Articles | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | |||
+ | ------------------------------------------------------------ | ||
+ | | Sex, lies and microkernel based 65816 native OSes. - Part 1| | ||
+ | ------------------------------------------------------------ | ||
+ | By Jolse Maginnis | ||
+ | |||
+ | Some readers may have read my article in GO64 issue 8/1999, which was a bit of | ||
+ | an introduction to JOS and some Operating System concepts, but it wasn't very | ||
+ | technical, and didn't really get into the nitty gritty. Getting down and dirty | ||
+ | with the bits and bytes is what C-Hacking is all about, so that's what this | ||
+ | series of articles will try to do wherever possible. | ||
+ | |||
+ | I'll try to go into detail about modern OS designs, paying particular detail to | ||
+ | what is relevant to the C64/ | ||
+ | and make comparisons to the kind of coding most of us are used to, e.g. just | ||
+ | using the kernel to access hardware, or just skipping the kernel altogether. | ||
+ | Most of the article will be in reference to the SuperCPU, specifically it's | ||
+ | 65816 CPU, and the OS I'm making for it, called JOS. If you haven' | ||
+ | SuperCPU yet, hopefully you'll want one by the end! (Remember it won't stop you | ||
+ | running stock programs!) | ||
+ | |||
+ | ------------------------------------------------- | ||
+ | | OK, So what do you plan to do.. And why bother? | | ||
+ | ------------------------------------------------- | ||
+ | |||
+ | When I first heard about the SuperCPU, I got pretty excited. " | ||
+ | times faster! 16Mbs! That's 256 times more RAM! I can only imagine what it's | ||
+ | capable of!", well I didn't actually say those things, but I at least thought | ||
+ | them! At the time I had already started making an OS for the C64, and at the | ||
+ | time I didn't know much at all about making an OS, all I knew about was | ||
+ | multitasking, | ||
+ | I managed to get myself a SuperCPU and make an OS on that, and to my surprise, | ||
+ | at that time, there didn't seem to be anyone else developing an OS for the | ||
+ | SuperCPU. | ||
+ | |||
+ | Only when the SCPU arrived and I had started coding for it, did I realise how | ||
+ | powerful it was. Yeah it's 20 times faster in clock speed, but it's also a 16 | ||
+ | bit processor, which might not seem like a great step up, but once you start | ||
+ | coding in 16 bits, it's hard to see how you did without it! | ||
+ | |||
+ | The 65816 has some great advantages over the 6502: | ||
+ | It's stack pointer is not limited to 256 bytes. | ||
+ | The Zero Page isn't stuck in the zero page! (It's now called the Direct Page). | ||
+ | There are a few more ways to put values on the stack. | ||
+ | Long addressing allowing upto 16mb directly accessible memory. | ||
+ | Plenty more.. | ||
+ | |||
+ | The top three things in particular, together with the 16 bit wide registers | ||
+ | means it's very suited to programming in a high level language like C, | ||
+ | particularly when compared to code that has to be produced for 6502. Higher | ||
+ | level languages can actually use the real CPU stack rather than having to | ||
+ | simulate it, as with 6502. Also by moving the Direct Page register, local | ||
+ | variables can be accessed like zero page variables, so performance isn't hurt | ||
+ | too much. | ||
+ | |||
+ | All this would be good even at a lower speed like 1 or 2Mhzs, but it's at 20! | ||
+ | The SuperCPU adds some real power to your old C64, but it's all hidden away | ||
+ | because we're running a ~20 year old " | ||
+ | |||
+ | The C64 has many limitations, | ||
+ | CBM serial bus. Here's a list of the main limits: | ||
+ | |||
+ | Single Tasking - Running two seperate programs at the same time impossible. | ||
+ | |||
+ | Some devices aren't catered for - Some devices don't have a chance at running | ||
+ | with old programs that were designed before their time. | ||
+ | |||
+ | Old sequential filesystem - It's not designed for random access files, although | ||
+ | random access is possible, it's just slower. All C64 programs have to written, | ||
+ | so that files are read from the beginning to the end, which is a little bit | ||
+ | limiting. Also it's the drives that dictate the filesystem, so we aren't just | ||
+ | stuck with the kernel' | ||
+ | several files open on many drives, while reading and writing to all of them just | ||
+ | isn't a possibility. Why would you want to do that? If you we're multitasking | ||
+ | several programs, that's just might be what happens! | ||
+ | |||
+ | It became pretty clear that the C64's kernel was of no use to JOS, since it had | ||
+ | too many limitations. So everything had to be re-written from scratch, with the | ||
+ | limits removed. | ||
+ | |||
+ | Along with re-doing the filesystem and adding multitasking, | ||
+ | plans for JOS: | ||
+ | |||
+ | Networking - Everything is internet, internet, internet these days, and why not, | ||
+ | the internet is great! So TCP/IP and SLIP/PPP were high on the list of TODO' | ||
+ | |||
+ | GUI - The SuperCPU is ideal for a nice, flexible, easy to program GUI. | ||
+ | |||
+ | Console - I wanted the console to be as close as possible to one of the standard | ||
+ | terminals (vt100,ansi etc..) thus making it easy to get by without needing a | ||
+ | terminal emulation program. | ||
+ | |||
+ | Shared libraries & shared code, relocatable binary format - Sharing as much code | ||
+ | as possible really saves memory and loading time. The binary format means that | ||
+ | you don't have to worry about where in memory your program will be. | ||
+ | |||
+ | Modular and scalable - It's nice to be able to choose exactly what your OS | ||
+ | needs, rather than getting lumped with it all. E.g. Do you really need tcp/ip | ||
+ | loaded if your not going to use the internet? If i'm running a webserver, do I | ||
+ | really need the console driver loaded? | ||
+ | |||
+ | Device independence - Application should not have to worry at all about what | ||
+ | devices they are using, which means that they' | ||
+ | including new ones. This is particularly useful when it comes to disk drives and | ||
+ | filesystems. | ||
+ | |||
+ | Porting and writing C programs - Wouldn' | ||
+ | advantage of the Open Source movement that's sweeping the world, and compile | ||
+ | some of these open source programs? | ||
+ | |||
+ | OK, so why am I bothering? At first I just wanted to see what I could do with | ||
+ | it, but now that it's come so far, it's not only of interest to me, as it's | ||
+ | become a very powerful OS. | ||
+ | |||
+ | ------------------------------------------------- | ||
+ | | ||
+ | ------------------------------------------------- | ||
+ | |||
+ | Unless you've been living on a remote desert island for the last 5 years, you'll | ||
+ | know about the terrible trend in personal computing these days; buy a new PC now | ||
+ | and in 6 months or less it's outdated. As CBM users, we successfully avoid all | ||
+ | this. Sure, CMD have tonnes of upgrades available, but they' | ||
+ | lifetime" | ||
+ | |||
+ | Have you ever thought about why PC's become outdated so quickly? It's very | ||
+ | popular to blame Microsoft (and I will!), since they are the main proponent of | ||
+ | bloat with their ever expanding OSes and applications, | ||
+ | accepted now that it's ok to leave things unoptimized, | ||
+ | more " | ||
+ | how much time is spent " | ||
+ | components taking up so much RAM. For me, it's all about layers. It's what | ||
+ | separates C64's from the bloated world of the PC. Here's my comparisons... | ||
+ | |||
+ | |||
+ | CPU Type | ||
+ | -------- | ||
+ | PC - 32 bit processors | ||
+ | C64/ | ||
+ | |||
+ | This is quite arguable, but when most of your code doesn' | ||
+ | over 32768, 32 bit's can be a bit wasteful, but of course if you need to do 32 | ||
+ | bit arithmetic on an 8 or 16 bit processor, that too is wasteful. For me a 16 | ||
+ | bit processor is the ideal size, particularly after doing lots of 8 bit coding. | ||
+ | |||
+ | Language used | ||
+ | ------------- | ||
+ | PC - Mainly C, C++ | ||
+ | C64/ | ||
+ | |||
+ | C can be a thin layer or a thick layer, depending on the processor. On 6502 it's | ||
+ | quite a thick layer, which is why most things for C64 were written in ASM. On | ||
+ | 65816, that layer isn't so thick, so it's a much more viable alternative. | ||
+ | Although, when you write in a higher level language, you tend to forget about | ||
+ | the actual code it produces, and don't bother optimizing it. C++ adds another | ||
+ | layer onto C, not only because of the code it produces, but the style of | ||
+ | program. Good object oriented programming practice adds extra bloat, because | ||
+ | there is more emphasis on doing function calls, to do things that ordinarily are | ||
+ | done by directly accessing the data. The real bloat of Object Orientation isn't | ||
+ | actually the code that you write yourself, you can still write optomized code in | ||
+ | an OO language, but the bloat is in the libraries of objects that you use when | ||
+ | writing your application, | ||
+ | example. | ||
+ | |||
+ | OS type | ||
+ | ------- | ||
+ | |||
+ | PC - Multitasking OS | ||
+ | C64/ | ||
+ | |||
+ | A multitasking OS adds some layers by default, since it has to switch between | ||
+ | processes. The OS isn't just the task switcher however, it's everything that's | ||
+ | needed to run applications, | ||
+ | opinion, absolutely none or as little as possible of the OS should be written in | ||
+ | a high level language, since it's going to be used by every application, | ||
+ | want frequently used things to be as optimized as possible. Most definitely the | ||
+ | most useful task an OS can provide is doing all the Disk I/O. Unfortunately for | ||
+ | us, the C64's kernel and CBM's serial bus are no where near fast enough, so | ||
+ | coders made their own DOS routines. | ||
+ | |||
+ | User Interface | ||
+ | -------------- | ||
+ | |||
+ | PC - Windows, X Windows | ||
+ | C64 - BASIC, GEOS | ||
+ | |||
+ | Windows and X are the most popular GUI's going around. X doesn' | ||
+ | standards on applications, | ||
+ | want, and usually do! When you have a few different applications running, each | ||
+ | with it's own GUI toolkit, you soon run out of memory, particularly if they' | ||
+ | big bloated C++ toolkits. Windows isn't quite the same, you at least have a | ||
+ | consistent look and feel, which also adds up to less memory wastage because most | ||
+ | apps use the same code. GEOS is nice looking but isn't very flexible at all, but | ||
+ | this does mean that it's a very thin layer. My hope is to achieve a balance | ||
+ | between the two. | ||
+ | |||
+ | So why'd I bother with all that? Well I just want to hilight that JOS will be | ||
+ | taking all those things into account, and I want to minimize the amount and size | ||
+ | of layers being added to our beloved C64's. | ||
+ | |||
+ | |||
+ | ------------------------------------------------- | ||
+ | | Monolithic or Micro? How do we want our kernel? | | ||
+ | ------------------------------------------------- | ||
+ | |||
+ | There are two main styles of OSes doing the rounds at the moment, both with | ||
+ | their own good and bad points. | ||
+ | |||
+ | Monolithic kernels | ||
+ | ------------------ | ||
+ | |||
+ | These, as the name suggests, are one large monolith of code, which usually | ||
+ | contain driver code for all devices. You would definitely consider the C64's | ||
+ | kernel as a monolithic kernel. Multitasking kernels sometimes allow | ||
+ | modularization, | ||
+ | allowing parts of the OS to be dynamically loaded. Linux is a very popular | ||
+ | example of this. It's a monolithic kernel which allows kernel modules to be | ||
+ | loaded dynamically. Last time I checked Lunix Next Generation worked along these | ||
+ | lines. | ||
+ | |||
+ | Good - Generally a little faster than Microkernels, | ||
+ | taken to switch processes is slow. | ||
+ | | ||
+ | Bad - Not as scalable as a Microkernel. You get everything in a big chunk, | ||
+ | whether you need it or not. | ||
+ | | ||
+ | How - Generally applications need to make calls to a jump table, which | ||
+ | usually will point to routines for Opening, Closing, Reading and | ||
+ | Writing devices. | ||
+ | | ||
+ | e.g. | ||
+ | lda #' | ||
+ | jsr $ffd2 | ||
+ | Prints ' | ||
+ | | ||
+ | |||
+ | Microkernel | ||
+ | ----------- | ||
+ | |||
+ | Microkernels truly are micro in size, if they' | ||
+ | lump all the device driver and API code in together, Microkernels only provide | ||
+ | very simple services for setting up processes and allowing them to communicate | ||
+ | with each other. All the device drivers and file-systems are then supplied by | ||
+ | optional programs that are loaded dynamically at run time. This allows maximum | ||
+ | scalability, | ||
+ | need. The best example other than JOS would be QNX (http:// | ||
+ | based Microkernel OS, which is extremely scalable and very small in code size. | ||
+ | On 6502/C64, OS/A65 is another Microkernel OS. | ||
+ | |||
+ | Microkernel OSes rely heavily on fast Inter Process Communication (IPC). Luckily | ||
+ | this is quite easy to achieve on 65816, and is basically a matter of passing | ||
+ | pointers between processes. | ||
+ | |||
+ | Good - Extremely scalable. Nicely split up into easy managle parts. Easier to | ||
+ | debug. I chose a Microkernel in JOS for these reasons. | ||
+ | |||
+ | Bad - Can be slower if too much time is spent switching between processes. | ||
+ | |||
+ | How - A jump table is still used, but to actually do any I/O you need to | ||
+ | communicate with the server process via IPC. | ||
+ | | ||
+ | To do this in JOS it involves setting up a message somewhere in memory | ||
+ | and then calling the S_send system call, to send to the server | ||
+ | process. Usually the message will be put on the stack and then popped | ||
+ | off when returned, much like a C function call. | ||
+ | | ||
+ | e.g. to open the file " | ||
+ | | ||
+ | pea O_READ | ||
+ | pea ^hellostr | ||
+ | pea !hellostr | ||
+ | pea IO_OPEN | ||
+ | tsc | ||
+ | inc | ||
+ | tax ; Low word of Message = Stack+1 | ||
+ | ldy #0 ; Stack is in Bank 0 | ||
+ | lda # | ||
+ | jsr @S_send | ||
+ | tsc | ||
+ | clc | ||
+ | adc #8 | ||
+ | tcs | ||
+ | |||
+ | hellostr .asc " | ||
+ | |||
+ | note: These are 65816 instructions, | ||
+ | you better look them up! The ' | ||
+ | addressing, ' | ||
+ | is used as the bottom 16 bits. Note that pea is a 16-bit instruction, | ||
+ | so pea ^hellostr will add an extra 00 byte. | ||
+ | |||
+ | The first 4 pea's prepare an 8 byte filesystem message, containing: | ||
+ | Message code for an Open: | ||
+ | 24 bit Pointer to Filename: | ||
+ | Open flags for reading: | ||
+ | |||
+ | This message is passed to the filesystem using one of JOS's Inter | ||
+ | Process Communication (IPC) system calls, S_send. This call takes the | ||
+ | 24 bit address of the message in X/Y, and the IPC channel for which to | ||
+ | send the message to, in the A register. Every system call in JOS | ||
+ | assumes 16 bit A/X/Y registers, as there really isn't anything to be | ||
+ | gained by switching to 8 bits for things that only need 8 bits. Adding | ||
+ | 8 to the stack pointer at the end " | ||
+ | stack. | ||
+ | |||
+ | This all looks a bit complicated doesn' | ||
+ | libraries help out. The standard C library for JOS allows you to do I/O | ||
+ | and such without actually worrying about the system calls. Yes it is a | ||
+ | " | ||
+ | |||
+ | pea O_READ ; same as the c code: open(" | ||
+ | pea ^hellostr | ||
+ | pea !hellostr | ||
+ | jsr @_open | ||
+ | pla | ||
+ | pla | ||
+ | pla | ||
+ | |||
+ | Much simpler right? | ||
+ | |||
+ | Compare that with the C64 kernel equivalent of: | ||
+ | |||
+ | lda #namelen | ||
+ | ldx #< | ||
+ | ldy #> | ||
+ | jsr $ffbd ; SETNAM | ||
+ | lda #1 | ||
+ | ldx #8 | ||
+ | ldy #1 | ||
+ | jsr $ffba ; SETLFS | ||
+ | jsr $ffc0 ; OPEN | ||
+ | |||
+ | Notice that the JOS version doesn' | ||
+ | anything.. I'll get to that later... | ||
+ | |||
+ | | ||
+ | | C isn't just the letter after B | | ||
+ | | ||
+ | |||
+ | Before I get into juicy OS details, I should explain about C and the standard C | ||
+ | library, as I'll be mentioning it quite a bit. | ||
+ | |||
+ | C is a very powerful language that was created by the same people who created | ||
+ | UNIX, so the two really go hand in hand. The majority of applications written | ||
+ | for UNIX type OSes are written in C; in fact, rather than give you executable | ||
+ | files, they are normally distrubuted as C source code, that you have to compile | ||
+ | yourself. Why is it used so much? Well if the only high level language you've | ||
+ | seen is BASIC, then you'd wonder how any high level language could be used for | ||
+ | good quality programs. C is different because it's just about as close as | ||
+ | you can get to programming in assembly without actually doing it, particularly | ||
+ | on newer processors. It isn't quite so pretty on 6502, but it's quite good on | ||
+ | the 65816. | ||
+ | |||
+ | In BASIC you're used to having "built in" commands that will print to the | ||
+ | screen, and commands for opening files and reading input, and any other I/O | ||
+ | you can think of. But C on the other hand, has nothing "built in", it doesn' | ||
+ | even have much of a notion of strings! Strings are just pointers to null | ||
+ | terminated arrays of characters in C. So how do you actually get C to do | ||
+ | anything useful? i.e. do some I/O? | ||
+ | |||
+ | This is where the C standard library comes in. This library contains functions | ||
+ | that deal with the underlying OS, and in particular opening/ | ||
+ | reading/ | ||
+ | memory, reading directories and various other useful functions. The standard | ||
+ | library also contains more UNIX orientated functions, for dealing with OS | ||
+ | features such as IPC and process control (more on processes later). | ||
+ | |||
+ | JOS implements a large section of the standard C library, in particular the | ||
+ | section that most command line applications will use. It does implement some of | ||
+ | the UNIX specific functions, but not in a compatible way, and programs that use | ||
+ | these functions are likely to be system applications that aren't useful for any | ||
+ | other system anyway. | ||
+ | |||
+ | Although it's called the standard ' | ||
+ | used in assembly language, in fact it's quite a bit easier to call the C | ||
+ | functions than to deal directly with the OS, and there is no speed penalty in | ||
+ | using the C library because it's been hand coded in assembly language anyway. | ||
+ | |||
+ | Would you like to see what it's like to code using the standard C library? I've | ||
+ | been talking about functions, and if you're familiar with C64 BASIC' | ||
+ | it's quite similar to that, except that you can pass more than one value to the | ||
+ | function. It's basically the same as writing subroutines in assembly, where we | ||
+ | usually pass values using the A,X & Y registers or a ZP value etc.. The only | ||
+ | difference is that ALL values are passed using the CPU stack, which is easily | ||
+ | accesible with the 65816. Ok let's take a look at the previous open file | ||
+ | example: | ||
+ | |||
+ | C code: file = open(" | ||
+ | |||
+ | 65816 assembly (16 bit regs): | ||
+ | |||
+ | pea O_READ | ||
+ | pea ^hellostr | ||
+ | pea !hellostr | ||
+ | jsr @_open ; C functions get " | ||
+ | | ||
+ | pla | ||
+ | pla | ||
+ | stx file ; store the result in file | ||
+ | sty file+2 | ||
+ | |||
+ | Notice that the values are placed onto the stack in reverse order, so they come | ||
+ | out in the correct order when the function accessing them. They are also long | ||
+ | jsr's because they aren't likely to be in the same bank as the calling program. | ||
+ | |||
+ | You might think that having to pop the values back off the stack is cumbersome, | ||
+ | and you're right. Why can't _open pop them off? Well it could, it'd need to do | ||
+ | some messing around with the stack at the end but it'd make things look nicer. | ||
+ | The reason it can't is because C functions don't always know how much data will | ||
+ | be on the stack, so they might pop the wrong amount off. It may look ugly, but | ||
+ | you get used to it. | ||
+ | |||
+ | Now I'll give you a bigger example of what C code looks like after it's been | ||
+ | compiled to prove that the 65816 is capable of producing half decent code. This | ||
+ | will probably only make sense if you've done C programming before, so if | ||
+ | you're not interested in this kind of thing skip this section.. | ||
+ | |||
+ | Here's a minimal version of the standard unix util ' | ||
+ | files together and sends them to the screen or whatever the stdout file is, as | ||
+ | it can be redirected in UNIX. | ||
+ | |||
+ | #include < | ||
+ | |||
+ | int main(int argc, char *argv[]) { | ||
+ | FILE *fp; | ||
+ | int ch=0; | ||
+ | int upto=1; | ||
+ | |||
+ | if (argc<2) { | ||
+ | fprintf(stderr," | ||
+ | exit(1); | ||
+ | } | ||
+ | argc--; | ||
+ | | ||
+ | fp = fopen(argv[upto++]," | ||
+ | if (!fp) { | ||
+ | perror(" | ||
+ | exit(1); | ||
+ | } | ||
+ | while((ch = fgetc(fp)) != EOF) | ||
+ | if (putchar(ch) == EOF) { | ||
+ | perror(" | ||
+ | exit(1); | ||
+ | } | ||
+ | | ||
+ | } | ||
+ | } | ||
+ | |||
+ | and here's the (unoptomized) compiled version: | ||
+ | |||
+ | |||
+ | #define _AS sep #$20:.as | ||
+ | #define _AL rep #$20:.al | ||
+ | #define _XS sep #$10:.xs | ||
+ | #define _XL rep #$10:.xl | ||
+ | #define _AXL rep # | ||
+ | #define _AXS sep # | ||
+ | |||
+ | .xl ; make sure it's 16 bit code | ||
+ | .al | ||
+ | |||
+ | .( | ||
+ | |||
+ | mreg = 1 | ||
+ | mreg2 = 5 | ||
+ | |||
+ | .text | ||
+ | |||
+ | +_main | ||
+ | -_main: | ||
+ | |||
+ | .( | ||
+ | |||
+ | RZ = 8 ; RZ = register size: Two psuedo 32 bit registers | ||
+ | LZ = 26 ; LZ = Local size: size of the local variables for this | ||
+ | ; function | ||
+ | |||
+ | phd | ||
+ | tsc /* make space for local variables */ | ||
+ | sec | ||
+ | sbc #LZ | ||
+ | tcs | ||
+ | tcd /* set up the DP register as the frame pointer */ | ||
+ | |||
+ | stz RZ+1 /* ch = 0; */ | ||
+ | |||
+ | lda #1 /* upto = 1; */ | ||
+ | sta RZ+7 | ||
+ | |||
+ | lda LZ+6 /* if (argc < 2) NOTE: could be just */ | ||
+ | .( /* cmp #2 : bpl L2 */ | ||
+ | cmp #2 /* but the compiler doesn' | ||
+ | bmi skip /* away L2 is. | ||
+ | brl L2 | ||
+ | skip .) | ||
+ | |||
+ | pea ^L4 /* fprintf(stderr," | ||
+ | pea !L4 | ||
+ | pea ^___stderr | ||
+ | pea !___stderr | ||
+ | jsr @_fprintf | ||
+ | tsc | ||
+ | clc | ||
+ | adc #8 | ||
+ | tcs | ||
+ | |||
+ | pea 1 /* exit(1) */ | ||
+ | jsr @_exit | ||
+ | pla | ||
+ | L2: | ||
+ | lda LZ+6 /* argc-- NOTE: dec LZ+6 would be better! */ | ||
+ | dec | ||
+ | sta LZ+6 | ||
+ | brl L6 | ||
+ | L5: | ||
+ | pea ^L8 /* This rather large bit of code is all for */ | ||
+ | pea !L8 /* fopen(argv[upto++]," | ||
+ | |||
+ | lda RZ+7 /* arrays don't translate so well! */ | ||
+ | sta RZ+9 | ||
+ | lda RZ+9 | ||
+ | inc | ||
+ | sta RZ+7 | ||
+ | ldx RZ+9 | ||
+ | lda #0 | ||
+ | .( | ||
+ | stx mreg2 | ||
+ | ldy #2 | ||
+ | beq skip | ||
+ | blah asl mreg2 | ||
+ | rol | ||
+ | dey | ||
+ | bne blah | ||
+ | skip ldx mreg2 | ||
+ | .) | ||
+ | clc | ||
+ | tay | ||
+ | txa | ||
+ | adc LZ+8 | ||
+ | tax | ||
+ | tya | ||
+ | adc LZ+8+2 | ||
+ | sta mreg2+2 | ||
+ | stx mreg2 | ||
+ | lda [mreg2] | ||
+ | tax | ||
+ | ldy #2 | ||
+ | lda [mreg2],y | ||
+ | pha | ||
+ | phx | ||
+ | jsr @_fopen | ||
+ | tsc | ||
+ | clc | ||
+ | adc #8 | ||
+ | tcs | ||
+ | stx RZ+11 | ||
+ | sty RZ+11+2 | ||
+ | |||
+ | ldx RZ+11 /* assign it to fp */ | ||
+ | lda RZ+11+2 | ||
+ | sta RZ+3+2 | ||
+ | stx RZ+3 | ||
+ | |||
+ | .( /* if (!fp) | ||
+ | lda RZ+3 | ||
+ | cmp #!0 | ||
+ | bne made | ||
+ | lda RZ+3+2 | ||
+ | cmp #^0 | ||
+ | beq skip | ||
+ | made brl L13 | ||
+ | skip .) | ||
+ | |||
+ | pea ^L11 /* perror(" | ||
+ | pea !L11 | ||
+ | jsr @_perror | ||
+ | pla | ||
+ | pla | ||
+ | |||
+ | pea 1 /* exit(1) */ | ||
+ | jsr @_exit | ||
+ | pla | ||
+ | |||
+ | brl L13 | ||
+ | L12: | ||
+ | pei (RZ+1) /* putchar(ch); | ||
+ | jsr @_putchar | ||
+ | pla | ||
+ | stx RZ+15 | ||
+ | |||
+ | lda RZ+15 /* if (putchar(ch) == EOF) | ||
+ | .( | ||
+ | cmp #-1 | ||
+ | beq skip | ||
+ | brl L15 | ||
+ | skip .) | ||
+ | |||
+ | pea ^L11 /* perror(" | ||
+ | pea !L11 | ||
+ | jsr @_perror | ||
+ | pla | ||
+ | pla | ||
+ | |||
+ | pea 1 /* exit(1) | ||
+ | jsr @_exit | ||
+ | pla | ||
+ | L15: | ||
+ | L13: | ||
+ | pei (RZ+3+2) /* fgetc(fp); */ | ||
+ | pei (RZ+3) | ||
+ | jsr @_fgetc | ||
+ | pla | ||
+ | pla | ||
+ | stx RZ+17 /* ch = fgetc(fp); */ | ||
+ | lda RZ+17 | ||
+ | sta RZ+1 | ||
+ | |||
+ | lda RZ+17 /* while ((ch = fgetc(fp)) != EOF) */ | ||
+ | .( | ||
+ | cmp #-1 | ||
+ | beq skip | ||
+ | brl L12 | ||
+ | skip .) | ||
+ | |||
+ | pei (RZ+3+2) /* fclose(fp); */ | ||
+ | pei (RZ+3) | ||
+ | jsr @_fclose | ||
+ | pla | ||
+ | pla | ||
+ | L6: | ||
+ | lda LZ+6 /* while(argc--) */ | ||
+ | sta RZ+9 | ||
+ | lda RZ+9 | ||
+ | dec | ||
+ | sta LZ+6 | ||
+ | lda RZ+9 | ||
+ | .( | ||
+ | cmp #0 | ||
+ | beq skip | ||
+ | brl L5 | ||
+ | skip .) | ||
+ | |||
+ | ldx #0 /* return from main() */ | ||
+ | L1: | ||
+ | tsc | ||
+ | clc | ||
+ | adc #LZ | ||
+ | tcs | ||
+ | pld | ||
+ | rtl | ||
+ | .) | ||
+ | |||
+ | .text | ||
+ | |||
+ | -L11 .asc " | ||
+ | -L8 .asc " | ||
+ | -L4 .asc " | ||
+ | .) | ||
+ | |||
+ | As you can see, there' | ||
+ | is concerned, but the code is still quite good. | ||
+ | |||
+ | Having a C compiler and a standard C library that contains the most used | ||
+ | standard functions, is going a long way towards being able to port UNIX's and | ||
+ | other similar environments' | ||
+ | backend for a free ANSI C compiler called LCC. | ||
+ | |||
+ | I'm no longer talking theory here either, since a little while ago I decided to | ||
+ | give my standard C library and the compiler a test on portability, | ||
+ | great results. I've managed to do extremely simple porting jobs on: Pasi's C | ||
+ | versions of his gunzip and puzip, Andre Fachat' | ||
+ | Marco Baye's ACME cross assembler. All of which, besides ACME, so far seem to be | ||
+ | working exactly how they should. There' | ||
+ | that could easily be ported to JOS, many of which wouldn' | ||
+ | anyone, but still! | ||
+ | |||
+ | | ||
+ | | Multitasking - Seeming to do it all at once.| | ||
+ | | ||
+ | |||
+ | We've all had experience with multitasking so I won't bore you too much. | ||
+ | For our purposes, it means being able to do several things at once. | ||
+ | |||
+ | But what actually is a " | ||
+ | usually call them processes, so that's what I'll refer to them as. | ||
+ | |||
+ | There are two main types of multitasking, | ||
+ | latter is as you would expect, processes need to co-operate together in order to | ||
+ | work, processes can't "do their own thing" | ||
+ | flexible approach, because processes don't need to explicitly hand over the | ||
+ | processor to another process, they just have it taken away from them if they use | ||
+ | it for too long. So it was a pretty easy choice for which kind of multitasking | ||
+ | JOS would have, pre-emptive of course! | ||
+ | |||
+ | You might think that the C64 already does multitasking because programs normally | ||
+ | set up interrupt routines to go off during the processing of the program, so it | ||
+ | can do more than one thing, but that's a very special case of what I'm | ||
+ | referring to here. I'm referring to the ability to run seperate unrelated | ||
+ | programs at the same time, like reading your email, and typing in a text | ||
+ | editor. We'd all like to be able to do that wouldn' | ||
+ | got the processing power and RAM to do it, and the SuperCPU certainly does. | ||
+ | |||
+ | Each process " | ||
+ | of the computer and OS like RAM, interrupts, kernel IPC objects, and some other | ||
+ | things. | ||
+ | |||
+ | Along with the resources it owns, each process has a number of attributes. First | ||
+ | of all it needs a unique identifier, so anything that wants to talk to it knows | ||
+ | how to address it. In Unix-like systems, this is called a Process IDentification | ||
+ | (PID). In JOS a PID is just a positive integer, simple. | ||
+ | |||
+ | Along with other processes being able to address it, the PID is used so that the | ||
+ | OS can keep track of which resources the process actually owns, and when it | ||
+ | exits (or is explicity killed) the OS can free up those things and let other | ||
+ | processes use them. | ||
+ | |||
+ | Processes can start other processes, so everything except the first process | ||
+ | keeps track of who its parent was in its Parent PID (PPID). You may wonder | ||
+ | what use it is to keep track of the parent? It's always been used in UNIX to set | ||
+ | up IPC, but it really isn't needed in JOS, apart from cosmetic purposes, since | ||
+ | JOS has better IPC mechanisms. That's the first example of "Just because it's in | ||
+ | UNIX doesn' | ||
+ | |||
+ | In JOS, a process can own multiple " | ||
+ | people' | ||
+ | |||
+ | Consider starting a C64 game, which has several different interrupt routines | ||
+ | running concurrently. We certainly wouldn' | ||
+ | be a seperate program, and that's generally the idea behind threads, except | ||
+ | threads are at the mercy of the pre-emptive scheduler. Almost the same result | ||
+ | can be achieved by creating multiple processes, but why go to the hassle of | ||
+ | loading and executing two tightly related processes with 1 thread each, when | ||
+ | you can do the same thing with 1 process that has 2 threads? A good example | ||
+ | of this is JOS's very own web server, which creates new threads whenever a | ||
+ | new connection has been established by a client. | ||
+ | |||
+ | Some new technologies are particularly keen on the use of threads, namely JAVA | ||
+ | and the BeOS. A good example of using multiple threads is given by BeOS, which | ||
+ | starts a seperate thread for every window displayed on the screen, so it can | ||
+ | update its on-screen appearance and remain responsive to the user, while also | ||
+ | doing other processing. | ||
+ | |||
+ | Unix programs have generally just started other processes if they wanted to do | ||
+ | two of their own things at once. Threads are much cleaner and nicer. Threads | ||
+ | themselves have their own attributes, such as priority (the higher the priority | ||
+ | the more processor time it's likely to get), state (whether they are running or | ||
+ | waiting for something), stack and zero page space, and some other things. | ||
+ | |||
+ | I know i've mentioned that JOS uses pre-emptive multitasking, | ||
+ | mean that doing: | ||
+ | jmp * | ||
+ | is a good idea! Programs should still try and co-operate. | ||
+ | |||
+ | A typical menu program on C64 using the kernel has a structure something like | ||
+ | this: | ||
+ | |||
+ | 1. Setup variables and interrupts | ||
+ | 2. Set up menu | ||
+ | 3. Check for input | ||
+ | 4. If no input go back to 3 | ||
+ | 5. Process input | ||
+ | |||
+ | If you were to run this program on a multitasking system, it would chew up a lot | ||
+ | of processing time and slow everything else down. Polling for input on a | ||
+ | multitasking system is generally a bad thing, but blocking and waiting for input | ||
+ | is a good thing. So instead it would be best to do: | ||
+ | |||
+ | 1. Setup variables and interrupts | ||
+ | 2. Set up menu | ||
+ | 3. Wait for input | ||
+ | 4. Process input | ||
+ | |||
+ | Now this is the correct way to do it, as it only uses up cpu time when it's | ||
+ | actually received some input. But what happens if every process is waiting? What | ||
+ | runs then? Well there is a special process that runs when no other processes | ||
+ | are, it's called the Idle process, and does what it's name suggests, just sits | ||
+ | there and idles. Here is the thread code that runs in my idle process: | ||
+ | |||
+ | nully jmp nully | ||
+ | |||
+ | For some reason I started calling it the Null process, and it's called that all | ||
+ | throughout JOS... | ||
+ | |||
+ | I have introduced you to a couple of the main ideas behind multitasking, | ||
+ | wouldn' | ||
+ | |||
+ | For starters, since it's pre-emptive multitasking, | ||
+ | interrupting the currently running process after it's consumed its alloted | ||
+ | time. The C64 has 4 CIA timers capable of producing IRQ's and NMI's, and in | ||
+ | JOS's case i've decided to let it use CIA 1 Timer A, which produces an IRQ. This | ||
+ | of course means that a process could stop itself from being interrupted by doing | ||
+ | an SEI, but if they behave well that won't happen! | ||
+ | |||
+ | Rather than set this timer to the amount of time before a process should be | ||
+ | pre-empted (called a " | ||
+ | counter, which is used for timing another kind of process resource: timers. | ||
+ | Timers can either count upwards, or downwards and give off an alarm. They really | ||
+ | need a higher precision than a timeslice, so they set the timer to 20 | ||
+ | milliseconds (about 1 PAL screen). The timeslice is then calcualted as 3 counts | ||
+ | of this timer i.e. 60 milliseconds. Why don't I use TIMER B for the system | ||
+ | timer? Well, because I want to leave as many resources open for application and | ||
+ | device driver processes. | ||
+ | |||
+ | I mentioned that processes and threads each have their own attributes, these | ||
+ | attributes are stored in Process Control Blocks (PCB' | ||
+ | Blocks (TCB' | ||
+ | |||
+ | Every process has a PCB, and every process has at least one thread, which has | ||
+ | it's own TCB. There is one process which is always loaded, and that's the Null | ||
+ | process. Each process' | ||
+ | structure, the circular (or double) linked lists. The Null PCB is always at the | ||
+ | head of the PCB list, and PCB's will only ever be on this one list, since they | ||
+ | are either alive (in the list), or dead (no PCB exists!). | ||
+ | |||
+ | Threads on the other hand can be in various states, but in particular they can | ||
+ | be ready for the CPU, or waiting for something (blocked). When a thread is | ||
+ | ready, it's just waiting for its turn at the CPU, and it goes on the Ready | ||
+ | list, which is a queue. The Null thread is ALWAYS at the back of this queue, so | ||
+ | it only gets to run if nothing else can. The ordering of this queue is up to a | ||
+ | part of the kernel called the Scheduler. | ||
+ | |||
+ | |||
+ | Front Back | ||
+ | ------------ | ||
+ | -- | Thread A | ----- | Thread B | ----- | Null Thread | -- | ||
+ | | ------------ | ||
+ | ------------------------------------------------------------------- | ||
+ | |||
+ | Some OSes have complex schedulers which take in many parameters, like priority | ||
+ | and various CPU time measurements. On multi-user OSes like UNIX, this is | ||
+ | important because it wants to be " | ||
+ | and many other OSes, it's usually a whole lot simpler than that, it's just a | ||
+ | simple matter of which ever process/ | ||
+ | two threads have the same priority, it normally comes down to "round robin" | ||
+ | scheduling, where they just take it turns. JOS doesn' | ||
+ | properly yet, because they actually don't make much difference to the normal | ||
+ | processing, at the moment it's just a simple round robin scheduler that doesn' | ||
+ | care about priorities. | ||
+ | |||
+ | What if a thread is blocked? It'll go onto a wait queue, and will return to the | ||
+ | ready queue only when it's ready to run. At this stage of JOS, the only thing a | ||
+ | thread will need to block for is IPC. | ||
+ | |||
+ | You may be wondering about the issue of relocatable code, as we all know the | ||
+ | 6502 nor the 65816 is designed for running relocatable code. Sure, branches are | ||
+ | relative to the PC, but nothing else is. So everything needs to be physically | ||
+ | relocated before executing, and to do this properly without needing to code in a | ||
+ | specific way, a relocatable binary format is needed. Fortunately for me, Andre | ||
+ | Fachat had already designed such a format for OS/A65, and it fits JOS nicely | ||
+ | because it includes 65816 extensions. Of course you need a special assembler to | ||
+ | output this file format, which is where XA comes in. XA now even compiles for | ||
+ | JOS, so self hosted development is now possible. The binary format will be | ||
+ | talked about in greater detail in a future article. | ||
+ | |||
+ | Well, it's all very fine having a bunch of processes running, but that's no | ||
+ | operating system.. Who's looking after the devices? Who's managing the memory? | ||
+ | And how do we ask the drivers to do something for us? It's all IPC... | ||
+ | |||
+ | -------------------------------------------------- | ||
+ | | Inter Process Communication - Let's get talking! | | ||
+ | -------------------------------------------------- | ||
+ | |||
+ | Before I get into the specifics of IPC, I should give an idea of what typically | ||
+ | happens when JOS boots. Because JOS has a very scalable microkernel design, it | ||
+ | can load as many different device drivers and applications at boot time as it | ||
+ | wants and infact they can loaded and removed anytime at all. So there is no one | ||
+ | bootup procedure in JOS. There are certain things that happen every time, | ||
+ | however. | ||
+ | |||
+ | For starters, JOS has 2 system processes, which are always started at bootup. | ||
+ | They aren't actually loaded off disk because they are part of the microkernel | ||
+ | code. One is the memory manager and the other is the process manager. | ||
+ | |||
+ | The memory manager as you would expect manages all the memory, but it doesn' | ||
+ | manage the Process space memory (Bank 0), that's the job of the Microkernel. | ||
+ | Process space memory (or kernel memory) is where all the PCB's, TCB's, Stack | ||
+ | space and Direct Page space is located. The Memory manager, manages all the | ||
+ | other RAM, e.g. Ram in Bank 1 and above, although, if there is no SuperRAM, it | ||
+ | allocates 00e000-010000 as system Ram instead of using it as kernel space RAM, | ||
+ | since it's more likely that you will run out of System Ram rather than kernel | ||
+ | RAM. | ||
+ | |||
+ | I won't go into the specifics of the Memory Manager just yet, I'll just tell you | ||
+ | that it performs the following requests: | ||
+ | |||
+ | Allocate any size block of RAM. | ||
+ | Free RAM. | ||
+ | Allocate any size block of Bank Aligned RAM. (Needed in some cases). | ||
+ | Reallocate RAM. | ||
+ | See how much RAM is left. | ||
+ | See what the largest block is. | ||
+ | |||
+ | All these things are requested via IPC, but there are Shared library routines | ||
+ | (such as malloc, free, realloc etc) for preparing the right IPC messages to | ||
+ | send. | ||
+ | |||
+ | The process manager' | ||
+ | libraries, and looking up device drivers & file-systems. Whenever you open a | ||
+ | file, you first must send a message to the process manager asking it where to | ||
+ | send the open message. | ||
+ | |||
+ | The very first process to start however is called the " | ||
+ | actually built into the microkernel, | ||
+ | 2 system processes, then it starts a simple Ramdisk process and loads another | ||
+ | process from the ramdisk called " | ||
+ | |||
+ | The " | ||
+ | also from the ramdisk, and " | ||
+ | this time called " | ||
+ | |||
+ | Note that " | ||
+ | should actually be " | ||
+ | be actually written to disk yet, even though the applications think they are. | ||
+ | I'm guessing this is why Macintoshes refuse to let you take a disk out without | ||
+ | the OSes permission! | ||
+ | |||
+ | The " | ||
+ | up most of the drivers. A shell script, if you've never heard of it, is a file | ||
+ | that has lists of commands to be run by the system, or more specifically the | ||
+ | shell program. If you've ever seen MS-DOS .bat files, you'll know what I mean. | ||
+ | |||
+ | A typical init script has to load a user interface, unless of course you're | ||
+ | using your machine as some kind of server, in which case you wouldn' | ||
+ | one and could save yourself a bit of memory! | ||
+ | |||
+ | The text based interface would require the console driver (con.drv), and the | ||
+ | shell (sh). The console driver is capable of 4 virtual consoles, which you can | ||
+ | switch between by pressing CBM and 1-4. This lets you exploit multitasking, | ||
+ | you could be running 4 different text apps on each of the screens. The shell is | ||
+ | a pretty basic shell at the moment (like DOS's command.com), | ||
+ | let you load and run any program. It also has support for pipes, but now I'm off | ||
+ | topic.. | ||
+ | |||
+ | The init script could instead load the GUI, which I'm sure most people would | ||
+ | prefer to a text based interface! | ||
+ | |||
+ | The script also should load other drivers like: tcp/ip, ppp, digi sound driver, | ||
+ | other filesystems, | ||
+ | is where Microkernels really excell over their monolithic counterparts. | ||
+ | |||
+ | Well that's what happens at boot time, but how do the drivers and the | ||
+ | applications communicate? | ||
+ | JOS's IPC is: message passing. Message passing is a fast and effective way to do | ||
+ | IPC, and for a microkernel this is essential. I chose message passing because | ||
+ | it's the most flexible method, and you can actually implement other types of IPC | ||
+ | by using message passing. | ||
+ | |||
+ | You can think of message passing as an extended subroutine call, but rather than | ||
+ | being a call to a subroutine, it's a call to another process. A process, or in | ||
+ | particular a thread, can " | ||
+ | " | ||
+ | |||
+ | You can't just send a message and expect it to be receieved straight away, the | ||
+ | receiver has to be ready to receive it, which may not be straight away. If the | ||
+ | receiver isn't ready, the thread that sent the message will block and wait until | ||
+ | it's ready. Once the receiver has received it, it processes the message, and | ||
+ | will issue a reply, which then unblocks the sender, which can then continue | ||
+ | processing. This type of message passing is called " | ||
+ | passing, as it requires synchronization between the two threads. It may help to | ||
+ | think of " | ||
+ | transferred to the routine, and " | ||
+ | more complicated than that, but essentially that's what it's like. | ||
+ | |||
+ | There is a great description of this kind of IPC at http:// | ||
+ | technical section, with diagrams and all -- highly recommended! | ||
+ | |||
+ | Normally, OSes have to copy messages between processes, because each process | ||
+ | gets its own address space, and can't view the memory of other processes, but | ||
+ | as we know, the 65816 doesn' | ||
+ | that messages don't need to be copied, which gives it a significant speed | ||
+ | increase over message passing in OSes with MMU's. Of course it does mean that | ||
+ | processes can accidently screw up another process' | ||
+ | |||
+ | All messages in JOS is directed at Channels. Channels are a resource that allow | ||
+ | threads to receive message from other threads. Generally device drivers register | ||
+ | a channel and use it to receive requests from applications. Channels are | ||
+ | referred to by number, the only channels that have fixed numbers are the memory | ||
+ | manager (0) and the process manager (1). All other channels are looked up by | ||
+ | sending a message to the process manager' | ||
+ | |||
+ | What exactly is a message? All the JOS system calls for IPC just deal with 24 | ||
+ | bit pointers to messages, and the actual message data itself can be anything! | ||
+ | However the first byte of the message should be the message code, and always is | ||
+ | in JOS system messages. You could of course make your own protocol for your own | ||
+ | IPC, but it's probably not a good idea. | ||
+ | |||
+ | Each different kind of driver has its own set of message codes.. | ||
+ | |||
+ | #define PROCMSG $80 | ||
+ | #define MEMMSG $40 | ||
+ | |||
+ | #define MMSG_Alloc 0+MEMMSG | ||
+ | #define MMSG_AllocBA 1+MEMMSG | ||
+ | #define MMSG_Free 2+MEMMSG | ||
+ | #define MMSG_Left 3+MEMMSG | ||
+ | #define MMSG_Large 4+MEMMSG | ||
+ | #define MMSG_LeftK 5+MEMMSG | ||
+ | #define MMSG_LargeK 6+MEMMSG | ||
+ | #define MMSG_KillMem 7+MEMMSG | ||
+ | #define MMSG_Realloc 8+MEMMSG | ||
+ | |||
+ | #define PMSG_Spawn PROCMSG+0 | ||
+ | #define PMSG_AddName PROCMSG+1 | ||
+ | #define PMSG_ParseFind PROCMSG+2 | ||
+ | #define PMSG_FindName PROCMSG+3 | ||
+ | #define PMSG_QueryName PROCMSG+4 | ||
+ | #define PMSG_Alarm PROCMSG+5 | ||
+ | #define PMSG_KillChan PROCMSG+6 | ||
+ | #define PMSG_WaitPID PROCMSG+7 | ||
+ | |||
+ | Those are the messages defined for the Process manager and Memory manager. Each | ||
+ | message code defines its own structure, for example the MMSG_Alloc message has | ||
+ | the structure: | ||
+ | |||
+ | .word MMSG_Alloc | ||
+ | .word !Size | ||
+ | .byte ^Size,0 | ||
+ | |||
+ | The message codes $e0-$ff are left for processes that want their threads to | ||
+ | communicate with each other. | ||
+ | |||
+ | Anything that wants to receive messages needs to have some code like this: | ||
+ | |||
+ | |||
+ | jsr @S_makeChan ; | ||
+ | sta Chan ; save it | ||
+ | |||
+ | loop lda Chan ; | ||
+ | jsr @S_recv ; receieve a message from channel | ||
+ | stx MsgP ; Save X/Y in MsgP | ||
+ | sty MsgP+2 ; MsgP is a zero page variable | ||
+ | sta RcvID ; Save RcvID - for replying | ||
+ | lda [MsgP] | ||
+ | and #$ff ; 8 bit message code | ||
+ | cmp #MSGCODE ; check which type | ||
+ | beq processMes ; | ||
+ | cmp #MSGCODE2 | ||
+ | beq processMes2 | ||
+ | ... | ||
+ | ldx #-1 ; replying with $ffff in X and Y | ||
+ | txy ; means " | ||
+ | lda RcvID | ||
+ | jsr @S_reply ; reply and loop back for more messages | ||
+ | bra loop | ||
+ | |||
+ | All device drivers have a message loop like that. Which forces them to be | ||
+ | modular, and thus easier to code. | ||
+ | |||
+ | Ok now let's see what sending a message would look like: | ||
+ | |||
+ | lda #PROC_CHAN | ||
+ | ldx #!Message | ||
+ | ldy #^Message | ||
+ | jsr @S_sendChan ; | ||
+ | ... | ||
+ | |||
+ | Message .word PMSG_WaitPID, | ||
+ | |||
+ | *note: it's generally a good idea to put messages on the stack, rather than use | ||
+ | global variables, since using the stack is thread safe. No other thread will | ||
+ | accidentally wipe over the message because they each have their own stack. | ||
+ | |||
+ | Just about everything that you consider an OS to be is done in JOS via IPC. | ||
+ | This includes file operations, such as opening and closing, reading and writing | ||
+ | files. How does the filesystem driver know which file you want to access after | ||
+ | you've opened it? It could include a connection number in the IO_READ and | ||
+ | IO_WRITE messages (you guessed it, the message codes for reading and writing!). | ||
+ | That's a little cumbersome, though. There is a better solution: connections. | ||
+ | |||
+ | What is a connection? It's a kernel object which keeps a track of the | ||
+ | destination channel of the messages directed at it. It also has an ID associated | ||
+ | with it, so server processes can tell which file, for example, it refers to. | ||
+ | Each process has a so called "file descriptor list" associated with it. People | ||
+ | who know much about UNIX programming will know about this. In JOS, this table is | ||
+ | really just a connection table. This table is just an array of connection | ||
+ | numbers, which the process can access. Each element in the array can point to | ||
+ | any connection number, which means that two file descriptors can actually point | ||
+ | to the same file, and in the case of the first three it usually does. The first | ||
+ | three are STDIN, STDOUT & STDERR, and they usually point to the screen, but not | ||
+ | always! | ||
+ | |||
+ | An example File Descriptor list: (0 = no connection) | ||
+ | |||
+ | | ||
+ | | ||
+ | | 1 | | ||
+ | | ||
+ | |||
+ | E.G. | ||
+ | Connection 1 is connected to the /dev/con/1 device (the screen). Thus STDIN, | ||
+ | STDOUT and STDERR all point to this. | ||
+ | Connection 2 is connected to a file "/ | ||
+ | Connection 3 is a tcpip connection to altavista.com. | ||
+ | |||
+ | Connections are global objects, and whenever a process is loaded, it inheritis | ||
+ | its file descriptor table from the parent, which is how it receives its STDIN, | ||
+ | STDOUT and STDERR. File descriptors can also be explicitly redirected to other | ||
+ | connections, | ||
+ | redirection. | ||
+ | |||
+ | I've discussed JOS's synchronous message passing, but what happens if you don't | ||
+ | want to block and wait for a reply? You might just want to notify a server that | ||
+ | an event has occurred, and don't need to know if it received it, nor what it | ||
+ | thinks about it. | ||
+ | |||
+ | In this case you can send a pulse. A pulse is a tiny message (just 4 bytes), | ||
+ | which doesn' | ||
+ | they can be sent during an interrupt. A good example of doing this is the | ||
+ | console driver, which implements virtual consoles. The console driver starts an | ||
+ | interrupt routine which scans the keyboard and checks for CBM key plus 1-4 and | ||
+ | then sends a pulse message to its channel telling it to switch consoles. | ||
+ | |||
+ | By now you might be thinking " | ||
+ | process switching", | ||
+ | 20mhz. There isn't as much switching as you would expect either, considering | ||
+ | that IO_READ and IO_WRITE messages deal with buffers as large as 64k, so it's | ||
+ | not as if ever single character requires a switch. | ||
+ | |||
+ | -------------------------------------------------- | ||
+ | | ||
+ | -------------------------------------------------- | ||
+ | |||
+ | One of the major things that people who are learning UNIX have to learn, is that | ||
+ | practically everything is a file. Devices such as the keyboard and screen (the | ||
+ | console) are accessed using a file. Why you may ask! Well there isn't one | ||
+ | compelling reason, but it just makes it handy if you can access the console as a | ||
+ | file, especially for debugging. Take for example, the ability to redirect screen | ||
+ | output to files, a program doesn' | ||
+ | if everything is a file, including the console, it's just a simple matter of | ||
+ | changing the output file. | ||
+ | |||
+ | Not only are devices files, but filesystems can be " | ||
+ | which gets rid of the need for devices numbers. Navigating through different | ||
+ | filesystems is just a simple matter of changing directories. It also means that | ||
+ | applications don't concern themselves with what the actual filesystem and device | ||
+ | is, just that it's there. So applications will work with any devices that have | ||
+ | drivers. | ||
+ | |||
+ | Ok so now you know some of the reasons behind the " | ||
+ | is it done in JOS? I mentioned that the process manager is in charge of " | ||
+ | up" channels, but how does it perform this lookup? | ||
+ | |||
+ | The process manager contains a table with entries for file-systems, | ||
+ | special processes. File-systems are names that end in a '/', | ||
+ | usually start with '/ | ||
+ | may look something like this: | ||
+ | |||
+ | Name Channel Unit | ||
+ | / 2 1 ; file-system mounted at / | ||
+ | / | ||
+ | *digi 3 0 ; | ||
+ | *tcpip 4 0 ; | ||
+ | / | ||
+ | *cbmfsys 5 0 ; | ||
+ | *packet 6 0 ; | ||
+ | / | ||
+ | ; this | ||
+ | |||
+ | The name and channel fields are self explainitory but the Unit field allows a | ||
+ | channel to determine which of its names was used. | ||
+ | |||
+ | Whenever the process manager receives a request to look something up, depending | ||
+ | on what type of request it is (special process requests don' | ||
+ | the processes Current Working Directory to the filename (unless the name starts | ||
+ | with a '/' | ||
+ | the string. | ||
+ | |||
+ | So for example you ask for the file " | ||
+ | "/ | ||
+ | |||
+ | "/ | ||
+ | |||
+ | This string is then compared to the table, and finds the longest full match, in | ||
+ | this case it would find "/ | ||
+ | " | ||
+ | |||
+ | The great thing about this whole " | ||
+ | don't necessarily need to know what they' | ||
+ | can be loaded and unloaded at will for the ultimate in scalability and | ||
+ | modularity. | ||
+ | |||
+ | You might think that setting up the request and dealing with the responses, | ||
+ | every time you want to open a file is a bit tiresome, but it's all handled for | ||
+ | you with the " | ||
+ | |||
+ | pea O_READ | ||
+ | pea ^devcon1 | ||
+ | pea !devcon1 | ||
+ | jsr @_open ; returns file number in x or -1 on failure | ||
+ | pla | ||
+ | pla | ||
+ | pla | ||
+ | |||
+ | ... | ||
+ | |||
+ | devcon1 .asc "/ | ||
+ | |||
+ | That's all for now. In the next article, i'll be writing about process | ||
+ | loading + shared libraries, networking, terminal IO (console + modems) + some | ||
+ | other things... | ||
+ | |||
+ | Hopefully you will have learned something from this article, and can see the | ||
+ | power that a real multitasking OS, such as JOS, can bring to the SuperCPU. | ||
+ | |||
+ | Any feedback goes to jmaginni@postoffice.utas.edu.au , i'm particularly on the | ||
+ | lookout for people who can help with hardware; docs, code etc... | ||
+ | Also, check the JOS homepage at http:// | ||
+ | mailing list if you're interested in updates. | ||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H 19 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | VIC KERNAL Disassembly Project - Part III | ||
+ | Richard Cini | ||
+ | September 1, 1999 | ||
+ | |||
+ | Introduction | ||
+ | ============ | ||
+ | |||
+ | In the last installment of this series, we examined the two remaining | ||
+ | hard-coded processor interrupt vectors, the IRQ and NMI vectors. Although we | ||
+ | took a complete look at the routines, we did not examine some of the | ||
+ | subroutines that IRQ and NMI call. We'll examine these routines first. | ||
+ | Having completed the main processor vectors, we'll continue this | ||
+ | series by examining other Kernal routines. | ||
+ | |||
+ | Remaining Subroutines | ||
+ | ===================== | ||
+ | |||
+ | The NMI and IRQ routines together call 11 subroutines, | ||
+ | we previously examined in Part I of this series, and two call the NMI vectors | ||
+ | in the BASIC ROM and A0 Option ROM. So, let's examine the four remaining | ||
+ | subroutines. | ||
+ | |||
+ | UDTIM/ | ||
+ | ------------ | ||
+ | |||
+ | The IRQ vector calls the update time function UDTIM through the | ||
+ | jump table at the end of the Kernal ROM, while the NMI function skips the | ||
+ | intermediate call through the jump table and directly calls the time function. | ||
+ | |||
+ | UDTIM: | ||
+ | FFEA 4C 34 F7 JMP IUDTIM ; | ||
+ | |||
+ | F734 ; | ||
+ | F734 ; IUDTIM - Update Jiffy Clock (internal) | ||
+ | F734 ; | ||
+ | F734 ; | ||
+ | F734 IUDTIM | ||
+ | F734 A2 00 LDX #$00 | ||
+ | F736 E6 A2 INC CTIMR2 ; | ||
+ | F738 D0 06 BNE UDTIM1 ;not 0, move on (no roll) | ||
+ | F73A E6 A1 INC CTIMR1 ; | ||
+ | F73C D0 02 BNE UDTIM1 ;not 0, move on (no roll) | ||
+ | F73E E6 A0 INC CTIMR0 ; | ||
+ | F740 | ||
+ | F740 UDTIM1 ; | ||
+ | F740 ; check for 24hr roll | ||
+ | F740 ; A0-A2 hold max of 4F1A00 | ||
+ | F740 38 SEC ;set carry | ||
+ | F741 A5 A2 LDA CTIMR2 ; get LSB | ||
+ | F743 E9 01 SBC #$01 ; minus 1 | ||
+ | F745 A5 A1 LDA CTIMR1 ; | ||
+ | F747 E9 1A SBC #$1A ; minus 1Ah | ||
+ | F749 A5 A0 LDA CTIMR0 ; | ||
+ | F74B E9 4F SBC #$4F ; minus 4Fh | ||
+ | F74D 90 06 BCC UDTIM2 ; ok | ||
+ | F74F | ||
+ | F74F 86 A0 STX CTIMR0 ; | ||
+ | F751 86 A1 STX CTIMR1 ; registers to zero | ||
+ | F753 86 A2 STX CTIMR2 | ||
+ | F755 | ||
+ | F755 | ||
+ | F755 AD 2F 91 LDA D2ORAH ; | ||
+ | F758 CD 2F 91 CMP D2ORAH | ||
+ | F75B D0 F8 BNE UDTIM2 ;not same, check again | ||
+ | F75D | ||
+ | F75D 85 91 STA STKEY ; | ||
+ | F75F 60 RTS | ||
+ | |||
+ | UDTIM is called every 1/60th of a second by the IRQ routine, and | ||
+ | begins execution by incrementing each of the time-keeping registers in | ||
+ | the Zero Page locations $A0 to $A2. As each is incremented, | ||
+ | for roll-over (i.e., for the count exceeding the maximum allowed for the | ||
+ | register). Taken together, the three consecutive memory locations make-up | ||
+ | the "jiffy clock" (as the VIC's RTC is sometimes referred; a " | ||
+ | 1/60 of one second). | ||
+ | |||
+ | At the label UDTIM1, the code checks for a 24hr roll-over. The three | ||
+ | byte-sized registers (no pun intended) can store the 24-hour jiffy count | ||
+ | of 5,184,000 decimal, or 4F1A00 hex. If the count exceeds this value, the | ||
+ | registers are reset to zero. | ||
+ | |||
+ | The BASIC TI function accesses the jiffy clock, representing the | ||
+ | count as a decimal number. Similarly, the TI$ function represents the jiffy | ||
+ | clock as a 24-hour HH:MM:SS clock instead of a jiffy count. | ||
+ | |||
+ | UDTIM is also responsible for processing the STOP key on behalf of | ||
+ | the IRQ and NMI routines, so if a user program handles either of these | ||
+ | interrupts, the programmer must remember to call UDTIM in order to maintain | ||
+ | the time clock and STOP key functionality. | ||
+ | |||
+ | CCOLRAM | ||
+ | ------- | ||
+ | |||
+ | This short routine is responsible for determining the location of | ||
+ | the color ram. In the VIC, the screen and color memory locations change based | ||
+ | on the amount of RAM installed, as follows: | ||
+ | |||
+ | Function Unexpanded Expanded | ||
+ | -------- ---------- -------- | ||
+ | User BASIC $1000 00010000 $1200 00010010 | ||
+ | Screen Memory $1E00 00011110 $1000 00010000 | ||
+ | Color RAM $9600 10010110 $9400 10010100 | ||
+ | |||
+ | The two least significant bits of the most-significant byte of each | ||
+ | of the screen memory and color RAM pointer registers defines the resulting | ||
+ | location. If the bit pattern of the screen memory is " | ||
+ | the color RAM base to page $96. If the bit pattern is " | ||
+ | the color RAM base to page $94. | ||
+ | |||
+ | The two other possible bit patterns result from screen memory | ||
+ | beginning at $1100 or $1F00, and produce color RAM locations of $9500 | ||
+ | and $9700, respectively. The $1100 starting location will actually work, | ||
+ | but result in 256 bytes of wasted user RAM. The $1F00 starting location | ||
+ | will not work since the color RAM locations overlap the I/O Block 2 | ||
+ | addresses, which have no RAM associated with them. | ||
+ | |||
+ | EAB2 ; | ||
+ | EAB2 ; CCOLRAM - Calculate pointer to color RAM | ||
+ | EAB2 ; | ||
+ | EAB2 CCOLRAM | ||
+ | EAB2 A5 D1 LDA LINPTR ;get ptr to screen RAM LSB | ||
+ | EAB4 85 F3 STA COLRPT ; | ||
+ | EAB6 A5 D2 LDA LINPTR+1 ; | ||
+ | EAB8 29 03 AND # | ||
+ | EABA 09 94 ORA # | ||
+ | EABA ; RAM pointer | ||
+ | EABC 85 F4 STA COLRPT+1 ; | ||
+ | EABE 60 RTS ; | ||
+ | |||
+ | |||
+ | ISCNKY | ||
+ | ====== | ||
+ | |||
+ | This is the low-level keyboard scan function which is called | ||
+ | 60 times per second by the IRQ routine. ISCNKY scans the keyboard matrix | ||
+ | to retrieve a keypress, maps the key number to its ASCII equivalent, and | ||
+ | places the ASCII value at the end of the keyboard buffer. If IRQs are | ||
+ | disabled, the keyboard scanning is suspended. ISCNKY is accessible to user | ||
+ | programs through the Kernal jump table, although calling it with interrupts | ||
+ | enabled is not recommended. | ||
+ | |||
+ | To retrieve a character from the keyboard, a user program would typically | ||
+ | call GETIN ($FFE4), the buffered keyboard input routine. GETIN returns | ||
+ | the ASCII value of the character at the head of the keyboard buffer, or | ||
+ | zero if no character is available. | ||
+ | |||
+ | VIA2 is directly connected to the keyboard. Port B is used as the column | ||
+ | strobe and Port A is used as the row input. To read the keyboard matrix, | ||
+ | the code brings all column strobe lines to 0 and reads the row inputs, in | ||
+ | order, until a key is found (or not found). The code also begins decoding | ||
+ | the ASCII using the " | ||
+ | are for shifted, C= (Commodore) keys, and shift+C= keys. | ||
+ | |||
+ | EB1E ; | ||
+ | EB1E ; ISCNKY - Scan keyboard | ||
+ | EB1E ; | ||
+ | EB1E ; | ||
+ | EB1E | ||
+ | EB1E A9 00 LDA #$00 ; set shft/ctrl flag to 0 | ||
+ | EB20 8D 8D 02 STA SHFTFL | ||
+ | |||
+ | EB23 A0 40 LDY #$40 ; assume no keys pressed | ||
+ | EB25 84 CB STY KEYDN ; | ||
+ | |||
+ | EB27 8D 20 91 STA D2ORB ; bring all column bits low | ||
+ | EB2A AE 21 91 LDX D2ORA ; read row inputs | ||
+ | EB2D E0 FF CPX #$FF ; any character keys pressed? | ||
+ | EB2F F0 5E BEQ PROCK1A ; no, exit | ||
+ | |||
+ | EB31 A9 FE LDA # | ||
+ | EB33 8D 20 91 STA D2ORB ; output bit pattern | ||
+ | |||
+ | EB36 A0 00 LDY #$00 ; zero character count reg | ||
+ | |||
+ | ; set default translation | ||
+ | ; table to Table 1 | ||
+ | EB38 A9 EA LDA #$EA ; | ||
+ | EB3A 85 F5 STA KEYTAB | ||
+ | EB3C A9 EA LDA #$EA ; | ||
+ | EB3E 85 F6 STA KEYTAB+1 | ||
+ | EB40 | ||
+ | EB40 | ||
+ | EB40 A2 08 LDX #$08 ; 8 rows to test in column | ||
+ | EB42 AD 21 91 LDA D2ORA ; get column | ||
+ | EB45 CD 21 91 CMP D2ORA ; test again - debounce | ||
+ | EB48 D0 F6 BNE ISCKLP1 ; not equal, retry | ||
+ | EB4A | ||
+ | EB4A | ||
+ | EB4A 4A LSR A ; shift through carry flag | ||
+ | EB4B B0 16 BCS ISCNK1+3 ; CY=1 for key not pressed | ||
+ | EB4D | ||
+ | EB4D 48 PHA ; save column bit pattern | ||
+ | EB4E B1 F5 LDA (KEYTAB), | ||
+ | EB4E ; | ||
+ | EB50 C9 05 CMP #$05 ; ASCII > 5, move on | ||
+ | EB52 B0 0C BCS ISCNK1 ; | ||
+ | EB54 | ||
+ | EB54 C9 03 CMP #$03 ; ASCII=3 STOP key | ||
+ | EB56 F0 08 BEQ ISCNK1 ; got STOP so skip flag updt | ||
+ | EB58 | ||
+ | EB58 0D 8D 02 ORA SHFTFL ; save SHFT, CTRL, C= flag | ||
+ | EB5B 8D 8D 02 STA SHFTFL | ||
+ | EB5E 10 02 BPL ISCNK1+2 ; move on to next row in col | ||
+ | EB60 | ||
+ | EB60 | ||
+ | EB60 84 CB STY KEYDN ; save key# | ||
+ | EB62 68 PLA ; restore col bit pattern | ||
+ | EB63 C8 INY ; increment key count | ||
+ | EB64 C0 41 CPY #$41 ; 64 keys scanned? | ||
+ | EB66 B0 09 BCS ISCNEXIT ; yes, return ASCII value | ||
+ | EB68 | ||
+ | EB68 CA DEX ; go on to next row in col | ||
+ | EB69 D0 DF BNE ISCKLP2 ; | ||
+ | EB6B | ||
+ | EB6B 38 SEC ; done with first column, so | ||
+ | EB6C 2E 20 91 ROL D2ORB ; | ||
+ | EB6F D0 CF BNE ISCKLP1 ; | ||
+ | EB71 | ||
+ | EB71 | ||
+ | EB71 6C 8F 02 JMP (FCEVAL) ; CINT1A points this to SHEVAL | ||
+ | EB71 ; the shift evaluation code | ||
+ | EB74 ; | ||
+ | EB74 ; Process key image | ||
+ | EB74 ; | ||
+ | EB74 | ||
+ | EB74 A4 CB LDY KEYDN ; get key number (as index) | ||
+ | EB76 B1 F5 LDA (KEYTAB), | ||
+ | EB78 AA TAX ; copy ASCII code to .X | ||
+ | EB79 C4 C5 CPY CURKEY ; is it the same as the | ||
+ | ; | ||
+ | EB7B F0 07 BEQ PROCK1 ; yes, do repeat eval | ||
+ | EB7D | ||
+ | EB7D A0 10 LDY #$10 ; set repeat delay | ||
+ | EB7F 8C 8C 02 STY KRPTDL | ||
+ | EB82 D0 36 BNE PROCK4 ; not same key, so exit | ||
+ | EB84 | ||
+ | EB84 | ||
+ | EB84 29 7F AND # | ||
+ | EB86 2C 8A 02 BIT KEYRPT ; do test | ||
+ | EB89 30 16 BMI PROCK2 ; | ||
+ | EB8B 70 49 BVS PROCK5 ; | ||
+ | EB8D | ||
+ | EB8D C9 7F CMP #$7F ; last non-revs' | ||
+ | EB8F | ||
+ | EB8F | ||
+ | EB8F F0 29 BEQ PROCK4 | ||
+ | EB91 | ||
+ | EB91 C9 14 CMP #$14 ; {DEL}? | ||
+ | EB93 F0 0C BEQ PROCK2 ; | ||
+ | EB95 | ||
+ | EB95 C9 20 CMP #$20 ; {SPACE}? | ||
+ | EB97 F0 08 BEQ PROCK2 ; | ||
+ | EB99 | ||
+ | EB99 C9 1D CMP #$1D ; {<-}? | ||
+ | EB9B F0 04 BEQ PROCK2 ; | ||
+ | EB9D | ||
+ | EB9D C9 11 CMP #$11 ; {CRS DN}? | ||
+ | EB9F D0 35 BNE PROCK5 ; | ||
+ | EBA1 | ||
+ | EBA1 | ||
+ | EBA1 AC 8C 02 LDY KRPTDL ; get repeat delay | ||
+ | EBA4 F0 05 BEQ PROCK3 ; | ||
+ | EBA6 | ||
+ | EBA6 CE 8C 02 DEC KRPTDL ; not done delaying, so exit | ||
+ | EBA9 D0 2B BNE PROCK5 ; | ||
+ | EBAB | ||
+ | EBAB | ||
+ | EBAB CE 8B 02 DEC KRPTSP ; decrement repeat speed cnt | ||
+ | EBAE D0 26 BNE PROCK5 ; not done delaying, so exit | ||
+ | EBB0 | ||
+ | EBB0 A0 04 LDY #$04 ; delay speed cnt reached 0, | ||
+ | ; | ||
+ | EBB2 8C 8B 02 STY KRPTSP ; save it | ||
+ | EBB5 A4 C6 LDY KEYCNT ; get count of keys in kbd | ||
+ | ; | ||
+ | EBB7 88 DEY ; at least one, so exit | ||
+ | EBB8 10 1C BPL PROCK5 ; | ||
+ | EBBA | ||
+ | EBBA | ||
+ | EBBA A4 CB LDY KEYDN ; get current key number | ||
+ | EBBC 84 C5 STY CURKEY ; re-save as current | ||
+ | EBBE AC 8D 02 LDY SHFTFL ; get current shift pattern | ||
+ | EBC1 8C 8E 02 STY LSSHFT ; save as last shft pattern | ||
+ | EBC4 E0 FF CPX #$FF ; re-check for any keys down | ||
+ | EBC6 F0 0E BEQ PROCK5 ; none, so exit | ||
+ | EBC8 | ||
+ | EBC8 8A TXA ; restore ASCII code to .A | ||
+ | EBC9 A6 C6 LDX KEYCNT ; get count of keys in buffer | ||
+ | EBCB EC 89 02 CPX KBMAXL ; more than maximum allowed? | ||
+ | EBCE B0 06 BCS PROCK5 ; yes, drop current key press | ||
+ | EBD0 | ||
+ | EBD0 9D 77 02 STA KBUFFR,X ; save ASCII code in buffer | ||
+ | EBD3 E8 INX ; increment buffer count and | ||
+ | EBD4 86 C6 STX KEYCNT ; | ||
+ | EBD6 | ||
+ | EBD6 | ||
+ | EBD6 A9 F7 LDA #$F7 ; clear bit for COL3 (STOP key | ||
+ | EBD8 8D 20 91 STA D2ORB ; is in COL3); save it to VIA | ||
+ | EBDB 60 RTS ; exit routine | ||
+ | |||
+ | |||
+ | Part of the keyboard scanning includes evaluating whether or not | ||
+ | key modifier keys are pressed. Modifier keys include the SHIFT, Commodore, | ||
+ | and CTRL keys. The ASCII decoding table is changed based on whether or not | ||
+ | one of these keys is pressed. It also looks like the following code went | ||
+ | through several revisions considering the multiple patch areas (filled with | ||
+ | NOPs). Alternatively, | ||
+ | for different languages. | ||
+ | |||
+ | EBDC ; | ||
+ | EBDC ; Evaluate for shift/ | ||
+ | EBDC ; | ||
+ | EBDC | ||
+ | EBDC AD 8D 02 LDA SHFTFL ; 1=SHFT; 2=C> 4=CTRL | ||
+ | EBDF C9 03 CMP #$03 ; C> + shft? | ||
+ | EBE1 D0 2C BNE PROCK6A ; no, select proper decode | ||
+ | EBE3 | ||
+ | EBE3 CD 8E 02 CMP LSSHFT ; is the pattern the same as | ||
+ | EBE6 F0 EE BEQ PROCK5 ; last one? Yes, exit. | ||
+ | EBE8 | ||
+ | EBE8 AD 91 02 LDA SHMODE ; different pattern | ||
+ | EBEB 30 56 BMI PROCKEX ; | ||
+ | EBED | ||
+ | EBED EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EBF3 EAEA | ||
+ | EBF5 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EBFB EAEA | ||
+ | EBFD EA EA EA .db $ea, $ea, $ea | ||
+ | EC00 | ||
+ | EC00 AD 05 90 LDA VRSTRT ; get char ROM address | ||
+ | EC03 49 02 EOR # | ||
+ | EC05 8D 05 90 STA VRSTRT ; | ||
+ | EC08 | ||
+ | EC08 EA EA EA EA .db $ea, $ea, $ea, $ea | ||
+ | EC0C | ||
+ | EC0C | ||
+ | EC0C 4C 43 EC JMP PROCKEX ; | ||
+ | EC0F | ||
+ | EC0F | ||
+ | EC0F 0A ASL A ; multiply index by 2 | ||
+ | EC10 C9 08 CMP #$08 ; >= 8 (5 entries)? | ||
+ | EC12 90 04 BCC $+6 ; no, continue | ||
+ | EC14 | ||
+ | EC14 A9 06 LDA #$06 ; yes, assume CTRL table | ||
+ | EC16 | ||
+ | EC16 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EC1C EAEA | ||
+ | EC1E EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EC24 EAEA | ||
+ | EC26 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EC2C EAEA | ||
+ | EC2E EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea | ||
+ | EC34 EAEA | ||
+ | EC36 EA EA .db $ea, $ea | ||
+ | EC38 | ||
+ | EC38 AA TAX ; reset pointer to point | ||
+ | EC39 BD 46 EC LDA KDECOD, | ||
+ | EC3C 85 F5 STA KEYTAB ; | ||
+ | EC3E BD 47 EC LDA KDECOD+1,X | ||
+ | EC41 85 F6 STA KEYTAB+1 | ||
+ | EC43 | ||
+ | EC43 | ||
+ | EC43 4C 74 EB JMP PROCKY ; continue processing image | ||
+ | EC46 | ||
+ | |||
+ | EC46 ; | ||
+ | EC46 ; KDECOD - Pointers to keyboard decode tables | ||
+ | EC46 ; | ||
+ | EC46 | ||
+ | EC46 5E EC .dw KDECD1 ; | ||
+ | EC48 9F EC .dw KDECD2 ; | ||
+ | EC4A E0 EC .dw KDECD3 ; | ||
+ | EC4C A3 ED .dw KDECD5 ; | ||
+ | EC4E 5E EC .dw KDECD1 ; | ||
+ | EC50 9F EC .dw KDECD2 ; | ||
+ | EC52 69 ED .dw KDECD4 ; | ||
+ | EC54 A3 ED .dw KDECD5 ; | ||
+ | EC56 21 ED .dw GRTXTF ; | ||
+ | EC58 69 ED .dw KDECD4 ; | ||
+ | EC5A 69 ED .dw KDECD4 ; | ||
+ | EC5C A3 ED .dw KDECD5 ; | ||
+ | |||
+ | |||
+ | Now, let's look at a few very simple routines just so that we can | ||
+ | check them off of the list: | ||
+ | |||
+ | IIOBASE | ||
+ | ======= | ||
+ | |||
+ | IIOBASE is the internal label behind the Kernal IOBASE function. | ||
+ | Calling IOBASE results in code execution being transferred to IIOBASE: | ||
+ | |||
+ | IOBASE: | ||
+ | FFF3 4C 00 E5 JMP IIOBASE ; | ||
+ | |||
+ | IOBASE returns the address of the beginning of the I/O region of | ||
+ | the VIC memory map in the .X and .Y registers. Locations $9110 to $912F are | ||
+ | the addresses reserved for the VIC's two 6522 VIAs. This is the first routine | ||
+ | in the Kernal ROM. | ||
+ | |||
+ | The value of this function in the VIC is questionable since there | ||
+ | is no way to change the address at which the VIAs appear, and interestingly, | ||
+ | the Kernal code does not call IOBASE at all. The Kernal instead relies on | ||
+ | hard-coded addresses. | ||
+ | |||
+ | However, one could conclude that the actual location of the VIAs | ||
+ | in the VIC's address space changed during the Kernal development process, | ||
+ | so IOBASE was somehow used to normalize the address. This also enabled code | ||
+ | portability between the VIC and the C64. | ||
+ | |||
+ | The BASIC ROM appears to call IOBASE in the RND function. The | ||
+ | existence of other calls is unknown at this time since the BASIC ROM has | ||
+ | yet to be disassembled. | ||
+ | |||
+ | E500 ; | ||
+ | E500 ; IIOBASE - Return I/O base address | ||
+ | E500 ; | ||
+ | E500 | ||
+ | E500 A2 10 LDX # | ||
+ | E502 A0 91 LDY #$91 | ||
+ | E504 60 RTS | ||
+ | |||
+ | |||
+ | ISCREN | ||
+ | ====== | ||
+ | |||
+ | ISCREN is the internal label behind the Kernal SCREEN function. | ||
+ | Calling SCREEN results in code execution being transferred to ISCREN: | ||
+ | |||
+ | SCREEN: | ||
+ | FFED 4C 05 E5 JMP ISCREN ; | ||
+ | |||
+ | E505 ; | ||
+ | E505 ; ISCREN - Return screen organization | ||
+ | E505 ; Returns the screen organization .X(columns) and .Y(rows) | ||
+ | E505 ; | ||
+ | E505 | ||
+ | E505 A2 16 LDX #$16 ; | ||
+ | E507 A0 17 LDY #$17 | ||
+ | E509 60 RTS | ||
+ | |||
+ | This code returns the row and column organization of the screen in | ||
+ | the .X and .Y registers. It doesn' | ||
+ | function to determine the screen size, instead relying on hard-coded | ||
+ | values under the assumption that the screen is 22x23. So, this function' | ||
+ | utility appears to be purely for the benefit of user code. | ||
+ | |||
+ | IPLOT | ||
+ | ===== | ||
+ | |||
+ | IPLOT is the internal label behind the Kernal PLOT function. | ||
+ | Calling PLOT results in code execution being transferred to IPLOT: | ||
+ | |||
+ | PLOT: | ||
+ | FFF0 4C 0A E5 JMP IPLOT ; | ||
+ | |||
+ | E50A ; | ||
+ | E50A ; IPLOT - Read/set cursor position | ||
+ | E50A ; On entry: | ||
+ | E50A ; | ||
+ | E50A ; | ||
+ | E50A | ||
+ | E50A B0 07 BCS READPL ; | ||
+ | E50C 86 D6 STX CURROW ; | ||
+ | E50E 84 D3 STY CSRIDX ; | ||
+ | E510 20 87 E5 JSR SCNPTR ; | ||
+ | E513 | ||
+ | E513 | ||
+ | E513 A6 D6 LDX CURROW ; | ||
+ | E515 A4 D3 LDY CSRIDX ; | ||
+ | E517 60 RTS | ||
+ | |||
+ | The Kernal again does not call this function, instead managing cursor | ||
+ | movement by changing the values of the current row and current cursor index | ||
+ | (i.e., the cursor' | ||
+ | location, the code commits the changes by jumping to an internal routine | ||
+ | in CINT1 which is responsible for moving the cursor block in screen memory. | ||
+ | |||
+ | |||
+ | Conclusion | ||
+ | ========== | ||
+ | |||
+ | In this installment, | ||
+ | are integral to the operation of the VIC. The Jiffy clock routine also | ||
+ | scans the STOP key, which is important to overall usability and the ability | ||
+ | to halt a program. The second routine, SCNKEY, is responsible for scanning | ||
+ | the keyboard matrix. That's pretty important, too. | ||
+ | Next time, we'll examine more routines in the VIC's KERNAL, including | ||
+ | I/O routines. | ||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H 19 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | JPEG: Decoding and Rendering on a C64 | ||
+ | ------------------------------------- Stephen Judd | ||
+ | < | ||
+ | Adrian Gonzalez | ||
+ | < | ||
+ | |||
+ | In the C64 world there are a disturbing number of cases where | ||
+ | people have said, "It can't be done on a C64." | ||
+ | until someone actually takes a look at the task and its requirements, | ||
+ | and says "Not only can it be done, but it can be done easily." | ||
+ | one such case. | ||
+ | This article is divided into two parts. | ||
+ | JPEGs and the decoding process. | ||
+ | issues not covered well, if at all, in existing documentation, | ||
+ | the IDCT; the article also covers the principles of decoding JPEGs and | ||
+ | JFIF files. | ||
+ | In part 2, Adrian discusses Floyd-Steinberg dithering, and how it | ||
+ | can be applied to various C64 graphics modes (and how it can be used to | ||
+ | display jpegs!). | ||
+ | of course be discussed, and the source code is available at | ||
+ | |||
+ | http:// | ||
+ | |||
+ | for both the decoder and the renderer. | ||
+ | |||
+ | The decoder is about 4k of code, the renderer is around 2k, and | ||
+ | there are about 9k of tables. | ||
+ | ample memory left over. With the color IFLI versions, memory is extremely | ||
+ | tight -- there are 32k of graphics, six 24-bit image buffers. | ||
+ | trees are stored in the screen RAM area. The renderer crams all the data | ||
+ | into the graphics area, which is why you see garbage while the image is | ||
+ | rendering. | ||
+ | bytes free in page 1, and a few tens of bytes free in page 2, and that's it! | ||
+ | Everything else just kind-of barely/ | ||
+ | ' | ||
+ | |||
+ | Finally, Errol Smith deserves a special mention as the guy who first | ||
+ | tracked down some decent JPEG documentation. | ||
+ | direction and within a few weeks we had JPEGs on a 64. | ||
+ | |||
+ | ------ | ||
+ | Part I: Decoding jpegs | ||
+ | ------ | ||
+ | |||
+ | Decoding jpegs is a fairly straightforward process, and in | ||
+ | recent years some free documentation has become available. | ||
+ | article is meant to complement that documentation, | ||
+ | some of the gaps and detailing some of the broader issues, not to | ||
+ | mention some specific implementation issues. | ||
+ | article covers general jpeg issues: encoding/ | ||
+ | storage, Fourier transforms, JFIF files, and so on. The second part | ||
+ | covers implementation issues more specific to the C64. | ||
+ | |||
+ | There are several sources of JPEG documentation online and in | ||
+ | the library. | ||
+ | useful: | ||
+ | |||
+ | Cryx' | ||
+ | |||
+ | ftp:// | ||
+ | article from one which appeared in the April 1991 | ||
+ | " | ||
+ | |||
+ | "JPEG Still Image Data Compression Standard" | ||
+ | and Joan L. Mitchell, published by Van Nostrand Reinhold, 1993, | ||
+ | ISBN 0-442-01272-1. | ||
+ | |||
+ | The first, Cryx's writeup, is a programmer' | ||
+ | it has good, detailed descriptions of the encoding/ | ||
+ | the file structure/ | ||
+ | and markers. | ||
+ | the basic principles of JPEGs, the how's and why's of the standard, and has | ||
+ | some helpful examples. | ||
+ | but is written in a way which I feel tends to obscure the important points. | ||
+ | Nevertheless, | ||
+ | several fast DCT algorithms, which is invaluable. | ||
+ | of information, | ||
+ | helpful. | ||
+ | |||
+ | JPEG Encoding/ | ||
+ | ---------------------- | ||
+ | |||
+ | It's really simple, folks. | ||
+ | |||
+ | Start with a grayscale image and divide it up into 8x8 pixel blocks | ||
+ | (just like a C64 bitmap). | ||
+ | image; the second block is to the right of the first block, and so on until | ||
+ | the end of the row is reached, at which point the next row begins. | ||
+ | The next step is to take the two dimensional discrete cosine | ||
+ | transform of each 8x8 component, and filter out the small-amplitude | ||
+ | frequencies. | ||
+ | is that you are left with a lot of zeros in the 64-byte data block, and | ||
+ | a few nonzero elements from which you can reconstruct the main features | ||
+ | of the image. | ||
+ | The next step is to RLE-encode the resulting 8x8 block (since most | ||
+ | of the components are zero), and finally to Huffman-encode the RLE-encoded | ||
+ | data. And that's it. Done. Finished. | ||
+ | |||
+ | Color pictures are similar, but now each pixel has an 8-bit R, G, | ||
+ | and B value, so there will be three 8x8 blocks, for a total of 24 bits | ||
+ | (not quite like a C64 bitmap...). | ||
+ | luminance/ | ||
+ | for each 8x8 section of a color image there are three 64-byte blocks of | ||
+ | data, and each block is encoded as above. | ||
+ | So to summarize: transform the data, filter (" | ||
+ | transformed data, and RLE-encode and Huffman-encode the result. | ||
+ | for each component, and then move on to the next 8x8 block. | ||
+ | to decode the image data: | ||
+ | |||
+ | read in the bits, | ||
+ | find the Huffman code, | ||
+ | unpack the RLE, | ||
+ | de-quantize the data, | ||
+ | and perform the inverse transform, | ||
+ | |||
+ | for each 8x8 block of image data to be plotted to the screen. | ||
+ | until done. | ||
+ | It turns out that there are other methods of JPEG compression | ||
+ | in the standard, such as arithmetic compression, | ||
+ | supported due to legal reasons (lame software patent owned by IBM, AT&T, | ||
+ | and Mitsubishi), | ||
+ | gains. | ||
+ | " | ||
+ | a progressive jpeg the image is stored in a series of " | ||
+ | from lower to higher resolution. | ||
+ | (which are more common). | ||
+ | Finally, it turns out that an 8x8 block of image data doesn' | ||
+ | have to correspond to an 8x8 block of pixels. | ||
+ | of data might represent an average of a 2x2 block of pixels, so an 8x8 | ||
+ | block of data might expand to a 16x16 block of pixels. | ||
+ | the " | ||
+ | You can see that this can offer substantial compression gains, but will | ||
+ | coarsen the data; on the other hand, if the data is already coarse, it's | ||
+ | a way of getting a whole lot for nothing. | ||
+ | pixel mapping for the luminance, and one-to-four (each data byte = 2x2 pixel | ||
+ | block) mapping for the two chrominance components. | ||
+ | standpoint, this means that a decoder typically decodes 16 scanlines at a | ||
+ | time (16x16 pixel chunks). | ||
+ | |||
+ | Before a JPEG can be decoded, though, the decoder needs a fair | ||
+ | amount of information, | ||
+ | tables used, information about the image such as its size, whether it's | ||
+ | a color or a grayscale image, and so on. In a JPEG file, all information | ||
+ | is stored in " | ||
+ | |||
+ | Segments | ||
+ | -------- | ||
+ | |||
+ | A JPEG segment looks like the following: | ||
+ | |||
+ | [header] Two bytes, starting with $FF | ||
+ | [length] Two bytes, in hi/lo order (not usual 6502 lo/hi) | ||
+ | [data] Segment data | ||
+ | |||
+ | A list of JPEG (and JFIF) headers can be found in Cryx's document. | ||
+ | |||
+ | Let's have a look at a hex dump of a jpeg file (from unix, use | ||
+ | "od -tx1 file.jpg | more" | ||
+ | |||
+ | 0000000 | ||
+ | 0000020 | ||
+ | 0000040 | ||
+ | |||
+ | The first two bytes are $ff $d8 -- these two bytes identify the | ||
+ | file as a jpeg. All jpegs start with ff d8. | ||
+ | Next we encounter the header ff e0. ff e0 is a special header | ||
+ | which identifies this file as a JFIF file. It turns out that in the | ||
+ | original JPEG standard a specific file format is not given; this | ||
+ | in turn led to different companies using their own formats, to try and | ||
+ | establish the " | ||
+ | this problem, and is the de-facto standard -- but more on this later. | ||
+ | In a JFIF file, the JFIF segment always follows the JPEG ID byte. | ||
+ | You can see that it is length 16, and that that length includes the two | ||
+ | length bytes. | ||
+ | J F I F and the number 0; following that are some bytes for revision numbers, | ||
+ | the x/y densities, and some thumbnail info. | ||
+ | The next segment starts with the header ff fe. This is the | ||
+ | " | ||
+ | are the ascii codes for " | ||
+ | processing program. | ||
+ | Quantization Table" header. | ||
+ | data -- a stream of Huffman-encoded bits -- is reached. | ||
+ | |||
+ | Huffman Decoding | ||
+ | ---------------- | ||
+ | |||
+ | If you don't know anything about Huffman decoding, then I suggest | ||
+ | you read Pasi's nice article in C=Hacking #16, which has a nice example. | ||
+ | Briefly, a Huffman tree is a binary tree whose left and right branches | ||
+ | correspond to bits 0 and 1 respectively; | ||
+ | tree, you read bits and move left or right accordingly until a leaf | ||
+ | is reached, containing the Huffman code value. | ||
+ | at the top of the tree and decode the next Huffman code. | ||
+ | |||
+ | In a JPEG, Huffman trees are stored in " | ||
+ | segments (header = ff c4): | ||
+ | |||
+ | 0000300 | ||
+ | 0000320 | ||
+ | 0000340 | ||
+ | |||
+ | The first byte in the DHT segment (00) is an ID byte -- JPEGs can have up to | ||
+ | eight Huffman trees. | ||
+ | represents the number of Huffman codes of lengths 1, 2, 3, ..., up to | ||
+ | length 16, followed by the Huffman code values. In the above example, there | ||
+ | are 0 codes of length 1, 1 code of length 2, 5 codes of length 3, and so | ||
+ | on. Following these 16 bytes are the Huffman values: 3, 1, 2, 4, ..., 8. | ||
+ | But what are the Huffman codes corresponding to those values? | ||
+ | It turns out that these trees are so-called " | ||
+ | and work as follows: to get the next code, add 1 to the current code. | ||
+ | When the length increases, add 1 and shift everything left. The exception | ||
+ | is that you don't increment until the first code is defined, so the first | ||
+ | code is always zeroes. | ||
+ | For example, to decode the above DHT segment, start with Huffman | ||
+ | code = 0. There are no codes of length 1, so we shift it left to get | ||
+ | code = 00 (and don't add 1 because the first code hasn't been defined yet). | ||
+ | There is one code of length 2, so we read the first Huffman value and | ||
+ | assign it to the current code | ||
+ | |||
+ | Code Value | ||
+ | 00 3 | ||
+ | |||
+ | That's the only code of length two, so now we move to length 3 by incrementing | ||
+ | and shifting: code = 010. There are five values of length 3, and the next | ||
+ | five Huffman values are 1, 2, 4, 5, 6, so the Huffman tree is now | ||
+ | |||
+ | Code Value | ||
+ | 00 3 | ||
+ | 010 1 | ||
+ | 011 2 | ||
+ | 100 4 | ||
+ | 101 5 | ||
+ | 110 6 | ||
+ | |||
+ | and the rest of the Huffman tree is given by | ||
+ | |||
+ | 1110 | ||
+ | 11110 | ||
+ | 111110 | ||
+ | |||
+ | What's the best way to implement a Huffman tree? | ||
+ | |||
+ | The most obvious way is to use five bytes per " | ||
+ | |||
+ | left pointer (2 bytes) | ||
+ | right pointer (2 bytes) | ||
+ | value (1 byte) | ||
+ | |||
+ | where the left and right pointers are just offsets to be added to the | ||
+ | current pointer, and if left = right = $FFxx then this is a leaf. If you | ||
+ | fetch a bit that says "go left", and the left pointer = $FFxx (but right | ||
+ | pointer is valid) then you've hit an invalid Huffman code -- i.e. decoding | ||
+ | error. | ||
+ | But there is another rather cool method, first described to me by | ||
+ | Errol Smith, which uses only two bytes per node. Now, the five-byte method | ||
+ | works fine in jpx, but in the full-color IFLI jpz code -- well, suddenly | ||
+ | memory becomes extremely tight, and without this routine jpz probably | ||
+ | wouldn' | ||
+ | efficient, especially if implemented using 16-bit 65816 code. | ||
+ | The trick is simply to organize the tree such that if the current | ||
+ | node is at location NODE, then the left node is at NODE+2 and the right | ||
+ | node is at NODE+(NODE). | ||
+ | high bit. So the decoding process is: | ||
+ | |||
+ | get next bit | ||
+ | if 0 then pointer = pointer + 2 | ||
+ | if 1 then pointer = pointer + node value | ||
+ | if high byte of node value < $80 then loop | ||
+ | |||
+ | For example, the first part of the earlier Huffman tree | ||
+ | |||
+ | 00 3 | ||
+ | 010 1 | ||
+ | 011 2 | ||
+ | 100 4 | ||
+ | |||
+ | would be encoded as | ||
+ | |||
+ | 0d 00 04 00 03 80 04 00 01 80 02 80 00 00 00 00 04 80 | ||
+ | -----|-----|-----|-----|-----|-----|-----|-----|-----| | ||
+ | |||
+ | Try decoding the Huffman values, using the above algorithm. | ||
+ | |||
+ | Astute readers may ask the question: won't you decode incorrectly if | ||
+ | there is no left node? Even more astute readers can answer it: in a | ||
+ | canonical Huffman tree, the only nodes without left-node pointers are | ||
+ | leafs. | ||
+ | To see this, consider a counterexample: | ||
+ | |||
+ | o | ||
+ | / | ||
+ | o | ||
+ | \ | ||
+ | o | ||
+ | |||
+ | This corresponds to Huffman code 01 -- one move left, one move right. | ||
+ | In a canonical Huffman tree, the only way to generate the code 01 is to | ||
+ | increment the code 00; since code 00 has already occured, there must be | ||
+ | a left-node. | ||
+ | before creating a right-node. | ||
+ | to checking the right-pointer; | ||
+ | Moreover, since left-nodes are always created first, you can add nodes in | ||
+ | the order they are created -- you never have to insert nodes between | ||
+ | existing nodes. | ||
+ | Pretty nifty, eh? | ||
+ | |||
+ | Restart Markers | ||
+ | --------------- | ||
+ | |||
+ | The image data in a jpeg is a stream of Huffman-encoded bits. | ||
+ | The jpeg standard allows for " | ||
+ | into the stream. | ||
+ | in the data stream, and periodically re-synchronize the bitstream. | ||
+ | far so good -- this is explained in detail in Cryx's document. | ||
+ | What _isn' | ||
+ | re-synchronize the data stream, but when a restart marker is hit the DC | ||
+ | coefficients need to be reset to zero. That is, it really does " | ||
+ | the decoder. | ||
+ | What' | ||
+ | in the 8x8 array, and instead of encoding the actual value a jpeg encodes | ||
+ | the _offset_ from the previous value. | ||
+ | added to the current DC value to get the new value. | ||
+ | to be reset to zero when a restart marker is hit. | ||
+ | Most jpegs do not use restart markers, but unless you reset the | ||
+ | coefficient you're going to spend a few months wondering why Photoshop images | ||
+ | don't decode correctly. | ||
+ | Why is it called the DC coefficient? | ||
+ | on Fourier transforms for the answer. | ||
+ | |||
+ | Note also that when the byte $FF is encountered in the data stream | ||
+ | it must be skipped; the exception is if it is immediately followed by a 00, | ||
+ | in which case $FF00 represents the value $FF. Why do I bring this up? | ||
+ | Because Cryx's document could be interpreted by naive people like myself | ||
+ | as saying this is true throughout a jpeg file, and it's only true within | ||
+ | the image data -- that in other segments, $FF is a perfectly valid byte. | ||
+ | |||
+ | Unpacking the RLE | ||
+ | ----------------- | ||
+ | |||
+ | Once a Huffman code is retrieved and decoded, the resulting byte | ||
+ | represents RLE-compressed data to be uncompressed. | ||
+ | described quite well in Cryx's document, so I'll just refer you to it. | ||
+ | This is repeated until you are left with a 64-byte chunk of data which | ||
+ | needs to be re-ordered and dequantized. | ||
+ | in Cryx's document; briefly, during the encoding process, the original | ||
+ | 8x8 data is re-ordered into a 64-byte vector as follows: | ||
+ | |||
+ | 0 1 5 6 ... | ||
+ | 2 4 7 13 ... | ||
+ | 3 8 12 17 ... | ||
+ | 9 11 18 24 ... | ||
+ | 10 19 23 ... | ||
+ | 20 22 ... | ||
+ | ... | ||
+ | |||
+ | That is, the first element in the vector is the (0,0) component of the | ||
+ | 8x8 array, the next element is the (1,0) component, the next element is | ||
+ | the (0,1) component, and so on. The reason for this " | ||
+ | is to enhance the RLE-compression, | ||
+ | frequencies at the beginning of the vector and the higher frequencies -- | ||
+ | most of which are typically zero-amplitude -- at the end of the vector | ||
+ | (more on this later). | ||
+ | back into an 8x8 array. | ||
+ | each element by a corresponding element in a quantization table: | ||
+ | |||
+ | data[i,j] = data[i, | ||
+ | |||
+ | The final step is to take the resulting 64-byte chunk and apply the | ||
+ | inverse discrete cosine transform (IDCT). | ||
+ | |||
+ | |||
+ | Fourier Transforms and the (I)DCT | ||
+ | --------------------------------- | ||
+ | |||
+ | Let's begin with the definition you'll see in any document on | ||
+ | JPEGS (hear that? That's the sound of one thousand eyes simultaneously | ||
+ | glazing over). | ||
+ | OK, let's back up a moment. | ||
+ | usually straightforward: | ||
+ | and ah, it makes sense. | ||
+ | that took people decades and centuries to figure out. College students | ||
+ | spend multiple months, working hundreds of problems, to gain just a basic | ||
+ | working knowledge of a subject. | ||
+ | Fourier transforms represent a fundamentally different way of thinking, | ||
+ | and the timescale for enlightenment in the subject is years, not minutes. | ||
+ | So don't worry if you don't understand everything immediately; | ||
+ | of this part isn't to make you an instant expert in Fourier transforms, but | ||
+ | rather to give you a toehold into the subject that you can expand on over | ||
+ | time. | ||
+ | |||
+ | So, let's begin with a definition that you'll see in any | ||
+ | document on JPEGS. | ||
+ | of a function f(x) with eight points (x=0..7) may be written as | ||
+ | |||
+ | 7 2*x+1 | ||
+ | F(u) = c(u)/2 * sum f(x) * cos(-----*u*PI), | ||
+ | x=0 16 | ||
+ | |||
+ | where c(0) = 1/sqrt(2) and c=1 otherwise. | ||
+ | you, and it should, because it is rather mysterious-looking. | ||
+ | think of it as some sort of grinder: you insert f(x) into the grinder, | ||
+ | turn the crank, and out pops a new function, F(u). In other words, the | ||
+ | original function f(x) has been _transformed_ into a new function F(u). | ||
+ | Notice that we need to perform a separate sum for each value of u: | ||
+ | |||
+ | F(0) = 1/ | ||
+ | F(1) = 1/2 * sum f(x)*cos((2*x+1)*PI/ | ||
+ | F(2) = 1/2 * sum f(x)*cos((2*x+1)*2*PI/ | ||
+ | |||
+ | and so on. So there are a total of eight summations, each of which | ||
+ | involves eight summands, for a total of 64 operations to perform. | ||
+ | One of the important properties about this transform is that it | ||
+ | is _invertible_. | ||
+ | put it into the other end of the grinder, turn the crank backwards, | ||
+ | and out pops the original function f(x). Moreover it is _uniquely_ | ||
+ | invertible -- for every function f(x), there is one and only one transform | ||
+ | F(u), and vice-versa (the functions f(x) and F(u) are often called | ||
+ | a transform pair). | ||
+ | |||
+ | 7 2*x+1 | ||
+ | f(x) = 1/2 * sum c(u)*F(u) * cos(-----*u*PI), | ||
+ | | ||
+ | |||
+ | You'll notice that it is very similar to the forward transform, except | ||
+ | now the sum is over u, and c(u) is inside of the summation; as before, | ||
+ | there are 64 sums total to perform. | ||
+ | |||
+ | f(x) = 1/2 * ( 1/sqrt(2) F(0) + F(1) * cos((2*x+1)*PI/ | ||
+ | F(2) * cos((2*x+1)*2*PI/ | ||
+ | |||
+ | For now, just note that the original function f(x) is given by a sum | ||
+ | of the transformed function F(u) times different cosine components. | ||
+ | The transform of a two-dimensional function f(x,y) is done by | ||
+ | first taking the transform in one direction (e.g. the x-direction) | ||
+ | followed by the transform in the other direction (e.g. the y-direction). | ||
+ | Thus the two-dimensional 8x8 discrete cosine transform of a function | ||
+ | f(x,y) may be written as | ||
+ | |||
+ | | ||
+ | F(u,v) = --------- * sum sum f(x,y) * cos(-----*u*PI) * cos(-----*v*PI) | ||
+ | | ||
+ | |||
+ | u,v = 0,1,...,7 | ||
+ | |||
+ | where, as before, c(0) = 1/sqrt(2) and c=1 otherwise. | ||
+ | given by | ||
+ | |||
+ | 1 | ||
+ | f(x,y) = --- * sum sum c(u)c(v)*F(u, | ||
+ | 4 u=0 v=0 16 16 | ||
+ | |||
+ | x,y = 0,1...7 | ||
+ | |||
+ | Note that some documentation (e.g. Cryx's document) incorrectly gives c(u) | ||
+ | and c(v) as c(u,v) = 1/2 for u=v=0 and c(u,v) = 1 otherwise. | ||
+ | This is an _extremely_ expensive computation to do, requiring | ||
+ | 64 multiplies of cosines (and computations of the arguments of the | ||
+ | cosines) to calculate the value at a _single_ point (x,y), and there are | ||
+ | 64 points in each 8x8 block, so, even discounting the argument computation | ||
+ | (i.e. u*pi*(2*x+1)/ | ||
+ | _every_ 8x8 block of pixels (where these are 16-bit multiplications). | ||
+ | a C64, in such a case, the decoding time could be measured in hours if not | ||
+ | days. | ||
+ | But if this were the only way to compute a DCT, then JPEGs would | ||
+ | never have been DCT-based. | ||
+ | Fourier transforms, that take advantage of the symmetries of the transform. | ||
+ | You may have heard of the Fast Fourier Transform, which is used in almost | ||
+ | all spectral computing applications; | ||
+ | The one I used is actually an adaptation of the FFT. | ||
+ | So the first task is: where do we find a fast DCT algorithm? | ||
+ | place to look is existing source code, like cjpeg/ | ||
+ | I found it pretty incomprehensible, | ||
+ | it is also pretty large. | ||
+ | that isn't understood (if something goes wrong, then where' | ||
+ | The next place to look is the literature -- many papers have | ||
+ | been written on fast DCT routines. | ||
+ | quite dense, very general (we only need an 8x8 routine, not an NxN routine), | ||
+ | and again, fairly complicated. | ||
+ | What is needed is a _simple_, but fast, IDCT algorithm. | ||
+ | came in the book by Pennebaker and Mitchell, mentioned at the beginning of | ||
+ | the article and available in the library. | ||
+ | routines in it, with detailed discussions of the algorithms, both one- | ||
+ | dimensional and two-dimensional. | ||
+ | the 1D ones are pretty fast and straightforward -- something like 29 adds | ||
+ | and 13 multiplies to compute 8 components of a 1D DCT. Moreover the 13 | ||
+ | " | ||
+ | full multiplications. | ||
+ | using the DCT definition, and you can see that the fast routine is | ||
+ | *hundreds* -- and possibly thousands -- of times faster. | ||
+ | perspective, | ||
+ | and taking 1-2 hours -- maybe even 10 hours or more -- to decode the same | ||
+ | picture! | ||
+ | |||
+ | As mentioned earlier, we can do a 2D IDCT by doing a 1D transform | ||
+ | of the rows of some 2D array followed by a 1D transform the columns (or | ||
+ | vice-versa). | ||
+ | are specialized 2D routines, they are quite large and significantly more | ||
+ | complicated than a 1D routine. | ||
+ | and complicated Bad. And cjpeg/djpeg makes the observation that they don't | ||
+ | seem to give much speed gain in practice. | ||
+ | There' | ||
+ | _forwards_ DCT routines, but devotes just one paragraph to _inverse_ DCT | ||
+ | routines! | ||
+ | hints on reversing flowgraphs. | ||
+ | To make a long story short, IF you reverse the flowgraph correctly, | ||
+ | AND you overcome the errors/ | ||
+ | prepare the coefficients correctly before performing the transformation, | ||
+ | then yes, by golly, it works! | ||
+ | of intense frustration! | ||
+ | the 1D IDCT routine at the end of this article. | ||
+ | |||
+ | At this point, the more experienced programmers are asking, how | ||
+ | do you _know_ it works? | ||
+ | and debug such a routine? | ||
+ | understand a few things about Fourier transforms. | ||
+ | also see why JPEG is based on the DCT, and why it is so effective at | ||
+ | compressing images. | ||
+ | |||
+ | Fourier Transforms for dummies | ||
+ | ------------------ | ||
+ | |||
+ | There are several ways of thinking about a Fourier transform. | ||
+ | One way to think about it is that you can expand any function in a series | ||
+ | of sines and cosines: | ||
+ | |||
+ | f(x) = a0 + a1*cos(wx) + a2*cos(2wx) + a3*cos(3wx) + ... | ||
+ | + b1*sin(wx) + b2*sin(2wx) + b3*sin(3wx) + ... | ||
+ | |||
+ | where the a0 a1 etc. are constant coefficients (amplitudes) and w 2w etc. | ||
+ | are the frequencies. | ||
+ | expanded solely in terms of cosines: | ||
+ | |||
+ | f(x) = a0 + a1*cos(wx) + a2*cos(2wx) + a3*cos(3wx) + ... | ||
+ | |||
+ | " | ||
+ | Once you know them, you can reconstruct the original function by adding up | ||
+ | the cosines. | ||
+ | |||
+ | Now, let's forget about computing the coefficients, | ||
+ | for a moment and look at that expression. | ||
+ | of f(x) is in each cosine component -- for example, the value of a2 says | ||
+ | "how much" of f(x) is in the cos(2wx) component. | ||
+ | coefficient tells us how much of each " | ||
+ | a2 says how much frequency=2w there is, a0 says how much frequency=0 there | ||
+ | is, and so on. | ||
+ | So another way of thinking about a Fourier transform is that it | ||
+ | transforms a function from the space (or time) domain into the _frequency_ | ||
+ | domain -- instead of thinking about how much the function varies with x | ||
+ | (how it varies in space), we can see how it varies with _w_, the frequency; | ||
+ | instead of looking at "how much f" is at a given point in space or time, | ||
+ | we can look at "how much f" is at a given frequency. | ||
+ | So, imagine measuring something simple, like the voltage coming | ||
+ | out of a wall socket. | ||
+ | this is a graph of how the signal varies with time. The Fourier transform | ||
+ | of this signal, however, will have a large spike at 60Hz (or 50Hz if you're | ||
+ | in Europe or .au). Small amplitudes of other frequencies will probably be | ||
+ | seen, too, indicating noise in the signal. | ||
+ | varies with _frequency_ might look something like this: | ||
+ | |||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | --^^--^----^----+-^---^---- | ||
+ | 60Hz | ||
+ | |||
+ | That is, lots of zero or very small amplitude frequencies, | ||
+ | frequency amplitude at around 60/50Hz. | ||
+ | If you've ever seen an equalizer display on a stereo, you've seen | ||
+ | a Fourier transform -- the lights measure how much of the audio signal there | ||
+ | is in a given frequency range. | ||
+ | will have large amplitudes. | ||
+ | lots of distortion), | ||
+ | |||
+ | Now we can take this a few steps further. | ||
+ | a lot of information. | ||
+ | small, and wiggles very rapidly if w is large (and it doesn' | ||
+ | all if w=0). (If you don't see this, just think of x as an angle which | ||
+ | goes around a circle: if x goes around the circle once, then 7x goes | ||
+ | around the circle seven times). | ||
+ | will have a lot of low-frequencies in the transform; a function which changes | ||
+ | rapidly will have large high-frequency components (rapid wiggles give rapid | ||
+ | changes). | ||
+ | The zero frequency is special. | ||
+ | only the zero-frequency component (since cos(0x) is a constant). | ||
+ | the zero-frequency represents the average value of the function over a | ||
+ | period of cosine -- this is easy to see because the average value of cos(x), | ||
+ | cos(2x), etc. is 0 over a full period: it is above zero half of the time, | ||
+ | and below zero the other half of the time, and the two halves cancel. | ||
+ | |||
+ | Now consider an image. | ||
+ | smoothly -- there aren't many sudden sharp changes from black to white. | ||
+ | This means that the transform of some small area of the picture will have | ||
+ | fairly large-amplitude low-frequencies, | ||
+ | frequencies. | ||
+ | away, then the image won't change much at all -- the high frequencies | ||
+ | represent super-fine details of the picture. | ||
+ | " | ||
+ | to throw away the fine details and the unnecessary components, and keep | ||
+ | just the major features of the picture. | ||
+ | for things like line-art, where the image can change rapidly -- you may | ||
+ | have noticed that things like slanted lines tend to get jagged in a jpeg. | ||
+ | |||
+ | The important point to remember is that high frequencies correspond | ||
+ | to rapid changes in the image, low frequencies correspond to smooth changes, | ||
+ | and the zero frequency is the " | ||
+ | electrical engineers on the JPEG comittee, the zero frequency is referred | ||
+ | to as the "DC component" | ||
+ | referred to as the "AC components" | ||
+ | Current). | ||
+ | |||
+ | Finally, for completeness, | ||
+ | a discrete Fourier transform and a continuous Fourier transform, namely | ||
+ | that one gives the transform in terms of discrete frequencies (w, 2w, 3w, | ||
+ | etc.) and the other gives the transform as a continuous function of | ||
+ | frequency. | ||
+ | we necessarily use a discrete transform. | ||
+ | |||
+ | Now, how can you test a Fourier transform routine? | ||
+ | |||
+ | Fourier Transforms for smarties | ||
+ | ------------------ | ||
+ | |||
+ | The basic question is: how do we know if the IDCT is working | ||
+ | correctly? | ||
+ | answer to. | ||
+ | Remember that we are working with transformed data; each element | ||
+ | represents the amplitude of a specific frequency. | ||
+ | vector with a single nonzero element, for example, let a3=10 and all the | ||
+ | other coeffs equal zero. What will the inverse transform of this vector | ||
+ | be? Since a3 is the amplitude of cos(3x), the transform will simply be... | ||
+ | a3*cos(3x)! | ||
+ | will be a1*cos(x). | ||
+ | The above explanation actually isn't _quite_ right, because of | ||
+ | the form of the IDCT used: | ||
+ | |||
+ | 1 | ||
+ | f(x) = --- * sum c(u)*F(u) * cos(-----*u*PI) | ||
+ | 2 u=0 16 | ||
+ | |||
+ | Now it should be easy to see that if, say F(3)=10, and all the other F's | ||
+ | are zero, then the result of the transform -- whatever transform algorithm | ||
+ | is used -- must be | ||
+ | |||
+ | f(x) = 5*cos(3*PI*(2x+1)/ | ||
+ | |||
+ | So, for a one-dimensional IDCT, it is easy to test each component separately | ||
+ | and compare the result with the actual answer. | ||
+ | that has many nonzero components? | ||
+ | There are two important properties of Fourier transforms which | ||
+ | come into play here. The first is that Fourier transforms are _linear_; | ||
+ | a linear operator L satisfies | ||
+ | |||
+ | L(c*f1) = c*L(f1), where c = constant | ||
+ | L(f1 + f2) = L(f1) + L(f2) | ||
+ | |||
+ | That is, constants factor out of the operator, and operating on the sum | ||
+ | of two functions is the same as operating on each function separately | ||
+ | and adding them together. | ||
+ | L1(x) = x and L2(x) = x^2. The first one satisfies the conditions above; | ||
+ | the second one does not. Some other linear transforms you may be familiar | ||
+ | with are rotations, and taking the derivative. | ||
+ | that the Fourier transform satisfies the above conditions; you can also | ||
+ | look at the fast DCT algorithm and see that it only involves additions and | ||
+ | multiplications by constants, which are all linear operations. | ||
+ | This property is enormously important here. It first says that we | ||
+ | can multiply the transformed data by a constant, and the constant will | ||
+ | just multiply the final answer; said another way, if F(3)=10 and all other | ||
+ | F's are nonzero, then we know that F(3)=const*10 will work too, no matter | ||
+ | what the constant is! So in testing one component at a time, you can | ||
+ | pretty confidently say "F(3) works" (as opposed to " | ||
+ | F(3)=11 works" | ||
+ | and other _computer_ issues; the basic algorithm _cannot_. | ||
+ | Even more importantly, | ||
+ | of two functions is equal to the sum of the transforms. | ||
+ | |||
+ | F1 = (0, | ||
+ | |||
+ | works, and we know that the transform of | ||
+ | |||
+ | F2 = (0, | ||
+ | |||
+ | works, then we _know_ that the transform of | ||
+ | |||
+ | F1 + F2 = (0, | ||
+ | |||
+ | works! | ||
+ | we know that the transform of | ||
+ | |||
+ | (0, | ||
+ | |||
+ | works, _no matter what a and b are_. So we can _completely_ test a 1D DCT | ||
+ | simply by testing each component _separately_. | ||
+ | cause problems are things like overflow, erronius multiplications, | ||
+ | Now, what about a 2D IDCT? The way a 2D IDCT is computed is by | ||
+ | first transforming in one direction (e.g. the x-direction), | ||
+ | in the other direction (e.g. the y-direction). | ||
+ | the 2D IDCT by first transforming each row, then transforming each column | ||
+ | (or vice versa). | ||
+ | |||
+ | Therefore, once the 1D IDCT works, so does the 2D IDCT. | ||
+ | |||
+ | So, to summarize: to test the routine completely we simply need | ||
+ | to test each component of a 1D IDCT separately, and compare the result | ||
+ | with the known answer. | ||
+ | And if you really want to test it on a 2D set of data, there is | ||
+ | an example DCT array given in the Wallace paper (and the result of the | ||
+ | inverse transform). | ||
+ | |||
+ | Quantization revisited | ||
+ | ---------------------- | ||
+ | |||
+ | The quantization step filters out all the small-amplitude frequencies. | ||
+ | A JPEG can have up to four quantization tables; each table is a 64-byte (8x8) | ||
+ | set of integers. | ||
+ | leaves an 8x8 block of amplitudes. | ||
+ | corresponding entry in the quantization table, thus filtering out the small | ||
+ | amplitudes in a weighted fashion. | ||
+ | re-ordered into a 64-byte vector which concentrates the lower frequences | ||
+ | (the ones more likely to be nonzero) at the beginning of the vector, and | ||
+ | the higher frequencies (more likely to be zero) at the end of the vector. | ||
+ | This last step (zig-zag reordering) clearly increases the efficiency of the | ||
+ | RLE encoding of the amplitude vector. | ||
+ | The decoder just reverses these steps -- it dequantizes the data | ||
+ | (i.e. multiplies by the quantization coefficients) and re-orders the data, | ||
+ | before performing the IDCT. Now, you may have noticed that the IDCT routine | ||
+ | has to prepare the coefficients by multiplying (dividing) by a set of | ||
+ | constants: | ||
+ | |||
+ | for (int i=0; i<8; i++) | ||
+ | F[i] = S[i]/ | ||
+ | |||
+ | (This is done because the algorithm is actually an adapted FFT routine). | ||
+ | In principle, this step can be incorporated into the de-quantization step, | ||
+ | since dequantization is also just multiplying by constants. | ||
+ | transform this is very straightforward, | ||
+ | to the 2D transform. | ||
+ | into the quantization such that, say, the row transforms will not need | ||
+ | preparation, | ||
+ | I did not feel that this was a very useful " | ||
+ | mention it here for completeness. | ||
+ | Note also that a wise programmer would replace the Math.cos | ||
+ | calls above with constants, if the code were to be actually used in | ||
+ | a decoder. | ||
+ | |||
+ | Miscellaneous | ||
+ | ------------- | ||
+ | |||
+ | You may recall that all JPEGs begin with FFD8, and JFIF files | ||
+ | immediately follow this with the FFE0 JFIF segment. | ||
+ | have the JFIF segment, some don' | ||
+ | not include a JFIF header. | ||
+ | worry about it. | ||
+ | Moreover, be sure to skip unknown segments using the segment | ||
+ | length byte -- as opposed to, say, moving forwards in the file until | ||
+ | another valid segment header is found. | ||
+ | When reading some of the other jpeg documentation, | ||
+ | that the byte $FF is a special byte, to be skipped (unless followed by $00). | ||
+ | Just to be clear, this only applies to the image data -- $FF is a normal | ||
+ | data byte within other segments. | ||
+ | within the image data. | ||
+ | |||
+ | C64 Implementation | ||
+ | ------------------ | ||
+ | |||
+ | As you probably understand by now, and as we shall see below, | ||
+ | jpegs on a C64 are far from being an impossible task. So to wrap up, | ||
+ | this section will cover the main issues in implementing a jpeg decoder | ||
+ | on a C64, and examine some of the comments regarding jpegs being | ||
+ | " | ||
+ | |||
+ | One frequently-heard comment was that a C64 doesn' | ||
+ | memory to decode a jpeg, so let's look at the numbers. | ||
+ | discussion, jpegs require memory for | ||
+ | |||
+ | 1 - Quantization tables | ||
+ | 2 - Huffman trees | ||
+ | 3 - Image data | ||
+ | |||
+ | The quantization tables are 64 bytes each, and there are a maximum of | ||
+ | four -- so, no big deal. Using the two-byte storage method, the Huffman | ||
+ | trees typically take up around 1.5k, and using the 5-byte method they take | ||
+ | on the order of 4k. The image data is stored in a jpeg on a row-by-row | ||
+ | basis, where each row is some multiple of 8 lines large. | ||
+ | display is 320 pixels wide, so that means an image buffer size of | ||
+ | 320x8 = $0A00 bytes per 8 scanlines. | ||
+ | So, a few K for the Huffman tables, and a few K for the image | ||
+ | buffers. | ||
+ | memory. | ||
+ | Now, as you may recall, the data decoded from a JPEG file is | ||
+ | luma/chroma data -- Y (intensity) CrCb (chroma). | ||
+ | all that is needed is the intensity -- there' | ||
+ | You may also recall that, because of sampling factors, a jpeg might decode | ||
+ | to 16x16 blocks of data (or more), which means several 320x8 image buffers | ||
+ | need to be available -- at $0A00 bytes/ | ||
+ | space available. | ||
+ | For a full-color picture, however, all three components need to | ||
+ | be kept, which means three buffers for each 320x8 row of data, which means | ||
+ | $1E00 bytes per row. So there' | ||
+ | Until, that is, you throw IFLI into the mix -- but more on this later. | ||
+ | The bottom line is that jpegs really don't require much memory. | ||
+ | |||
+ | Another common comment was that the C64 was far too slow to do | ||
+ | the necessary calculations, | ||
+ | As was stated earlier, the IDCT routine used in this program needs some | ||
+ | 29 adds and 13 multiplies to do a 1D transform. | ||
+ | " | ||
+ | can be implemented using tables. | ||
+ | 13 16-bit table-lookups for the IDCT, which is really pretty trivial. | ||
+ | Another important calculation is the dequantization, | ||
+ | doing 64 integer multiplications per 8x8 data block. | ||
+ | large (and the result can be 16-bits), and the multiplications are done | ||
+ | using the usual fast multiply routine (let f(x)=x^2/4, then | ||
+ | a*b = f(a+b)-f(a-b)), | ||
+ | Again, not a big deal. | ||
+ | So, in summary, the mathematical calculations are well within | ||
+ | the grasp of the 64. | ||
+ | In fact, all the routines are quite straightforward -- only the | ||
+ | IDCT routine is special. | ||
+ | |||
+ | One important issue is grayscale versus color. | ||
+ | released, jpx, is grayscale, and for several very good reasons. | ||
+ | is much faster to compute, since no RGB conversion needs to be done (the | ||
+ | intensity Y is exactly the grayscale levels). | ||
+ | since the color components may be thrown away, and the bitmap requirements | ||
+ | are modest. | ||
+ | With some pretty solid fundamental routines and a reasonable | ||
+ | grasp of the important issues, color was a reasonably straightforward | ||
+ | addition to the code, with just one problem: memory. | ||
+ | 32k for the bitmaps. | ||
+ | two image buffers are needed, for almost 16k. The RGB conversion code | ||
+ | uses table lookups. | ||
+ | The decoder needs memory for Huffman and quantization tables. | ||
+ | it all up, there just wasn't room. | ||
+ | With a little more thought and planning, though, a few things | ||
+ | became clear: first, IFLI doesn' | ||
+ | means the image buffers only need to be 296x8 x 3 components = $1BC0 bytes | ||
+ | (instead of 320x8 x 3). Typical jpegs use a maximum sampling factor of 2, | ||
+ | so using just two buffers requires $3780 bytes -- a savings of almost $0600 | ||
+ | bytes over a 320-pixel wide bitmap. | ||
+ | came out to almost exactly 16K per bitmap, which means that all the data can | ||
+ | be squished into the two IFLI bitmaps and sorted out later. | ||
+ | here and saving there, and economizing on tables and rearranging memory, we | ||
+ | were able to cram everything into 64k, with just a few hundred bytes to | ||
+ | spare -- pretty neat. | ||
+ | |||
+ | And that, I think, sums up JPEG decoding on a C64. | ||
+ | |||
+ | |||
+ | /* | ||
+ | * idct.java -- Attempts to implement the IDCT by reversing the flowgraph | ||
+ | * as given in Pennebaker & Mitchell, page 52. | ||
+ | * | ||
+ | * Almost there! | ||
+ | * | ||
+ | * SLJ 9/15/99 | ||
+ | */ | ||
+ | |||
+ | import java.lang.Math.*; | ||
+ | import java.io.*; | ||
+ | import java.util.*; | ||
+ | |||
+ | public class idct2d { | ||
+ | |||
+ | // a1=cos(2u), a2=cos(u)-cos(3u), | ||
+ | // where u = pi/8 | ||
+ | |||
+ | static double ang = Math.PI/8; | ||
+ | |||
+ | // static double a1=0.7071, a2= 0.541, a3=0.7071, a4=1.307, a5=0.383; | ||
+ | static double a1 = Math.cos(2.0*ang), | ||
+ | a2 = Math.cos(ang)-Math.cos(3.0*ang), | ||
+ | a3 = Math.cos(2.0*ang), | ||
+ | a4 = Math.cos(ang)+Math.cos(3.0*ang), | ||
+ | a5 = Math.cos(3.0*ang); | ||
+ | |||
+ | // static double[] f = {31, 41, 52, 65, 83, 15, 34, 117}, | ||
+ | static double[] f = {10, 9.24, 7.07, 3.826, 0, -3.826, -7.07, -9.24}, | ||
+ | F = {0, 0, 0, 0, 0, 0, 0, 256}, | ||
+ | S = {0, 0, 0, 0, 0, 0, 0, 256}; | ||
+ | |||
+ | static double[][] trans = new double[8][8]; | ||
+ | |||
+ | void idct2d() {} | ||
+ | |||
+ | void calcIdct() { | ||
+ | double t1, t2, t3, t4; | ||
+ | |||
+ | // Stage 1 | ||
+ | |||
+ | for (int i=0; i<8; i++) | ||
+ | F[i] = S[i]/ | ||
+ | |||
+ | F[0] = F[0]*2/ | ||
+ | |||
+ | t1 = F[5] - F[3]; | ||
+ | t2 = F[1] + F[7]; | ||
+ | t3 = F[1] - F[7]; | ||
+ | t4 = F[5] + F[3]; | ||
+ | F[5] = t1; | ||
+ | F[1] = t2; | ||
+ | F[7] = t3; | ||
+ | F[3] = t4; | ||
+ | |||
+ | //printF(); | ||
+ | |||
+ | // Stage 2 | ||
+ | |||
+ | t1 = F[2] - F[6]; | ||
+ | t2 = F[2] + F[6]; | ||
+ | F[2] = t1; | ||
+ | F[6] = t2; | ||
+ | |||
+ | t1 = F[1] - F[3]; | ||
+ | t2 = F[1] + F[3]; | ||
+ | F[1] = t1; | ||
+ | F[3] = t2; | ||
+ | |||
+ | //printF(); | ||
+ | |||
+ | // Stage 3 | ||
+ | |||
+ | F[2] = a1*F[2]; | ||
+ | |||
+ | t1 = -a5*(F[5] + F[7]); | ||
+ | F[5] = -a2*F[5] + t1; | ||
+ | F[1] = a3*F[1]; | ||
+ | F[7] = a4*F[7] + t1; | ||
+ | |||
+ | //printF(); | ||
+ | |||
+ | // Stage 4 | ||
+ | |||
+ | t1 = F[0] + F[4]; | ||
+ | t2 = F[0] - F[4]; | ||
+ | F[0] = t1; | ||
+ | F[4] = t2; | ||
+ | |||
+ | F[6] = F[2] + F[6]; | ||
+ | |||
+ | //printF(); | ||
+ | |||
+ | // Stage 5 | ||
+ | |||
+ | t1 = F[0] + F[6]; | ||
+ | t2 = F[2] + F[4]; | ||
+ | t3 = F[4] - F[2]; | ||
+ | t4 = F[0] - F[6]; | ||
+ | F[0] = t1; | ||
+ | F[4] = t2; | ||
+ | F[2] = t3; | ||
+ | F[6] = t4; | ||
+ | |||
+ | F[3] = F[3] + F[7]; | ||
+ | F[7] = F[7] + F[1]; | ||
+ | F[1] = F[1] - F[5]; | ||
+ | F[5] = -F[5]; | ||
+ | |||
+ | //printF(); | ||
+ | |||
+ | // Final stage | ||
+ | f[0] = (F[0] + F[3]); | ||
+ | f[1] = (F[4] + F[7]); | ||
+ | f[2] = (F[2] + F[1]); | ||
+ | f[3] = (F[6] + F[5]); | ||
+ | |||
+ | f[4] = (F[6] - F[5]); | ||
+ | f[5] = (F[2] - F[1]); | ||
+ | f[6] = (F[4] - F[7]); | ||
+ | f[7] = (F[0] - F[3]); | ||
+ | } | ||
+ | |||
+ | |||
+ | static public void main(String s[]) { | ||
+ | |||
+ | idct2d test = new idct2d(); | ||
+ | int i,j; | ||
+ | |||
+ | // Init to test transform in Wallace paper | ||
+ | for (i=0; i<8; i++) | ||
+ | for (j=0; j<8; j++) trans[i][j]=0; | ||
+ | trans[0][0] = 240; | ||
+ | trans[0][2] = -10; | ||
+ | trans[1][0] = -24; | ||
+ | trans[1][1] = -12; | ||
+ | trans[2][0] = -14; | ||
+ | trans[2][1] = -13; | ||
+ | |||
+ | //First the row transforms | ||
+ | for (i=0; i<8; i++) { | ||
+ | for (j=0; j<8; j++) S[j] = trans[i][j]; | ||
+ | test.calcIdct(); | ||
+ | for (j=0; j<8; j++) trans[i][j] = f[j]; | ||
+ | } | ||
+ | |||
+ | for (i=0; i<8; i++) { | ||
+ | System.out.println(); | ||
+ | for (j=0; j<8; j++) System.out.print((int) trans[i][j]+" | ||
+ | } | ||
+ | System.out.println(); | ||
+ | |||
+ | System.out.println(" | ||
+ | |||
+ | //Now the column transforms | ||
+ | for (i=0; i<8; i++) { | ||
+ | for (j=0; j<8; j++) S[j] = trans[j][i]; | ||
+ | test.calcIdct(); | ||
+ | for (j=0; j<8; j++) trans[j][i] = f[j]/4 + 128; | ||
+ | } | ||
+ | |||
+ | //Print it out! | ||
+ | for (i=0; i<8; i++) { | ||
+ | System.out.println(); | ||
+ | for (j=0; j<8; j++) System.out.print((int) (trans[i][j]+0.5)+" | ||
+ | } | ||
+ | System.out.println(); | ||
+ | } | ||
+ | } | ||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . C=H 19 | ||
+ | |||
+ | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
+ | |||
+ | ------- | ||
+ | Part II: Bringing "True Color" images to the 64 | ||
+ | ------- | ||
+ | by Adrian Gonzalez < | ||
+ | |||
+ | The Commodore 64 has a somewhat limited resolution, 16 predefined colors, | ||
+ | and some very peculiar restrictions as to the number of different colors | ||
+ | that can be placed next to each other. | ||
+ | colorful pictures on the 64 a difficult task, and displaying full color | ||
+ | photographic images almost impossible. | ||
+ | |||
+ | I've been fascinated with bringing full color images to the c64 for a long | ||
+ | time now. My first image conversion project was a C program that could | ||
+ | convert 16 color IFF pictures to koalapaint format. | ||
+ | project somewhere back in 1992 or so. It ran on the Amiga, and it was one | ||
+ | of my first ' | ||
+ | while doing it. After some time I rewrote the converter completely and | ||
+ | added support for Doodle, charsets and a few other things. | ||
+ | |||
+ | Shortly after and with the help of a few friends on the net, I learned about | ||
+ | a " | ||
+ | however, somebody on irc #c-64 pointed me to a couple of ' | ||
+ | available on an ftp site that were supposedly in some new, colorful vic | ||
+ | mode. I was reluctant because I thought I had seen the best graphics a c64 | ||
+ | could do. Boy was I wrong. | ||
+ | called IFLI. Shortly thereafter the thought of doing an IFLI converter grew | ||
+ | stronger and stronger in my head and the idea of a FLI converter practically | ||
+ | vanished. | ||
+ | at IFLI conversion. | ||
+ | this converter/ | ||
+ | JPEG decoder. | ||
+ | |||
+ | My involvement with the JPEG project kind of started before Steve even | ||
+ | started to work on it. About two years ago, Nate Dannenberg asked me to | ||
+ | do a renderer for his QuickCam interface. | ||
+ | renderer in 4 grays. | ||
+ | renderer that was used first for Nate's Quick cam, and later modified to | ||
+ | work with the first version of Steve' | ||
+ | was later hacked into rendering drazlace grayscale images. | ||
+ | |||
+ | The big challenge, of course, was porting the full color IFLI renderer to | ||
+ | the c64. I don't think I would' | ||
+ | We faced the obvious restriction of the c64's limited RAM (The IFLI image | ||
+ | itself takes up half the c64's memory!). | ||
+ | it worked out just fine. But how exactly does the renderer do it's magic? | ||
+ | What's all that garbage on the screen while it's rendering? | ||
+ | like to start off by giving a quick explanation of what dithering is, and | ||
+ | how the renderer uses a particular kind called Floyd-Steinberg dithering. | ||
+ | |||
+ | |||
+ | Floyd-Steinberg Dithering | ||
+ | ------------------------- | ||
+ | |||
+ | Dithering is the process of using patterns of two or more colors to | ||
+ | trick the eye into seing a different color. | ||
+ | display 3 shades of gray with just two colors, you could have dither | ||
+ | patterns such as: | ||
+ | |||
+ | . . . . * . * . * * * * | ||
+ | . . . . . * . * * * * * | ||
+ | . . . . * . * . * * * * | ||
+ | . . . . . * . * * * * * | ||
+ | . . . . * . * . * * * * | ||
+ | . . . . . * . * * * * * | ||
+ | |||
+ | Where the dots (.) are black pixels and the asterisks (*) are white | ||
+ | pixels. | ||
+ | pattern as a shade of gray. This is the basic concept behind dithering. | ||
+ | |||
+ | Floyd-Steinberg dithering is an 'error diffusion' | ||
+ | Basically that means that when drawing an image, if a color in the | ||
+ | source image can't be matched with the available colors we have to use | ||
+ | the closest available color. | ||
+ | difference between the color we wanted to use (source image color) and the | ||
+ | closest one we had available. | ||
+ | distributed (diffused) amongst adjacent pixels. | ||
+ | |||
+ | For example, imagine we have a video chip that can only display black and | ||
+ | white pixels. | ||
+ | brightness. | ||
+ | shades of gray. We can store the image as an array of numbers from 0 to 99, | ||
+ | where 0 represents 0% brightness and 99 represents 100% brightness. | ||
+ | part of our test image could look something like this (5 x 2 pixel chunk of | ||
+ | the image): | ||
+ | |||
+ | 00 25 45 75 99 | ||
+ | 30 50 80 30 10 | ||
+ | |||
+ | Without dithering, the best we could do is pick the color closest to the one | ||
+ | we want to display, so we'd end up with something like: | ||
+ | |||
+ | 00 00 00 99 99 | ||
+ | 00 99 99 00 00 | ||
+ | |||
+ | Where 00 is black and 99 is white. | ||
+ | greater or equal to 50 were converted to white (99) and the rest were | ||
+ | converted to black (00), since those are the only two colors our hypotetical | ||
+ | video chip can display. | ||
+ | |||
+ | With Floyd-Steinberg error diffusion dithering we also plot the closest | ||
+ | color we have, but instead of just moving on to the next pixel we calculate | ||
+ | by how much we were off (error) and diffuse that amount among adjacent pixels. | ||
+ | Going back to our test image, the first pixel is completely black so we can | ||
+ | display it right away without incurring any error, because we matched the | ||
+ | color exactly. | ||
+ | closest color we can, in this case, black (00). We then proceed to compute | ||
+ | the error, which is equal to the color we wanted (25) minus the color we | ||
+ | had available (00), so for this pixel, the error is +25. We then diffuse | ||
+ | the error (+25) to the adjacent pixels. | ||
+ | distribution: | ||
+ | |||
+ | | ||
+ | |||
+ | 1E/16 5E/16 3E/16 | ||
+ | |||
+ | Where C.Pix is the current pixel, and E is the error. | ||
+ | means, add seven sixteenths of the error to the pixel to the right of the | ||
+ | current pixel, five sixteenths of the error to the pixel below the current | ||
+ | pixel, etc. | ||
+ | |||
+ | So in our example, we wanted to plot a dark gray pixel (25) but we only | ||
+ | had black available (00), so the error is +25. So then we have (rounded | ||
+ | off) | ||
+ | |||
+ | (7/16)E = 11 | ||
+ | (5/16)E = 8 | ||
+ | (3/16)E = 5 | ||
+ | (1/16)E = 2 | ||
+ | |||
+ | When we add this to the original image buffer, we get: | ||
+ | |||
+ | (Original) | ||
+ | 00 CP >45< 75 100 | ||
+ | 25 50 80 30 10 | ||
+ | |||
+ | (Diffused) | ||
+ | 00 CP >56< 75 100 | ||
+ | 27 58 85 30 10 | ||
+ | |||
+ | Again, CP stands for ' | ||
+ | ready to move on to the next pixel. | ||
+ | (originally 45) would have been plotted as black but now, because of the | ||
+ | error diffusion, the new value is 56 so we'll plot it as white, and the | ||
+ | error will be 56-99 = -43. We then repeat the procedure: | ||
+ | |||
+ | (7/16)E = -19 | ||
+ | (5/16)E = -13 | ||
+ | etc | ||
+ | |||
+ | And adjust the buffer accordingly. | ||
+ | processing each scanline from left to right and scanlines from top to | ||
+ | bottom and the result is a nice looking dithered image. | ||
+ | can be positive or negative, so we should prepare for a case such as this: | ||
+ | |||
+ | 55 00 00 | ||
+ | 00 00 00 | ||
+ | |||
+ | Get the 55, plot it as white, and we have an error of -44, so that means | ||
+ | that our buffer needs to be able to handle negative values as well. After | ||
+ | difusing, the buffer would look like: | ||
+ | |||
+ | CP -20 00 | ||
+ | -14 -8 00 | ||
+ | |||
+ | Note also that the 1E/16 was discarded because we're at the left edge of | ||
+ | the screen. | ||
+ | |||
+ | | ||
+ | | ||
+ | |||
+ | The error +44 will make the values of adjacent pixels greater than 99, | ||
+ | which is the maximum that can be displayed. | ||
+ | hold values large enough to accomodate for this. | ||
+ | |||
+ | Now let's assume our hypothetical video chip manufacturer came up with a new | ||
+ | video chip that can display 4 grays: black (0), dark gray (33), light gray | ||
+ | (66), and white (99). If we want to plot an image with 100 shades of gray | ||
+ | we will still always plot the closest color we can, i.e. 0-16 will be | ||
+ | plotted as 0 (black), 17-49 as 33 (dark gray), etc. The error will be | ||
+ | positive or negative depending on whether we're under or over the color we | ||
+ | wanted to plot. For example, the color 15 would be plotted as 0 (black), | ||
+ | with an error of +15, while the color 20 would be plotted as 33 (dark gray) | ||
+ | with an error of -13. And I think I've managed to confuse everybody | ||
+ | including myself, but if you read this paragraph over, it should make at | ||
+ | least some sense. | ||
+ | want minus the color we have. | ||
+ | |||
+ | As if things weren' | ||
+ | (RGB) display where we have 3 buffers, one for each primary color (red green | ||
+ | and blue). | ||
+ | color for a given pixel. | ||
+ | colors are specified as triplets, for example: | ||
+ | |||
+ | | ||
+ | ( 0, 0, 0) black | ||
+ | ( 99, 0, 0) bright red | ||
+ | ( 99, 99, 0) bright yellow | ||
+ | ( 99, 99, 99) white | ||
+ | |||
+ | When we plot a color we now have to compute three errors, one for each | ||
+ | primary color component. | ||
+ | its corresponding buffer. | ||
+ | pixel (80, 0, 0) but our video chip can only display bright red (99, 20, 0). | ||
+ | The error would still be computed as the color we want minus the color we | ||
+ | can display: | ||
+ | |||
+ | We want: | ||
+ | r1=80, g1= 0, b1=0 | ||
+ | |||
+ | We have: | ||
+ | r2=99, g2=20, b2=0 | ||
+ | |||
+ | The error would be: (r1-r2, g1-g2, b1-b2) = (-19, -20, 0). After computing | ||
+ | the error we proceed to distrubute it in the same fashion as before, except | ||
+ | that we now have three image buffers, each with its own error to be | ||
+ | distributed among its adjacent pixels. | ||
+ | imagine you're displaying 3 independent images, each with it's own error. | ||
+ | In the previous example, we would diffuse the -19 in the red buffer, the -20 | ||
+ | in the green buffer and the 0 in the blue buffer. | ||
+ | |||
+ | With grayscale images, finding which shade of gray was the closest to the | ||
+ | one we wanted to display was pretty straightforward. | ||
+ | images, the way to figure out the closest color changes a little bit. In | ||
+ | order to find which of our available colors is the closest match for the | ||
+ | color we want to display, we need to calculate the ' | ||
+ | we want to each of the colors we have available and use the one with the | ||
+ | shortest distance. | ||
+ | cube, with the R, G, and B as each of the 3 axis. The origin (0,0,0) is | ||
+ | black, and the corner opposite to the origin (99,99,99) is white, so | ||
+ | figuring out the distance between two colors is as simple as figuring out | ||
+ | the distance between two points in space: | ||
+ | |||
+ | color1 = (r1, g1, b1) | ||
+ | color2 = (r2, g2, b2) | ||
+ | d = sqrt( (r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2 ) | ||
+ | |||
+ | Let's say that our video chip can display 5 colors: | ||
+ | and white. | ||
+ | |||
+ | ( 0, 0, 0): Black | ||
+ | (99, 0, 0): Red | ||
+ | ( 0,99, 0): Green | ||
+ | ( 0, 0,99): Blue | ||
+ | (99,99,99): White | ||
+ | |||
+ | Let's also say we want to find out which of these is the closest match for | ||
+ | the color (50, | ||
+ | and all of our 5 available colors and see which one is the closest. | ||
+ | calculations would be as follows: | ||
+ | |||
+ | Black: | ||
+ | sqrt( ( 0-50)^2 + ( 0-80)^2 + ( 0-10)^2 ) = 94.87 | ||
+ | |||
+ | Red: | ||
+ | sqrt( (99-50)^2 + ( 0-80)^2 + ( 0-10)^2 ) = 94.35 | ||
+ | |||
+ | Green: | ||
+ | sqrt( ( 0-50)^2 + (99-80)^2 + ( 0-10)^2 ) = 54.42 | ||
+ | |||
+ | Blue: | ||
+ | sqrt( ( 0-50)^2 + ( 0-80)^2 + (99-10)^2 ) = 129.70 | ||
+ | |||
+ | White: | ||
+ | sqrt( (99-50)^2 + (99-80)^2 + (99-10)^2 ) = 103.36 | ||
+ | |||
+ | In this case, the color with the shortest distance is Green (54.42). | ||
+ | that we're not interested in knowing the exact distance, just knowing which | ||
+ | color has the smallest distance, so it's safe to toss out the square root | ||
+ | in order to things faster. | ||
+ | with the following squared distances: | ||
+ | |||
+ | Black: | ||
+ | Red: 8901 | ||
+ | Green: | ||
+ | Blue: 16821 | ||
+ | White: 10683 | ||
+ | |||
+ | Of course, Green still has the smallest distance^2, and we're saved from | ||
+ | performing a potentially troublesome (and slow) calculation. | ||
+ | |||
+ | Based on the previous explanation, | ||
+ | Floyd-Steinberg dithering on the C64. We will need to have the RGB values | ||
+ | for each C64 color handy in order to be able to compute the error and the | ||
+ | closest colors for each pixel we want to draw. | ||
+ | |||
+ | This article would probably end at this point if the C64 would let us | ||
+ | choose any of the 16 colors for any pixel on the screen, but we're not quite | ||
+ | that lucky. | ||
+ | |||
+ | |||
+ | Multicolor Bitmap Mode | ||
+ | ---------------------- | ||
+ | |||
+ | The VIC-II video chip on the C64 has somewhat strict color limitations. | ||
+ | multicolor bitmap mode, the screen has a resolution of 160x200 and it's | ||
+ | divided into 4x8 pixel ' | ||
+ | different colors out of the C64's 16 colors plus one background color common | ||
+ | to all cells on the screen. | ||
+ | |||
+ | 4 4 4 3 | ||
+ | 4 4 3 3 | ||
+ | 4 3 3 3 | ||
+ | 3 3 3 0 | ||
+ | 3 3 3 0 | ||
+ | 1 3 3 3 | ||
+ | 1 1 3 3 | ||
+ | 1 1 1 3 | ||
+ | |||
+ | We could choose color 3 as the background color common to all cells, and the | ||
+ | colors 0, 1 and 4 as the colors available to this particular cell (called | ||
+ | foreground, multicolor 0, and multicolor 1). We can't display any | ||
+ | additional colors on this cell. This makes multicolor bitmap mode a very | ||
+ | tough choice for displaying true color images. | ||
+ | |||
+ | |||
+ | FLI Mode | ||
+ | -------- | ||
+ | |||
+ | Flexible Line Interpretation (FLI) mode is a software graphics mode in which | ||
+ | the video chip is tricked by software in order to achieve higher color | ||
+ | placement freedom. | ||
+ | except that each 4x8 cell is further divided into eight 4x1 cells. | ||
+ | 4x1 cell can have 2 completely independent colors, 1 color common to the | ||
+ | entire 4x8 cell and one background color common to the entire image (some | ||
+ | implementations of FLI change the background color on every scanline as well). | ||
+ | One small downside of FLI mode is that the leftmost 3 columns of cells are | ||
+ | lost due to the trickery used to get the video chip to fetch color data on | ||
+ | every scanline. | ||
+ | 160x200 to 148x200. | ||
+ | |||
+ | |||
+ | IFLI Mode | ||
+ | --------- | ||
+ | |||
+ | IFLI mode or " | ||
+ | rapidly. | ||
+ | for NTSC models and 50 frames per second for PAL models. | ||
+ | the screen is redrawn 60 times per second on NTSC units and 50 times per | ||
+ | second on PAL units. | ||
+ | each for 1/60th of a second (1/50th for PAL), giving the illusion of a | ||
+ | single blended image with more than 16 colors. | ||
+ | advantages of IFLI mode is that one of the FLI images is shifted one hires | ||
+ | pixel (1/2 of a multicolor pixel) to the right to give a pseudo 320x200 | ||
+ | hires effect. | ||
+ | |||
+ | For example, let's say a little part of the images looks like this: | ||
+ | (11 = one multicolor white pixel, 33 = one multicolor cyan pixel, etc) | ||
+ | |||
+ | Image1 | ||
+ | 11335577 | ||
+ | |||
+ | Image2 | ||
+ | | ||
+ | |||
+ | Alternating these two would give an effect that looks like: | ||
+ | 12345678 | ||
+ | |||
+ | Except that the colors would also mix and blur slightly, giving the illusion | ||
+ | of more colors than the VIC-II can actually display. | ||
+ | combinations work better than others. | ||
+ | and get a nice looking shade of gray (you' | ||
+ | gray because of the alternation). | ||
+ | |||
+ | The renderer in jpz doesn' | ||
+ | never happy with the results I got by doing that. Instead, it treats the | ||
+ | IFLI display as a ' | ||
+ | one of the c64's 16 colors in any position. | ||
+ | also applies to IFLI, so the resolution is 296x200 instead of 320x200. | ||
+ | |||
+ | The color restrictions are somewhat more complex in IFLI mode. The renderer | ||
+ | in jpz treats the display as if it was made up of 8x8 cells, with each cell | ||
+ | divided into eight 8x1 cells, and each of those divided into two 4x1 cells | ||
+ | (fun, huh? | ||
+ | sample: | ||
+ | |||
+ | A I A I A I A I | ||
+ | B J B J B J B J | ||
+ | C K C K C K C K | ||
+ | D L D L D L D L | ||
+ | E M E M E M E M | ||
+ | F N F N F N F N | ||
+ | G O G O G O G O | ||
+ | H P H P H P H P | ||
+ | |||
+ | The odd columns belong to a 4x8 cell in the first FLI image and the even | ||
+ | columns belong to a 4x8 cell in the second FLI image like this: | ||
+ | |||
+ | Image 1 Image 2 | ||
+ | AAAA IIII | ||
+ | BBBB JJJJ | ||
+ | CCCC KKKK | ||
+ | DDDD LLLL | ||
+ | EEEE MMMM | ||
+ | FFFF NNNN | ||
+ | GGGG OOOO | ||
+ | HHHH PPPP | ||
+ | |||
+ | Remember the two images are offset by half a multicolor pixel to give the | ||
+ | pseudo-hires effect. | ||
+ | image has 2 completely independent colors, but each 8x8 cell (the | ||
+ | combination of the 4x8 cells from the two images) shares one color, and the | ||
+ | entire image shares one background color. | ||
+ | |||
+ | The renderer in jpz is divided into two parts. | ||
+ | source RGB image and remaps it to the c64's colors, using Floyd-Steinberg | ||
+ | dithering as described in the first part of this article. | ||
+ | an array of numbers, each number corresponds to a c64 color. | ||
+ | part of the renderer takes this array of c64 colors and displays it in IFLI | ||
+ | mode as best as it can, taking into consideration the color placement | ||
+ | limitations mentioned above. | ||
+ | |||
+ | The second part of the renderer works with blocks of 8x8 pixels and follows | ||
+ | these steps: | ||
+ | |||
+ | 1) Choose one color as common to the entire 8x8 cell | ||
+ | 2) Choose two colors for each 4x1 cell | ||
+ | 3) Render the 8x8 block (as two 4x8 cells, one on each FLI image) | ||
+ | |||
+ | In step one the renderer has to determine which one of the C64's 16 color | ||
+ | would be the most helpful when chosen as common to the 8x8 block. | ||
+ | means that the common block color should be chosen to aid in 4x1 cells with | ||
+ | more than 2 different colors (remember that 4x1 cells only have 2 completely | ||
+ | independent colors for them). | ||
+ | |||
+ | 1 15 12 12 | ||
+ | |||
+ | We have two independent colors for the cell, which could be chosen as 1 and | ||
+ | 15. We need either the common 8x8 block color or the background color to be | ||
+ | 12 so we can correctly display this 4x1 cell. So how do we decide? | ||
+ | create a histogram! | ||
+ | |||
+ | A histogram is nothing more than a count of how many pixels of each color we | ||
+ | have in a particular area (in this case an 8x8 block). | ||
+ | want to count the cases in which the common block color would actually be | ||
+ | helpful for displaying a particular 4x1 cell. This is easier to explain | ||
+ | with an example 8x8 block: | ||
+ | |||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 1 1 1 1 1 1 1 1 | ||
+ | 2 1 3 1 3 1 4 1 | ||
+ | |||
+ | If we count all the colors in this block we would find 60 ones, one 2, two | ||
+ | 3's, and one 4, and we would decide that 1 is the best choice as a common | ||
+ | color for the 8x8 block because it's the most ' | ||
+ | look reveals that this block will be rendered as the following 4x8 blocks: | ||
+ | |||
+ | Image1 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 1111 1111 | ||
+ | 2334 1111 | ||
+ | |||
+ | Note that in the last 4x1 cell of image 1 we have 3 different colors. | ||
+ | have the ability to choose only two individual colors for this 4x1 cell, so | ||
+ | if we choose 2 and 3, we won't be able to display 4 and our common 8x8 block | ||
+ | color can't help us either. | ||
+ | count 4x1 cells with 2 or fewer different colors. | ||
+ | cell we would count in our histogram is the last 4x1 cell in image 1. So | ||
+ | the new histogram would be one 2, two 3's, and one 4. We would proceed to | ||
+ | choose 3 as the common 8x8 block color and this allows us to render the | ||
+ | entire 8x8 block without a single problem! | ||
+ | |||
+ | In theory, the same should be done for the background color, in order to | ||
+ | choose the best background color for the picture we're rendering, but that | ||
+ | would mean that we have to do a histogram for the entire image before | ||
+ | starting to render it. In practice, we don't have enough memory on the C64 | ||
+ | to do this while reserving enough memory for an IFLI display (and decoding a | ||
+ | JPEG), so we choose black as the default background color. | ||
+ | |||
+ | The second step in the process is to choose two colors for each 4x1 cell. | ||
+ | This is done with the same histogram technique described earlier, except we | ||
+ | have to take into consideration the color we picked as common to the entire | ||
+ | 8x8 block so we don't repeat any colors and have the best chances of | ||
+ | representing the original image as closely as possible. | ||
+ | histogram is made for each 4x1 cell, and the top two most popular colors are | ||
+ | picked, assuming they' | ||
+ | common 8x8 block color. | ||
+ | white (1) and we have a 4x1 cell that looks like this: | ||
+ | |||
+ | 1223 | ||
+ | |||
+ | The histogram would be: two pixels of color 2 (red), one pixel of color 1 | ||
+ | (white) and one pixel of color 3 (cyan). | ||
+ | already our common 8x8 block color, we skip it and pick colors 2 and 3 as | ||
+ | our 4x1 cell colors. | ||
+ | black is already available as the background color. | ||
+ | |||
+ | The third and last step is to render the actual image with the correct | ||
+ | bitpairs. | ||
+ | resolution in favor of more colors. | ||
+ | possible combinations: | ||
+ | |||
+ | 00: Background color (black in our case) | ||
+ | 01: Upper nybble of screen memory | ||
+ | 10: Lower nybble of screen memory | ||
+ | 11: Video matrix color nybble (Common 8x8 block color) | ||
+ | |||
+ | All that's left to do is to output the corresponding bit pairs in each 4x1 | ||
+ | cell to match the colors in the source (remapped) image as close as | ||
+ | possible. | ||
+ | |||
+ | Depending on the complexity of the source image, there can be a few or a | ||
+ | lot of 4x1 cells where we can't match all the colors. | ||
+ | 2 completely independent colors for each 4x1 cell, and a cell can | ||
+ | potentially have each pixel be a different color. | ||
+ | best we can do is approximate the colors we can't match with the ones we | ||
+ | have available. | ||
+ | to avoid having to compute the color distances in realtime. | ||
+ | |||
+ | The table is basically a list of what colors are most similar to any | ||
+ | particular c64 color, ordered from the most similar to the least. | ||
+ | we want to plot the color white (1) but none of our bitpairs for the current | ||
+ | cell can represent it. We have to look up white in our table and get the | ||
+ | first color closest to it. If that color isn't available either, we will | ||
+ | fetch the next closest color from the table and try again untill we find a | ||
+ | match. | ||
+ | |||
+ | It is worth mentioning that due to the memory limitations of the C64 the | ||
+ | bitmaps are stored in memory in ' | ||
+ | back to the brief description of FLI mode, you'll remember that the leftmost | ||
+ | 3 char columns were lost due to VIC chip limitations. | ||
+ | bitmaps are stored contiguously in memory, without these 3 char block gaps | ||
+ | in order to have enough room to render the entire image. | ||
+ | image is rendered, it is ' | ||
+ | displayed in its full IFLI glory. | ||
+ | can see this ' | ||
+ | Also, the colorful blocks on the screen while the image is being rendered | ||
+ | are the actual buffers where the floyd-steinberg dithering is taking place | ||
+ | (note that all of this is invisible in the SCPU version due to the memory | ||
+ | mirroring optimizations provided by the hardware). | ||
+ | |||
+ | Well, that basically wraps up this article. | ||
+ | reader an idea of the enormous amount of calculations that have to take place | ||
+ | in order to be able to convert the images to a format suitable for viewing | ||
+ | on our beloved C64. I also hope it explains the basic principles behind the | ||
+ | rendering of these images, and why it takes so long for a stock system to | ||
+ | display them. | ||
+ | ....... | ||
+ | .... | ||
+ | .. | ||
+ | . - fin - | ||
+ | </ |
magazines/chacking19.txt · Last modified: 2015-04-17 04:34 by 127.0.0.1