The MOS 6502 and the Best Layout Guy in the World

commandar · on Jan 3, 2011

>The most amazing part about the whole process is that they got the 6502 right in one try. Quoting On the Edge: Bil Herd summarizes the situation. “No chip worked the first time,” he states emphatically. “No chip. It took seven or nine revs [revisions], or if someone was real good they would get it in five or six.”

In some ways (and I'm speaking in a general sense) situations like that actually make me more nervous than when I know there's a problem. I get this uneasy "there's no way it really went that smoothly" feeling that can be hard to shake.

Then again, my personality is to approach most things in life iteratively, so that probably plays a part as well. Great read either way.

rbanffy · on Jan 3, 2011

That's the sensation of working with someone who's incredibly good.

And, BTW, the 6502 was a work of art. Simple, elegant and fast (even at 1 MHz), it ran rings around the Z-80's you found in more expensive computers of the time. Plus, it was delightful to program.

pvg · on Jan 3, 2011

I'm curious why you think so (about the Z80, I mean). I seem to remember it being in very cheap computers as well (ZX Spectrum, doesn't get much cheaper than that), and it was a great deal faster. I grew up with a 6502 and love it like a childhood pet but can't see how it 'ran rings' around the Z80.

leoc · on Jan 4, 2011

In his talk http://www.youtube.com/watch?v=HW9AWBFH1sA#t=3m01s Michael Steil claimed that the 6502 had 60% fewer transistors than the Z80 but was twice as fast (in clock terms, I presume), while the Z80 had more registers and allowed slightly denser code.

pvg · on Jan 4, 2011

I'll take a look at that, thanks. The typical Z80 ran at 4 times the clock rate of the 6502. Both CPUs had their pluses and minuses and I think for many practical purposes were roughly comparable. The design differences and choices are certainly interesting. I don't think the OP was right to claim the 6502 'ran rings' around the Z80, and that remains the case after all the comments I got.

rbanffy · on Jan 4, 2011

> while the Z80 had more registers

To somewhat counter that, the 6502 could read and write to the first 256 bytes of memory with shorter instructions. The 65816 expanded that idea to allow you to do that to any place in memory.
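To make the size difference concrete, here is a minimal sketch comparing two encodings of the same load. The opcode bytes and cycle counts (A5 for LDA zero page, AD for LDA absolute) are the standard NMOS 6502 values; the table layout and the `encode_lda` helper are made up for illustration.

```python
# Two ways to load the accumulator from address $0042 on a 6502.
# Opcodes and cycle counts are the standard NMOS 6502 values;
# the table layout and helper name are hypothetical.
LDA_ENCODINGS = {
    "zero page": {"opcode": 0xA5, "operand_bytes": 1, "cycles": 3},
    "absolute": {"opcode": 0xAD, "operand_bytes": 2, "cycles": 4},
}


def encode_lda(mode: str, address: int) -> bytes:
    """Return the machine-code bytes for LDA in the given addressing mode."""
    info = LDA_ENCODINGS[mode]
    if info["operand_bytes"] == 1:
        if not 0 <= address <= 0xFF:
            raise ValueError("zero-page addresses must fit in one byte")
        operand = bytes([address])
    else:
        # 16-bit addresses are stored low byte first (little-endian).
        operand = address.to_bytes(2, "little")
    return bytes([info["opcode"]]) + operand


for mode, info in LDA_ENCODINGS.items():
    code = encode_lda(mode, 0x0042)
    print(f"{mode:9}: {code.hex(' ').upper()} "
          f"({len(code)} bytes, {info['cycles']} cycles)")
```

The zero-page form saves one byte and one cycle per access, which is why those 256 bytes tended to be treated like a large, slow register file.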

leoc · on Jan 4, 2011

The Western Design Center actually describes the 65xx as an addressable register architecture http://www.westerndesigncenter.com/wdc/Presentation_Artwork/... [zipped PPT], and by that I assume they're describing the zero-page addresses as registers.

mgedmin · on Jan 4, 2011

Specifically, the talk mentions that the 6502 was pipelined and could do many instructions in 2 clock cycles, while the Z80 needed at least 4.

This translates to 2x the speed at the same clock rate.
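As a back-of-the-envelope check, using only figures claimed in this thread (2 vs. 4 cycles for the shortest instructions, and a Z80 clocked at 4 times the 6502), the time per shortest instruction can be sketched like this. The 1 MHz and 4 MHz clocks are assumptions picked to match those claims, not measurements.

```python
def shortest_instruction_us(clock_mhz: float, min_cycles: int) -> float:
    """Microseconds taken by the shortest instruction at the given clock."""
    return min_cycles / clock_mhz


# Same clock for both chips, cycle counts as claimed in the talk.
same_clock_speedup = (
    shortest_instruction_us(1.0, 4) / shortest_instruction_us(1.0, 2)
)

# Assumed clocks: 1 MHz 6502 vs. 4 MHz Z80 ("4 times the clock rate").
t_6502 = shortest_instruction_us(1.0, 2)
t_z80 = shortest_instruction_us(4.0, 4)

print(f"same clock: 6502 shortest instruction is {same_clock_speedup:.1f}x faster")
print(f"1 MHz 6502: {t_6502:.1f} us, 4 MHz Z80: {t_z80:.1f} us")
```

This compares only the shortest instructions; it says nothing about how much work each instruction does or how many of them a given routine needs.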

Flow · on Jan 4, 2011

The 6510 (6502 plus I/O ports) could do no such thing. The fastest instructions used 2 cycles and a 1-byte opcode.

rbanffy · on Jan 4, 2011

You use two cycles, but the 6502 could execute the instruction while fetching the next one.

Flow · on Jan 4, 2011

That's not the pattern I see when looking at the op-code/cycle chart. I recently implemented part of a C64-emulator in JavaScript and it seems very much like every step takes a cycle.

For example, the instructions NOP (or CLI, SEI, INX etc.) are 1 byte, 2 cycles: 1 cycle for fetching the instruction and one for executing the fetched instruction.

LDA addr,x seems to be pipelined a bit though. It's "BD lo hi" in memory and takes 4 cycles unless lo+x > 255, in which case it takes 5 cycles. The lo+x calculation seems to occur while hi is being read.
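The page-crossing rule for LDA addr,x can be sketched in a few lines. This is only an illustration of the timing rule described above, not code from the emulator; the function name is made up.

```python
def lda_absolute_x_cycles(base: int, x: int) -> int:
    """Cycle count for LDA base,X: 4 cycles, plus 1 if lo + X carries."""
    lo = base & 0xFF
    page_crossed = lo + (x & 0xFF) > 0xFF
    return 5 if page_crossed else 4


print(lda_absolute_x_cycles(0x10F0, 0x0F))  # lo + x = 0xFF, same page
print(lda_absolute_x_cycles(0x10F0, 0x10))  # lo + x = 0x100, page crossed
```

The extra cycle exists because the high byte of the address has to be fixed up after the low-byte addition carries.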

rbanffy · on Jan 4, 2011

I will have to dig up my 6502 documentation, but, IIRC, by the time the processor executed the NOP (CLI, INX etc) it already fetched the next instruction, so, if it's another NOP, it will complete in one cycle instead of two. Unless you crossed a page boundary, which implies a one-cycle penalty.

Flow · on Jan 4, 2011

I see, but that's not how it worked on the C64 at least. I did some raster-programming and counted cycles a lot.

rbanffy · on Jan 4, 2011

Since I never wrote timing-critical code for the 6502 (apart from "make it as fast as possible") I cannot recall many specifics. Since you did, you certainly have a better understanding of how it worked.

I am restoring a 65c02-based //e clone, so, I may be able to properly measure instruction timings, but I won't hold my breath.

leoc · on Jan 4, 2011

It seems that all the mysteries of 6502 timing have been revealed thanks to the Visual 6502 project http://www.youtube.com/watch?v=H_15RtVbqGU#t=5m33s http://www.visual6502.org/ .

Flow · on Jan 4, 2011

Yeah, well, it could be C64-specific quirks since it shared the bus with the graphics hardware.

Sounds like a fun project.

exception · on Jan 4, 2011

Yes you are right, the instruction timings were very exact as far as I remember. The only cases where there was an option was in the case of a branch taken or not.

jonsen · on Jan 3, 2011

I believe the 6502 could do things faster than the Z80. I once implemented the exact same program on a 6800 and a Z80. The 6800 is very close to the 6502. Speed wasn't an issue, but the Z80 version was noticeably bigger. And also felt more quirky to program.

pvg · on Jan 3, 2011

> I believe the 6502 could do things faster than the Z80.

Right, I was hoping someone might remember specifics. I don't think this is really true, in general.

> The 6800 is very close to the 6502

Not really.

forinti · on Jan 4, 2011

The 6502 was faster at accessing memory. I remember Elite running a lot better on a 2MHz BBC Micro than on any 3.58MHz Z80-based micro, like the MSXs.

rbanffy · on Jan 4, 2011

The MSX 1 also had a hideous design flaw (for a gaming machine) that hid the framebuffer from the processor. IIRC, you had to do a couple IOs to get a single byte to or from the VRAM.

wladimir · on Jan 5, 2011

I remember the MSX2 had exactly the same flaw, it had a bigger video memory and some 2D acceleration primitives, but the access method was the same, through vpoke/vpeek.

amichail · on Jan 3, 2011

> Plus, it was delightful to program.

Simple maybe, but not delightful to program. I don't know of any assembly language that is delightful to program.

BTW, I knew someone from junior high & high school who could write code for the 6502 using a hex dump with amazing speed. You might have heard of him: Randy Linden.

bensummers · on Jan 3, 2011

I've written a lot of ARM assembler, and I think that could be described as a delight to program. An orthogonal instruction set, consistent naming, conditionals on every instruction, and a decent number of registers make it really quite pleasant (as these things go).

These days I'm writing a lot of code in Ruby and JavaScript on the JVM, and find myself considering Java my current equivalent of assembly. It's sad really, being so far away from what the CPU is actually executing.

biot · on Jan 4, 2011

6502 assembly was the first language after Apple Basic that I learned. I found it to be a very enjoyable experience. These days it would be tedious, but back then it was fun making the limited hardware do something and the only thing you used is what the hardware gave you. Today we have it lucky... with all the frameworks, libraries, and such only a small percentage of the end result is your original work. That's great for efficiency, though the downside is you can't point to a program and say that you wrote 100% of it.

I imagine this is similar to how someone who likes to tinker with engines might prefer an old motorcycle they can take apart and put back together themselves even though there are many advantages in terms of reliability, efficiency, comfort, and so on to driving a Prius.

rsc · on Jan 3, 2011

As Doug McIlroy (inventor of pipes, diff) once said of punched cards, "It's the kind of thing you can be nostalgic about, but it wasn't actually fun."

http://research.swtch.com/2008/04/computing-history-at-bell-...

HeyLaughingBoy · on Jan 3, 2011

Punch cards are different: there's substantially delayed gratification.

When you're a poor college student as I was, entering opcodes in a monitor/debugger you wrote in BASIC because you couldn't afford an assembler, then yes, assembly was fun.

Fun, that is, until you had to hand-calculate negative jump offsets. Don't remember why, but for some reason I seem to think that the MC6809 I was running on made it difficult to do so.
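For anyone who never had the pleasure, here is a rough sketch of that hand calculation for an 8-bit relative branch. It assumes a 2-byte branch instruction whose offset is counted from the address of the following instruction, which is how branches work on the 6502 and how short branches work on the 6809; the function name is invented for this example.

```python
def branch_offset_byte(branch_addr: int, target_addr: int) -> int:
    """Return the offset byte for a 2-byte relative branch.

    The offset is measured from the instruction after the branch and is
    stored as an 8-bit two's-complement value.
    """
    next_instruction = branch_addr + 2
    offset = target_addr - next_instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"target out of range for a short branch: {offset}")
    return offset & 0xFF  # two's complement for negative offsets


# Forward branch from $0200 to $0210.
print(f"{branch_offset_byte(0x0200, 0x0210):02X}")
# Backward branch from $0210 to $0200: the error-prone negative case.
print(f"{branch_offset_byte(0x0210, 0x0200):02X}")
```

Doing the backward case by hand means subtracting, remembering the extra 2 bytes for the branch itself, and then converting to two's complement, which is three chances to get it wrong.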

ghshephard · on Jan 4, 2011

There must have been something special about the 6809 - I recall doing my Digital Design course on that chip. And yes, we didn't have an assembler either; everything was entered via opcodes. It was a very enjoyable experience for someone like me who wasn't a gear head, and got to play around with SB555s, NAND gates, and lots, and lots of wirewrapping.

exception · on Jan 4, 2011

Heh, yeah I still remember 6502 opcodes... 4C xx xx is jmp absolute, 20 xx xx is jsr (jump subroutine) absolute, A9 xx xx is load accumulator absolute, 8D store accumulator, 60 is rts... I could go on.

Typing shit in to the hex monitor for a few years will do that :)

Someone · on Jan 4, 2011

> A9 xx xx is load accumulator absolute

Make that A9 xx; The 6502 accumulator is 8 bits.

exception · on Jan 4, 2011

Thanks for the correction, you're right: A9 is load immediate. I was mixing it up with AD xx xx, which follows, as 8D is the equivalent store.

I wasn't saying that the accumulator was 16 bit though, rather that the load is from an absolute 16 bit address.
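The opcodes traded in this sub-thread are enough for a toy disassembler. The sketch below covers only the six opcodes mentioned here (with A9 as immediate and AD as absolute, per the correction); the table and function names are made up, and $D020 (the C64 border color register) is used purely as a familiar example address.

```python
# Only the opcodes mentioned in this sub-thread; values are the standard
# 6502 ones. Names of the tables and the function are hypothetical.
OPCODES = {
    0x4C: ("JMP", "absolute"),
    0x20: ("JSR", "absolute"),
    0xA9: ("LDA", "immediate"),
    0xAD: ("LDA", "absolute"),
    0x8D: ("STA", "absolute"),
    0x60: ("RTS", "implied"),
}
OPERAND_BYTES = {"implied": 0, "immediate": 1, "absolute": 2}


def disassemble(code: bytes) -> list[str]:
    """Turn a byte string into one mnemonic line per instruction."""
    lines = []
    pc = 0
    while pc < len(code):
        mnemonic, mode = OPCODES[code[pc]]
        size = OPERAND_BYTES[mode]
        operand = code[pc + 1 : pc + 1 + size]
        if mode == "immediate":
            lines.append(f"{mnemonic} #${operand[0]:02X}")
        elif mode == "absolute":
            # Absolute addresses are stored low byte first.
            lines.append(f"{mnemonic} ${int.from_bytes(operand, 'little'):04X}")
        else:
            lines.append(mnemonic)
        pc += 1 + size
    return lines


program = bytes.fromhex("A9 01 8D 20 D0 60")
for line in disassemble(program):
    print(line)
```

Note how the immediate load takes one operand byte while the absolute load and store take two, which is exactly the distinction the correction above is about.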

jonsen · on Jan 3, 2011

Studying the manufacturer's data sheet for the chip, you could in a short time know everything, and I mean everything, about the instruction set. You would never have any doubt about your programming language. That at least was some form of delight.

rsc · on Jan 3, 2011

Watch the video, especially the second half. There's a lot about the 6502 that wasn't apparent from the data sheet.

Luc · on Jan 3, 2011

There's a certain delight in getting the most out of such a limited set of instructions and registers, counting the clock-cycles, reducing the bytes used. It's much more fun than assembly on modern processors!

sfphotoarts · on Jan 3, 2011

I found the Z80 delightful to work with, especially in comparison with the 6502, because it had twice as many registers and you could switch banks of them on the Z80.

you have to love programming to really appreciate assembly, the succinctness, the satisfaction of being so close to the metal, none of this high level peek and poke nonsense of high level languages :)

you don't write code using a hex dump???

You write assembly code using either hand assembly or more efficiently using a tool called an assembler which makes the tedious task of mapping instructions by name to their hex values easier. After a while you get to remember the common ones though, C9 for example.

And I haven't heard of your school buddy, sorry.

amichail · on Jan 3, 2011

He wrote code with amazing speed using the machine language monitor on the PET -- without an assembler.

linker3000 · on Jan 3, 2011

Yep, I can remember doing that on a CBM 3016 in secondary school - must have been about 1979. A little later this 'thing' arrived that comprised a keyboard in a large, plain grey case with a micro-cassette drive, permanently hooked up to a domestic TV by a large umbilical cord. A group of us were given some information booklets and told to 'see what we could do with it'. We later found out it was a prototype for the BBC micro and further code updates allowed us to select teletext pages on the TV and go to 'special' pages that downloaded code to the micro-cassette. We wrote a lot of simple games and apps for the beast.

protomyth · on Jan 3, 2011

I had a lot of fun with the 6502 and I really liked the 6809, so I guess they felt delightful to me. It is a different type of mindset and can really make you focus on the problem at hand. The IBM 370 was a might bit of a pain though.

rbanffy · on Jan 4, 2011

One of the delightful things in programming a 6502-based computer is the relative importance of the OS. In those times, you had complete control over the machine and you called the OS (or the ROM routines) to do whatever you wanted it to.

I remember that when I had to do floating-point math or calculate a screen address (something convoluted on an Apple II) I would simply call a subroutine in ROM and pick up the results.

wglb · on Jan 9, 2011

Perhaps your experience is limited.

The PDP-10 had a delightful instruction set. While I didn't use the 7094 instruction set, many seemed to feel that it was a good set, and felt let down by the 360 instruction set when it came out.

The Motorola 6809 was nice, and it seemed to have a flavor of the PDP-11 (which itself was a delightfully simple and elegant set).

wallflower · on Jan 4, 2011

From Jordan Mechner's diary of the development of Prince of Persia (POP was originally coded in 6502 Assembler. It took him four years). Reading Jordan's full diary will take you at least eight hours but it is well worth it.

> We chatted for an hour about peripherally related topics. Broderbund, corporate America, the rat race, capitalism, freedom. I was seducing him.

> At the critical psychological moment, I remarked: "You know, all my clipping is done on the byte boundaries."

> There was a pause.

http://jordanmechner.com/old-journals/page/33/

April 3, 1989

Luyt · on Jan 4, 2011

The whole 'Reverse Engineering the 6502' talk Michael Steil gave at CCC congress is on YouTube. I posted this earlier in a separate topic, but it didn't pick up.

Clickable links to the 6 parts:

http://www.youtube.com/watch?v=HW9AWBFH1sA

http://www.youtube.com/watch?v=bBE4KHKzhKc

http://www.youtube.com/watch?v=tRBo7O_blVo

http://www.youtube.com/watch?v=H_15RtVbqGU

http://www.youtube.com/watch?v=N9DYmlprCKA

http://www.youtube.com/watch?v=eZOUuqc4pk8

mmphosis · on Jan 3, 2011

Intel Core 2 - Yorkfield, 45 nm process technology, Number of Transistors: 820 Million

MOS 6502, Number of Transistors: 3510

So in theory, a chip with 65536 MOS 6502 cores each with 64K of internal RAM (4Mb cache) could be made.
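A quick sanity check of that arithmetic, as a sketch. The transistor counts are the ones quoted above; the 6-transistors-per-bit figure is an assumption (a classic 6T SRAM cell), and everything besides cores and RAM is ignored.

```python
CORE2_TRANSISTORS = 820_000_000  # figure quoted above
MOS6502_TRANSISTORS = 3_510      # figure quoted above
CORES = 65_536
RAM_BYTES_PER_CORE = 64 * 1024
SRAM_TRANSISTORS_PER_BIT = 6     # assumption: classic 6T SRAM cell

# How many bare 6502 cores fit in the budget, ignoring everything else.
max_cores_logic_only = CORE2_TRANSISTORS // MOS6502_TRANSISTORS

# Transistors needed for the cores alone in the 65536-core design.
core_transistors = CORES * MOS6502_TRANSISTORS

# Total RAM and the transistors it would need under the 6T assumption.
total_ram_bytes = CORES * RAM_BYTES_PER_CORE
ram_transistors = total_ram_bytes * 8 * SRAM_TRANSISTORS_PER_BIT

print(f"bare cores that fit:      {max_cores_logic_only:,}")
print(f"transistors for 64K cores: {core_transistors:,}")
print(f"total RAM:                 {total_ram_bytes / 2**30:.0f} GiB")
print(f"transistors for that RAM:  {ram_transistors:,}")
```

Under these assumptions the cores themselves fit comfortably, but 64K of on-die SRAM per core would need far more transistors than the whole budget, so the RAM rather than the CPU logic is the binding constraint.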

meastham · on Jan 3, 2011

Sure, if you completely ignore all of the extra circuitry for the on-chip network and cache coherency and everything else you would need. Transistor density is far from the limiting factor in the number of cores we can wedge into a single system.

jdeeny · on Jan 3, 2011

If you assume that there is no cache (only on-die memory) and that memory is not shared between cores, things become much simpler and scale more linearly. Core-to-core communications and plenty of other details remain to spend man-years ironing out, but it seems like it would be possible to approach 64k cores or at least 16k.

RodgerTheGreat · on Jan 4, 2011

Might be able to peel out the BCD stuff from the 6502 to free up a little additional space and approach communication between cores kinda like the "handshake bus" GreenArrays chips use: http://greenarraychips.com/home/documents/greg/PB003-100822-...

exception · on Jan 3, 2011

I loved the 6502. Around that era I programmed the SC/MP, Z80, the 8080 and the 6800. Although the Z80 was more powerful, the 6502 holds a special place in my heart as it was the first CPU I worked with and I loved the simplicity of the instruction set.

My crowning achievement was a multi-threaded kernel for a CNC punch. Since the stack was at a fixed memory address and there was no PUSHA, I had to change threads (in response to an IRQ) by sequentially pushing the registers on to the stack and then swapping the stack with a block copy. It worked! Crazy :/
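The original code isn't shown, so the following is only a guess at the general shape of that trick, simulated in Python. It assumes the usual 6502 layout (stack fixed at $0100-$01FF with an 8-bit stack pointer) and that the registers have already been pushed by the time the swap happens; the `Thread` class and `switch_threads` function are hypothetical.

```python
STACK_BASE = 0x0100  # the 6502 hardware stack always lives in page 1
STACK_SIZE = 0x100


class Thread:
    """Per-thread save area: a private copy of page 1 plus the stack pointer."""

    def __init__(self) -> None:
        self.saved_stack = bytearray(STACK_SIZE)
        self.saved_sp = 0xFF  # empty stack


def switch_threads(memory: bytearray, sp: int, current: Thread, nxt: Thread) -> int:
    """Swap the hardware stack page between threads; return the new SP.

    Assumes the IRQ handler has already pushed A, X and Y (the CPU itself
    pushes the program counter and status register on interrupt).
    """
    # Block-copy the live stack out to the current thread's save area.
    current.saved_stack[:] = memory[STACK_BASE:STACK_BASE + STACK_SIZE]
    current.saved_sp = sp
    # Block-copy the next thread's saved stack into page 1.
    memory[STACK_BASE:STACK_BASE + STACK_SIZE] = nxt.saved_stack
    return nxt.saved_sp


memory = bytearray(0x10000)
thread_a, thread_b = Thread(), Thread()
memory[0x01FF] = 0xAA  # pretend thread A pushed one byte
sp = switch_threads(memory, 0xFE, thread_a, thread_b)
print(f"SP after switching to B: {sp:02X}")
sp = switch_threads(memory, sp, thread_b, thread_a)
print(f"SP after switching back: {sp:02X}, top of stack: {memory[0x01FF]:02X}")
```

Copying a full 256-byte page on every switch is expensive on a 1 MHz part, which is presumably part of why it felt crazy; copying only the used portion of the stack would be an obvious refinement.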

I loved reading this article - thanks for posting. Awesome stuff! Makes me want to code my own circuit emulator :)

thinkingeric · on Jan 4, 2011

Ditto that. My first programs were in assembly on the 6502, and I'm thrilled to see it getting this attention. I'm just sorry that I don't still have the KIM-1.

greggraham · on Jan 4, 2011

I was planning on buying a KIM-1 when my dad surprised me by buying an Apple II. I didn't end up writing anything in assembly on the Apple II, though. My first assembly language was IBM-370 in college. I wish now I had started with the KIM-1, though.

VMG · on Jan 3, 2011

Here it is in all its JavaScript goodness: http://www.visual6502.org/JSSim/index.html

eru · on Jan 3, 2011

The CCC congress yielded some great talks this year.