MMIX 2009: A RISC computer for the third millennium (2011) (stanford.edu)
145 points by tambourine_man on June 9, 2017 | hide | past | favorite | 61 comments



" The original MIX computer ran fine without an operating system. You could bootstrap it with punched cards or paper tape and do everything yourself. But nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, like all other computing machines made today, relies on an operating system to get jobs started in their own address spaces and to provide I/O capabilities.

Whenever anybody has asked if I will be writing a book about operating systems, my reply has always been ``Nix.'' Therefore the name of MMIX's operating system, NNIX, should come as no surprise.

From time to time I will necessarily have to refer to things that NNIX does for its users, but I have no intention of building NNIX myself. Life is too short. It would be wonderful if some expert in operating system design became inspired to write a book that explains exactly how to construct a nice, clean NNIX kernel for an MMIX chip. The MMIX fascicle includes a level-50 exercise that asks a highly motivated reader to ``Write a book about operating systems, which includes a complete design of an NNIX kernel for the MMIX architecture.'' Other alternatives are also possible; for example, somebody at the Free Software Foundation might decide to come up with an alternative system called GNNIX. "


> " The original MIX computer ran fine without an operating system. You could bootstrap it with punched cards or paper tape and do everything yourself. But nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, like all other computing machines made today, relies on an operating system to get jobs started in their own address spaces and to provide I/O capabilities.

Unikernels come back to that, to an extent: what there is of the OS is part of the application, and I/O capabilities are provided by libraries linked into the unikernel application. As always, the issue is having I/O drivers for the hardware in a form which can be used.


If you're interested in designing a RISC processor in software, Knuth's full book on MMIX is an interesting and complete read. I am not referring to the short version published as part of Volume 1 of the revised TAOCP, but the full exposition in his book from Springer-Verlag. [1]

There he explains the CPU, the memory management, and the instruction pipeline, and he builds the assembler for the machine. He worked with David Patterson (of Berkeley RISC fame) on the architecture.

As to the implementation, Knuth once commented that the coding of the instruction pipeline for MMIX was the hardest programming he'd ever done.

Cool stuff!

[1] http://www.springer.com/gb/book/9783540669388


The 2009 figure is not the year but a computed result (the PDF [1] has its copyright set in 1999):

(Cray I + IBM 801 + RISC II + Clipper C300 + AMD 29K + Motorola 88K + IBM 601 + Intel i960 + Alpha 21164 + POWER 2 + MIPS R4000 + Hitachi SuperH4 + StrongARM 110 + Sparc 64) / 14 = 28126 / 14 = 2009

[1] http://mmix.cs.hm.edu/doc/fasc1.pdf#page=7&zoom=auto,-250,56...


Read the next sentence :)


TIL deep linking URL syntax for PDFs in Chrome, thanks!


"The same number may also be obtained in a simpler way by taking Roman numerals."


The names of the machine's primitive data types are interesting, and refreshing if you're tired of the more well-known ones. These are the load instructions, for reading data from memory:

    Load a Byte           LDB $X,$Y,$Z
    Load a Wyde (2Byte)   LDW $X,$Y,$Z
    Load a Tetra (4Byte)  LDT $X,$Y,$Z
    Load an Octa (8Byte)  LDO $X,$Y,$Z
I thought the "wyde" in the instruction set reference was a typo, until I saw that it was used all over the place. :)

I guess it's better (less ambiguous) than "word", "long", "double word", and so on.


"Two bytes makes one wyde."

Can't remember where I heard/read that, but it's classic Knuth humor :).


See:

   SYNC   SWYM
in the last opcode row. :)


And

   GET TRIP
next. :-)


I find Knuth's idea fascinating. Fashionable languages come and go, but facts about how machines work do not change much, and probably will not in the future.


I would disagree: just compare MIX with MMIX.

MIX is a 6-bit/2-digit hybrid binary/decimal sign-magnitude machine, typical of 50s-60s machines. MMIX is a 64-bit binary big-endian RISC with 256 windowed registers that resembles a hybrid of SPARC and MIPS, with all the traits of that 90s "RISC dream" that never really went anywhere.

If he were to design it today, it would probably look more like a mix of x86 and ARM.


> If he were to design it today, it would probably look more like a mix of x86 and ARM.

Why bother basing a new architecture on x86 when it's generally considered the worst ISA in history (though some nominate TI's TMS9900 instead)?

x86's advantage is basically its continuous dominance and compatibility with itself, which managed to thwart even its original designer's attempt to displace it[0]. You won't get that by basing a new, incompatible ISA on it, so there's really no point in doing so.

[0] Linus made that point reasonably clearly in a 2016 PCWorld interview: http://www.pcworld.com/article/3129300/linux/why-linux-pione...

> What matters is all the infrastructure around the instruction set, and x86 has all that infrastructure... at a lot of different levels

> I’ve been personally pretty disappointed with ARM as a hardware platform, not as an instruction set […] As a hardware platform, it is still not very pleasant to deal with.

And this is a lesson he learned: while originally interested in the Acorn Archimedes (which wasn't a great success) he went with the Sinclair QL (which outright failed) and…

> Finland wasn’t the center of the universe back then, after that, I learned my lesson: never ever go buy into something that doesn’t have infrastructure.


If I were designing something today, something clean, I'd take a really good look at just cleaning up x86_64. AMD64 did some of that.

What is RISC-V but a cleaned-up RISC, omitting the architectural mistakes of previous RISCs (the branch delay slot (MIPS), register windows (Berkeley RISC-I), tagged integers (SPARC), ...)? It has a place.

A cleaned up x86 would definitely have a place. The idea of parsing a CISC instruction and translating+caching the result as a micro-op has worked, flat out worked. Worst ISA in history? I think not. Most successful ISA in history.

Still, Intel really needs to do what ARM has done: clean up their act. Itanium was changing their act. It kinda reminds me of the scene in Spinal Tap where they introduce the new direction. Intel just needs to get rid of their cruft, seriously needs to get rid of their cruft, because ARM is a threat and ARMv8 is a weapon. The x86 tax helps prevent competitors in their space (as per their threats to Microsoft), but ARM isn't in their space; they're next door.


And we have all the source code for the optimized algorithms...every...single...one.

https://www.amazon.ca/Computer-Programming-Volumes-1-4A-Boxe....


Quantum computers could work in very different ways, though.


So do cars. Neither have any bearing on TAOCP.


Cars don't do much in the way of general purpose data transforms for you.

In my day we had computational complexity, and we LiKED it! So say all of us who claim Knuth as our homeboy if quantum computers ever work better switched on than off...


> Cars don't do much in the way of general purpose data transforms for you.

And neither do quantum computers.


I've always wanted to implement the MMIX processor using this:

http://github.com/burtonsamograd/anser


So is there any work on making an FPGA implementation of this?


How do RISC architecture designers deal with the problem of low code density when modern chip technology makes RAM access considerably slower than anything that happens inside the CPU? One answer could be to have a huge instruction cache, but I doubt that it is economically possible.


As far as I know, the two major approaches are

1) Caches

2) Supporting a "compressed" variant of the instruction set (ARM Thumb, MIPS16, RISC-V Compressed). This can be achieved by using the same field for source and destination registers, shrinking the set of addressable registers, making shifts and conditions into explicit instructions instead of fields (for ARM), and similar tweaks.


A number of things. The biggest one is probably compressed instruction sets, which even x86 "CISC" processors use.

For example, RISC-V (with its compressed "C" extension) and ARM Thumb take this approach, and it's a big boon.

Also, I think that you've been lied to about the relative code density of RISC vs CISC. CISC is only about ~50% more dense than RISC in the best cases, unless you're talking about polynomial evaluation on VAX or something crazy like that.


If you're willing to be a bit impure like 64-bit ARM is you can get code density about equivalent to what you'd see in x86-64.


Anyone know the status of vol 4 and "ultimate edition" of vol 1-3?


Vol 4A was published in 2011: https://www.amazon.com/Art-Computer-Programming-Combinatoria...

Vol 4B was partially published in fascicles: https://www.amazon.com/Art-Computer-Programming-Fascicle-Pre... and https://www.amazon.com/Art-Computer-Programming-Fascicle-Sat...

Instead of the ultimate edition of vols 1-3, so far we've got a "patch" by Martin Ruckert, who coordinated the volunteers: https://www.amazon.com/MMIX-Supplement-Computer-Programming-...


Thanks for the pointers.

This https://cs.stanford.edu/~uno/taocp.html seems up to date.


Last paragraph of the page.

> All of the MIX programs in Volumes 1--3 will need to be rewritten in MMIX, before I finish the ``ultimate'' edition of those volumes that I plan to write after Volume 5 is completed. The current target date for the ultimate volumes is the year 2020, so there is plenty of time to do the conversion. But I think it will be an instructive undertaking if different groups of students from around the world try to do the necessary translations first, perhaps in friendly competitions, long before I get into the act.


That was said 8 years ago, I hoped someone had something more recent.


All the volume 4A listings are in MMIX, and there's "The MMIX Supplement" by Martin Ruckert, redoing the first three volumes in MMIX, which I believe is regarded as the authoritative edition.

As to the ultimate versions (or even finishing through Volume 5): not to be morbid, but the date there has been continually pushed back, and Dr. Knuth is not a young man.


The issue I have with Sedgewick, Cormen et al. is that they ignore hardware. Is Knuth really better?

How is the cache hierarchy different in MMIX as compared to MIX?


What hardware should they base their book on? Would it be universally applicable? Should they drop down to assembly level (as Knuth has) or stick to some higher level language?

Their choice of using a pseudocode allows the algorithms they discuss to be accurately and completely specified, while leaving the implementations up to specific platform (hardware and language) developers and users.


In Volume 4A there are a dozen or so mentions and examples of writing cache-friendly code. CS:APP 3/e is another book that gives general advice on writing cache-friendly code. In fact, several CS:APP chapters can be read side by side with the TAOCP series: floating point, arithmetic operations, bit shifting, optimizations, and boolean logic appear in both books, with TAOCP going into much more detail, of course. I used the TAOCP material on bitwise operations in 4A to finish a CS:APP datalab assignment that was giving me problems.


MMIX was such a stupid idea by an otherwise genius person. So much effort and knowledge was put into TAOCP, and yet it's wasted on the vast number of people who aren't interested in learning a fantasy assembly language.

I can't really fault Knuth for the original MIX. I mean, it _was_ designed in the 1960s, a time when programming languages barely existed. But by 2009 it should have been clear to everyone that C derivatives are here to stay, and that assembly programmers are dying off. Creating yet _another_ fantasy assembly language in the year 2009 was just stubborn and dumb.


OP contains a lengthy section titled "Why have a machine language?"

Toward the end of Knuth's answer is:

"Therefore I will continue to use English as the high-level language in TAOCP, and I will continue to use a low-level language to indicate how machines actually compute. Readers who only want to see algorithms that are already packaged in a plug-in way, using a trendy language, should buy other people's books."


This is dumb. There's no expressive benefit to implementing algorithms as assembly, especially imaginary assembly. "How machines actually compute" at the assembly level isn't even of interest until you need to make your algorithm go faster, at which point you need to target a real platform, not an imaginary one.


> There's no expressive benefit to implementing algorithms as assembly

You can't write coroutines in portable C. You can in assembly.

You can't write an efficient interpreter dispatch table in portable C (you need extensions like 'computed goto'). You can in assembly.


Is this true? I don't know anything about writing interpreters, but wouldn't function pointers work?


One call/return pair, plus the increased chance of a cache miss, makes the overhead of a function dispatch for each and every opcode rather high, unless your opcodes are very complex and do a lot of work before returning to your opcode dispatch loop.

The normal computed goto dispatch puts a

    goto opcode_table[*(++ip)];
at the end of each case statement. This essentially inlines a copy of the switch(*(++ip)) into each opcode case. This saves one branch instruction, but more importantly, it really helps the branch predictor by giving each opcode its own copy of the code for dispatching the next opcode. So, if 90% of the time your compare opcode is followed by a jump_less_than opcode, the CPU will prefetch and start speculatively executing the conditional jump before the interpreter is done with the compare opcode.

In C, if you don't use non-portable computed goto, generally the best you can do for opcode dispatch is using a big switch statement.

In theory, C compiler writers could write some fairly accurate heuristics (a loop containing only a switch statement switching on an indirection indexed by a constant increment) and detect interpreter bytecode-dispatch inner loops. It wouldn't be too difficult to implement an optimization that produces the same instruction sequence as computed-goto dispatch. However, it would be too brittle for any interpreter writer to rely on, so most would continue to use computed goto.


You can kind of do it with function pointers, with the caveat that a jump table of function pointers is still going to have the cost of a function call. What you can't do is something equivalent to:

    sta $1
    jmp ($1)
which jumps to the address stored at a (zero-page) memory location (I only know 6502 assembly, sorry everyone). There's no function call here; it's a goto where you calculate the target at runtime. The closest C equivalent would be a switch/case, which has the relative disadvantage that you have to write an explicit label for each target, and it becomes difficult to express anything other than "the thing I'm jumping to is parameterized by only this one integer".


I'd argue that how machines actually compute at the assembly level is very much of interest to numerous people of many backgrounds and professions.


I learned assembly with a Motorola 68k emulator. While the particular instruction set is different from the hardware I use today, is that experience useless from an educational point of view? Would it be better to have not learned it at all?

It's not a fantasy system, but it's not commonly used anymore so it's as good as fantasy.


Moreover, if I did use a high-level language, what language should it be? In the 1960s I would probably have chosen Algol W; in the 1970s, I would then have had to rewrite my books using Pascal; in the 1980s, I would surely have changed everything to C; in the 1990s, I would have had to switch to C++ and then probably to Java. In the 2000s, yet another language will no doubt be de rigueur. I cannot afford the time to rewrite my books as languages go in and out of fashion; languages aren't the point of my books, the point is rather what you can do in your favorite language. -- from the article


Ok, cool, so use a pseudocode syntax easily understood by anyone familiar with basic control flow constructs. Most algorithms books do this, and even if they're ugly, they're certainly better than a made-up assembly language.


That pseudocode is "English as the high-level language", as I understand it. Basic and common notation is used there, it's not plain English.

Edit: though it's indeed not used extensively to denote control flow: that is done with "go to".


Most schools around the world teach you maths, even though the vast majority of people don't use maths at work, at least not directly. So why bother? Surely primary-school maths would be enough for everyone except specialists?

While I'm skeptical of the value of maths for programming, I can't deny that the process of learning and practicing maths teaches you valuable skills: discipline, thinking outside the box, logic, looking at things from different points of view, concentration, persistence, attention to detail...

Now assembly programming requires many of the same skills, AND it's close to the metal. It's how computers actually work. At the end of the day, all programming languages are implemented in assembly, or could be. You can't escape physics and the relationships between the parts of a processor. It doesn't matter whether a programming language is imperative, functional, or OO; the rules are the same for all. There are high-level language constructs which can represent complex things in simple ways, but if you don't know how processors and algorithms work, you're playing a lottery, not optimizing a program. Having low-level knowledge pays off even when using a high-level language. It's good to know what's physically possible.

----------

Side note, inspired by this post... how do I post an "Ask HN" post? I'm not 100% sure I'm doing it right; the last time I left the "URL" field empty, but my submission looked like a normal submission, with no "Ask HN" in the title, etc.


> While I'm skeptical of value of XXXX for YYYY, I can't deny that the process of learning and practicing XXXX teaches you valuable skills: discipline, thinking outside the box, logic, looking at things from different points of view, concentration, persistence, attention to detail...

This has been used as a justification for teaching all sorts of things down the ages (Latin, geometry, martial arts, soldiering, etc.). Why not teach the "valuable skills" instead?


The expressiveness of "basic control flow constructs" was only formally established in 1966: four years after Knuth started TAOCP, two years before the first volume would be complete, and also two years before the opening salvos of the fight over whether structured programming was a good engineering decision (as opposed to merely a choice of mathematical formalism) or not.

Moreover, the whole point of the series is to discuss foundational matters. Today, for a variety of reasons, we tend to assume structured programming as an unalloyed good. But given both the physical and mathematical structure of computing it's not an obvious theorem; it has even less obvious consequences for pedagogy and engineering; and if we are to address foundational matters, we need to understand both sides of the transformation.


The expressive benefit is that you can do anything you can do with a real computer. Using a high level language forces you to see the world through that lens. But the book is not concerned with how you implement algorithms in C, it is concerned with how you implement algorithms on a computer.


> until you need to make your algorithm go faster, at which point you need to target a real platform, not an imaginary one.

...also at which point you will be thinking "maybe I could have learned how to do this in a controlled environment first?"


If he had used C instead of MMIX assembly language, someone would have made a very similar argument: "it's wasted on the vast number of people who aren't interested in learning a low level language like C". Or, knowing HN, somebody would have called him dumb for not using Rust...

Using C, he would have run into problems by page 200 or so, where he explains how coroutines are implemented. He would have to use some assembly language there anyway.

You can complain about the choices Knuth made in writing TAOCP, or you can trust him and follow one of the great masters of computing (and of writing) on an amazing journey. Calling his choices dumb or stupid without understanding the tradeoffs he made says more about you than about Knuth.

I, for one, have no problem with MMIX. The description of MMIX and its assembly language is interesting in itself (he wrote an amazing simulator to accompany the book). Studying those 50 or so pages is only a tiny part of the effort required to study TAOCP (3000+ pages and growing).


How much is in machine language?

You didn't say it explicitly so you may not mean it, but I see variants of this idea floating around that the assembly language is somehow a major part of the books, or that you need to learn MIX/MMIX to read the books, or even that "Knuth wrote TAOCP in assembly language". The fact is that the vast majority of content in the book (even if you consider just the algorithms/programs) is in English.

As a rough measure, we can look at each volume's Appendix C—Index to Algorithms and Theorems, which lists every Algorithm, Program, Theorem, etc., and count the entries by their names:

Volume 1 (Fundamental Algorithms): Algorithm 86, Coroutine 2, Law 4, Lemma 2, Program 36, Subroutine 1, Theorem 12. Total content in assembly: 36 programs (over 652 pages)

Volume 2 (Seminumerical Algorithms): Algorithm 88, Corollary 5, Definition 14, Lemma 19, Program 18, Theorem 47. Total content in assembly: 18 programs (over 764 pages)

Volume 3 (Sorting and Searching): Algorithm 75, Corollary 2, Lemma 8, Program 33, Subroutine 1, Theorem 41. Total content in assembly: 33 programs (over 782 pages)

Volume 4A (Combinatorial Algorithms, Part 1): Algorithm 127, Corollary 12, Definition 2, Lemma 6, Program 3, Subroutine 1, Theorem 43. Total content in assembly: 3 programs (and 2 of these in Answers to Exercises) (over 883 pages).

I may have made errors in counting, but the point holds: if you don't like the assembly language, just ignore it; it's a tiny part of the books. Knuth uses the programs when he wants to highlight some low-level aspect. ("Expressing basic methods like algorithms for sorting and searching in machine language makes it possible to carry out meaningful studies of the effects of cache and RAM size and other hardware characteristics (memory speed, pipelining, multiple issue, lookaside buffers, the size of cache blocks, etc.) when comparing different schemes.") This is not often, so you get most of the value of the books whether or not you read those assembly-language bits.

And in fact, as the chapters progress towards higher-level stuff (see the TAOCP pre-fascicles Knuth has been publishing in recent years: http://www.cs.utsa.edu/~wagner/knuth/), the proportion of MMIX is even less.


MMIX was published in 1999, a time when it wasn't yet settled that x86, and later AMD64 and ARMv7/8, would dominate all platforms.

Back then, Alpha had only recently been killed, Itanium was on the way, Apple sold PowerPC computers, and Windows CE devices running ARM, MIPS, and SuperH were being widely sold.

In the spirit of 1999, a fantasy architecture was not a bad way to gloss over that fragmentation.

Also, x86 is not an elegant teaching platform (though I hear AMD64 is much better).


That's like, just your opinion, man. There's always going to be that lowest level before 1s and 0s. (I say this as I'm curious if this assumption of mine is actually out of date at this point.)


Not really, and the hardware is getting more complex all the time. A modern CPU is like a probabilistic OS nowadays...


On many modern CPUs, the assembly language that compilers target, and the machine code generated from it, isn't even the lowest-level representation used by the CPU. Many desktop-class processors rewrite incoming instructions into a simpler internal micro-op format more easily handled by the out-of-order-execution hardware.


Thank you for provoking a vigorous debate on this subject. It's clear that many people disagree with you, which hopefully is not, by itself, a reason for downvoting. I had mentioned two days ago at a meetup that I was intending to make a serious attempt at Knuth in the near future, and the person to whom I made that remark vociferously asserted the uselessness of Knuth for that very reason. I was completely unprepared to reply, and the discussion here is proving extremely useful.

The overall tone of your comment was probably a mistake unless you were being deliberately provocative to take advantage of Cunningham's Law.


Come on, it's not exactly difficult to read. If Knuth had used loads of rare x64 instructions then I'd accept the premise, but this (http://mmix.cs.hm.edu/doc/instructions-en.html) is practically simple.


First of all, maybe you should try learning a fantasy assembly language. It's not as hard as you seem to think. You know how driving many different cars makes you a better driver? It's the same with computers. The best programmers have programmed many different computers.

A high-level language forces you to see the world through its lens. Just look at how differently an algorithm would be implemented in C compared to Lisp, for example. One uses a high-level language because it makes the things you do often much more convenient. The great benefit of C is being able to do pointer arithmetic in an extremely convenient way. Lisp is more about programming the way your brain actually likes to think, which makes it quite pleasurable. But unfortunately, neither is how any computer actually works.

So instead of choosing some arbitrary fashionable lens, Knuth chooses no lens, but instead invents a computer that is good for teaching, rather than being good for manufacturing and selling.



