How to fit a large program into a small machine (1980) (mud.co.uk)
125 points by peter_d_sherman on Oct 13, 2023 | hide | past | favorite | 39 comments



I've always been fascinated by Zork despite coming of age decades after its release. Reminded of it a few years back through The Digital Antiquarian, I wrote a C# implementation of the Z-Machine (https://github.com/raegwolf/zmachine). Seeing how so much could be achieved with so little memory was eye-opening.


I wrote one in Haskell [1], ostensibly as a uni class project but also as a way to teach myself Haskell and have fun; I had the infrastructure in place and was trying to run Zork, implementing opcodes, one by one, when finally I saw the familiar message "You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here." The sense of accomplishment was rewarding and elating.

2004. Good times.

[1]: https://github.com/nathell/haze


I love Zork because I started learning English on it ("forest" being one of the first words; I never could find "stiletto" in any dictionary I had!). At the same time we were all pirating punks in middle school who loved skipping copy protection or boosting stats in assembler, and Zork was one of the only games that completely baffled me, which I now know is because of the interpreter. Good times.


It's mentioned briefly, but another big advantage of the Z-machine was being mostly hardware-independent, which was especially important with the wide variety of computers in the early '80s. The Digital Antiquarian says that Infocom supported over 20 different platforms at its height. [1]

[1] https://www.filfre.net/2013/03/the-top-of-its-game/


There's a lot of detail on the Digital Antiquarian site, scattered about, but this piece is a good standalone introduction, complete with some sample code:

https://www.filfre.net/2012/01/zil-and-the-z-machine/


The answer is to encode your program using a higher-level machine code targeting a portable virtual machine.

That’s also how they managed to fit all the software of the Apollo guidance computer into it.

> “The AGC had a sophisticated software interpreter, developed by the MIT Instrumentation Laboratory, that implemented a virtual machine with more complex and capable pseudo-instructions than the native AGC.” (https://en.wikipedia.org/wiki/Apollo_Guidance_Computer#softw...)


See also Woz's SWEET16 for the Apple ][

https://en.wikipedia.org/wiki/SWEET16


You probably know this, but to be clear for anyone else reading - the AGC's code was written in a mix of interpreted code and native code. Writing in native AGC assembly was necessary for the most performance-sensitive parts of the flight, such as during the LM's landing.


There was also A-code from British company Level 9: https://en.m.wikipedia.org/wiki/Level_9_Computing#A-code


My dad bought an Amstrad CPC 464 for work in late 1985, and I was allowed to use it occasionally. I bought the Amstrad Action issue 4 Christmas special (now famous for having the first ever cover-mounted computer cassette) containing two pretty simple games - Yie Ar Kung Fu and Number 1. They were pretty fun, but despite all the adverts for other games, the very first game I bought based on all the adverts and reviews was... Colossal Adventure. It's mentioned in the article you linked, and it's astounding that so much game could be packed into 48KB and loaded all at once into memory from cassette, compared to Infocom's paged-from-disk games.


I was surprised to find that a VM was used in the Busicom calculator that was the motivation for and first use of the Intel 4004 microprocessor.

Wrote about it here:

https://thechipletter.substack.com/p/bytecode-and-the-busico...


Thanks, that was an incredible story and much more detail than I had seen elsewhere. The whole series is a goldmine. E.g. I never knew that the registers in the 4004 were dynamic (i.e. a capacitor + 1 transistor) [1]!

[1] https://thechipletter.substack.com/p/the-strangeness-of-the-...


Thanks so much! It's a really interesting design.


I'm pretty sure the year for this article should be 1980, not 1999. According to Jimmy Maher, this was an article that was written for Byte magazine around the time of Zork's release.


Not Byte but Creative Computing, July 1980

https://archive.org/stream/creativecomputing-1980-07/Creativ...


I definitely remember this in Byte in 1980 also, or at least an article by Zork authors on the same subject.



If you want to create a Z-Machine game, get Inform6 and the Inform6 library, along with the Inform Beginner's Guide, to learn how to create a basic adventure:

git clone https://jxself.org/git/inform

git clone https://jxself.org/git/informlib

cd inform ; cc -o inform src/*.c

sudo cp inform /usr/local/bin ; cd ..

Spiritwrak:

git clone --recursive https://jxself.org/git/spiritwrak

cd spiritwrak; sh build.sh ; cd ..

Snowed In (incomplete):

git clone --recursive https://jxself.org/git/snowed-in

cd snowed-in; sh build.sh; cd ..

Homeland (complete, small):

git clone --recursive https://jxself.org/git/homeland

cd homeland; sh build.sh ; cd ..

In order to play the z5/z8 games, you'll need Frotz/Fizmo or any compatible interpreter.

The games at jxself's repo are GPL-licensed, so please comply with the license if you create derivative games.


From the Z spec (https://www.inform-fiction.org/zmachine/standards/z1point1/s...):

"In practice the text compression factor is not really very good: for instance, 155000 characters of text squashes into 99000 bytes. (Text usually accounts for about 75% of a story file.) Encoding does at least encrypt the text so that casual browsers can't read it. Well-chosen abbreviations will reduce total story file size by 10% or so."

I'm curious about other schemes like this with higher density. Obviously, it's a trade-off between text density and more code in the VM.
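
The packing the spec describes is simple enough to sketch. Here's a minimal Python illustration of the three-Z-characters-per-word scheme, handling only lowercase letters from alphabet A0; real encoders also deal with shift characters, abbreviations, and ZSCII escapes:

```python
# Minimal sketch of Z-machine text packing (Z-spec section 3):
# three 5-bit Z-characters per 16-bit word, top bit marks the last word.
# Only lowercase a-z (alphabet A0, codes 6-31) is handled here.

def encode(text):
    zchars = [ord(c) - ord('a') + 6 for c in text]   # 'a' -> 6 ... 'z' -> 31
    while len(zchars) % 3:
        zchars.append(5)                             # conventional padding
    words = []
    for i in range(0, len(zchars), 3):
        w = (zchars[i] << 10) | (zchars[i + 1] << 5) | zchars[i + 2]
        words.append(w)
    words[-1] |= 0x8000                              # end-of-string marker
    return words

def decode(words):
    out = []
    for w in words:
        for z in ((w >> 10) & 31, (w >> 5) & 31, w & 31):
            if 6 <= z <= 31:                         # skip pad/shift codes
                out.append(chr(z - 6 + ord('a')))
    return ''.join(out)

packed = encode("mailbox")
print([hex(w) for w in packed])   # 7 letters fit in 3 words (6 bytes)
print(decode(packed))             # mailbox
```

That's roughly the 5/6 ratio the spec quotes: 7 ASCII bytes become 6 packed bytes, before abbreviations claw back the extra 10% or so.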


Back in the eighties people tried to squash big games into microcomputers. I don't recall any similar technology to Z-code, but I do remember that games like Citadel and Elite on the BBC Micro used certain techniques to compress the games into the 32K available on the Model B:

https://en.wikipedia.org/wiki/Citadel_(video_game)

https://en.wikipedia.org/wiki/Elite_(video_game)

https://www.bbcelite.com/deep_dives/program_flow_of_the_main...


Z machine interpreter in PostScript (from a 2019 Usenet discussion): https://groups.google.com/g/rec.arts.int-fiction/c/HNtntxCpc...


The "Linux on an 8 bit micro" experiment came to mind for me:

https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...

2 hours to boot to a shell :)



Also related:

How to Fit a Large Program into a Small Machine (1999) - https://news.ycombinator.com/item?id=17839651 - Aug 2018 (35 comments)

How to Fit a Large Program Into a Small Machine - https://news.ycombinator.com/item?id=1032408 - Jan 2010 (3 comments)


An instance of compiling the program to a bespoke bytecode and then interpreting the bytecode. This is one of my favourite programming tricks - high effort and high effectiveness, rarely used.

DWARF does this as one of its size optimisation tricks. I used to know another example but it isn't coming to mind.
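
For anyone who hasn't seen the trick done: a toy sketch in Python of compiling to a bespoke bytecode and interpreting it. The opcodes and encoding here are invented for illustration, not taken from any real VM:

```python
# Toy bespoke-bytecode interpreter: a stack machine with one-byte
# opcodes (PUSH takes a one-byte operand). Opcodes are invented
# for this example.

PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:                  # operand is the next byte
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == PRINT:               # peek and print, no pop
            print(stack[-1])
        elif op == HALT:
            return stack

# (2 + 3) * 4 in nine bytes of bytecode
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run(program)[-1])                 # 20
```

The win comes from the encoding: most operations are a single byte, so programs can be far denser than the equivalent native instruction sequences, at the cost of one fixed-size interpreter loop.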


Too bad ROM was expensive in those days. If a language with a nice balance between expressiveness and performance (say Lisp, Forth, Pascal, maybe Lua or even Java) had been embedded in the machine's ROM, then many programs could have contained just their core functionality, while having all the machine's capabilities easily accessible (and maybe a standard library of often-used functions).

As opposed to each program containing implementations of various interpreters, VMs, etc., as described here (and functions that hundreds of other programs implemented as well).

Or even better: a selection of languages, letting programmers pick or even mix & match for various parts of the program.

Even today "big ROM, small RAM" is common (e.g. in microcontrollers), with flash offering extra flexibility for the 'ROM' part.

But the limitations of the day made many (system) ROMs little more than a glorified bootloader plus some low-level I/O code. If you were lucky, there was an easy-to-learn but slow language like (interpreted!) BASIC.


There's an interesting design point in high-wattage processors. If your bytecode interpreter fits comfortably in the L1 cache, and the bytecode is denser than machine code, and the program doesn't fit in the caches, then there's potential for interpreted bytecode to outperform native code. It's a quirk of really fast compute paired with moderately fast memory. I can see that working particularly well if one interpreter instance executes code for multiple processes.


I'd actually argue that ROM is still expensive unless you are making a lot of units. It's been pretty much replaced everywhere by flash, which can be commoditised, leading to better economies of scale, and which has the advantage that your "ROM" can be upgraded without wasting a lot of old stock.

Where hardcoded ROMs do exist in designs now, it's usually still as a glorified stage 1 bootloader, albeit now usually with signature checking as well.


Does anyone know of any good books which describe the design and implementation of small VMs/interpreters like these?


I think it isn't a constrained enough topic to write a textbook on. The conclusion section of the article under discussion summarizes the technique pretty well.

I have encountered brief discussions of the technique in a few places though. Jon Bentley wrote a column called Programming Pearls, collected in a couple of compilation books, and one of those columns, "Squeezing Space", talked about this a bit. I think that Peter Norvig had a chapter on it in Paradigms of Artificial Intelligence Programming, but I couldn't swear to it. I've also read that kdb+ uses this technique, and I seem to remember that one of the few write-ups discussing its implementation (it's proprietary technology, so there's not much on it) talked about stuff like this.

The general idea was also briefly discussed in the Dynamic Languages Wizards Series - Panel on Runtime (from 2001), though they didn't go into methods, just trade-offs. [1]

[1] https://www.youtube.com/watch?v=4LG-RtcSYUQ


What's in PAIP is a bytecode compiler for Scheme with some discussion of simple optimizations, with a ref to a 70s or 80s paper by Deutsch (iirc) on design of a Lisp VM for a space-constrained computer.

Another paper that comes to mind is Anton Ertl's on his vmgen system. The Smalltalk-80 bytecode design is worth a look too.


Niklaus Wirth has an example of a small RISC VM and interpreter in chapter 9 of his book on compiler construction.

https://people.inf.ethz.ch/wirth/CompilerConstruction/index....


The only one I know is this one here: https://craftinginterpreters.com/

It uses a simple stack machine. I am on the market for a book on small register based interpreters.


Niklaus Wirth wrote a nice concise book on compiler construction. He’s targeting his own register-based RISC architecture.

https://people.inf.ethz.ch/wirth/CompilerConstruction/index....


I don't know of anything concrete, but looking for information about Pascal's p-code approach might be useful; given its historic significance and relative prominence, there's probably some decent information about it.


To continue, please insert disk 21 of 22.


Error reading drive A. Abort, Retry, Fail?


Floppies were fairly reliable. Audio-tape storage, though, varied a lot.


Infocom games were always amazing to me. Great writeup!



