Hacker News new | past | comments | ask | show | jobs | submit login
Video Chess disassembled and commented (nanochess.org)
260 points by nanochess on June 22, 2023 | hide | past | favorite | 69 comments



Many years back (so details are fuzzy), a friend who worked at Atari in the early 80's told that the programmer who did the game part of the work had the basic chess algorithm working in a few months, spent the next year getting it to run in less than 128 bytes of RAM, and was then burnt out and left tech.

IIRC, the game devs used a PDP-11 for much of their work, so that had a bit more RAM to work with to get started.


I'd love to see a dev setup for using the PDP-11 and the Atari 2600 for doing gamedev. I did a quick search on youtube and the search engines and couldn't find anything.


I'd love to see a dev setup for using the PDP-11 and the Atari 2600 for doing gamedev. I did a quick search on youtube and the search engines and couldn't find anything.

The photographs I've seen aren't all that interesting.

A VT-52 terminal and a hand-built cart wired into an RS-232 port, all on a long desk.

Hopefully there are more exciting photographs out there that I haven't seen.


Have a look at the PDF document here: https://forums.atariage.com/topic/259003-intellivision-devel...

Also check out the links provided at the end of the document.

Besides the PDP-11, I'm glad to see the beloved PDP-10 was also used by game developers back in the days.


Be see, I mean a demo of what it is like to dev Atari 2600 games using a PDP-11. I mean there are PDP-11 emulators, we could setup a pure virtual Atari dev system.


We had a PDP 11/70 for undergraduate work when I started studying for my BSCS in 1984. Nothing pretty, just glass vt52 and vt100 type terminals. Oh, and a 110 baud line printer that could also be used as a terminal. That line printer was not fun, but I had to use it on occasion when all the good terminals were taken.

IIRC, ours had 64KB of RAM in two banks.


The 128 bytes of RAM seems quite easy (game state is 32 bytes for chessboard + 1 byte for side-to-move+castling+en-passant plus 2/3 bytes per move in the search stack + 1 byte to record the depth at which castling was disrupted = 54/64 bytes with depth 10, leaving a luxurious 64/74 bytes for call stack and local variables).

The problem seems to be more the 4KB of ROM, but only if one is trying to make it actually somewhat good, and also perhaps using 1979 tech for programming.


But the computer also needs to keep a scratch space to consider potential moves it could make, which if done in a straightforward manner would exhaust the other 64 bytes of memory.


It could just mark branches at the cost of recomputing moves repeatedly to evaluate scores.


I started developing software as a kid on a Commodore 64, and I thought we were pretty clever if we were able to accomplish interesting things using only 32KB of RAM. But the idea that somebody was able to develop a full chess engine using only 128 Bytes of RAM boggles my mind.


Imagine how easy it would be to tell people “Hey sorry we can’t include that feature, there’s not enough RAM” and then you could just build what you wanted to, exactly.


Think of the 128 bytes RAM to be only used for your dynamic gameplay state variables.

C64 (and other home computer) games had to load all code and data into RAM, which usually accounts for most RAM usage. Game consoles like the Atari 2600 stored the code and static data in a ROM cartridge.

For simple arcade-style games (like Pacman), 128 bytes of RAM is more then enough to manage the gameplay state.

(still of course: a chess engine in 4 KByte ROM and 128 bytes RAM is pretty damn impressive)


Fitting a chess game state in 128 bytes is for sure doable; what is really impressive is being able to use the same memory not just for the current position, but also to compute the next moves in a decent way.


I wonder if it "looks ahead" anything at all, or if it's just evaluating the current turn's piece trade potential. If it does look ahead, I wonder how many bytes per turn of lookahead is needed.


By the way: right now I'm working on a chess training project(1), and I'm thinking that a marketing cofounder could really help. If anybody is interested, my contact info is in my profile.

(1) https://chess.braimax.com


I tried it out, was enjoyable to play a few puzzles. My feedback is a) that the ring sound (when you win a puzzle) was too loud, so I ended up muting the tab, and b) that the logo looks like "BraiMax", and doesn't stylistically fit with the rest of the site theme.


Thank you for your feedback! By the way, you can disable sound from the little cog icon just above the chessboard.


I'd recommend changing the sounds up a bit. The piece moves sound more like doors locking or unlocking from FPS games in the 90's. And the bell... just doesn't sound "successful chess puzzle" to me.

I'm no UX engineer at all, but the sounds just do not sound chess-y. Just a thought.


Looks neat.

Better call the puzzle training something else than Puzzle Rush though.


> Looks neat.

Thank you.

> Better call the puzzle training something else than Puzzle Rush though.

Why? I think it's the most recognised name for the sequence of puzzles becoming harder and harder, even if I removed the time limit (as that leads to not learning much :)


It’s trademarked.


I think I’m pretty good at writing software, and then I see stuff like this, and the impostor syndrome kicks in hard. A fully working chess game in 4K of ROM, using each of the 128 bytes of RAM for several purposes? That’s some genius.

Bravo to OP for the disassembly and explanation, too!


If you only had 4K of ROM and 128 bytes of RAM you’d find a way. That’s what we all did back then. This isn’t to diminish Atari’s achievement (this stuff was super hard and also super satisfying), it’s just to say that doing these kinds of tricks was what you did when you wanted to create something state of the art with few resources.

Here’s a silly example: https://blog.jgc.org/2022/11/primality-testing-on-commodore-...


That's perpetually true. "You're trying to do machine learning on a desktop PC?" "Hmmm, I wonder if we could use the GPU as a math engine?"

But holy cow, that chess game really pushed the limits!

(A side note I like to bring up occasionally: my first "computer" was "Atari BASIC" on my 2600. It had 63 bytes of RAM, I think. That didn't last long until my parents brought home a TS1000 with a luxurious 1KB of RAM.)


Yes. I was thinking about that after my comment. All the folks doing machine learning optimizations, weird stuff with AVX2, etc. are the same folks who would have been stuffing chess in 4K of ROM and 128 bytes of RAM.

In 1985, me and another boy from school wrote a whole reliable network transport and stuffed it into unused memory on a Research Machines 480Z. We had 128 bytes for the receive buffer so in order to do anything we'd literally ship a packet across the network containing machine code and the stub code would CALL the receive buffer to perform tasks. Executable packets!


I could listen to stories like this all day long.



How could you do this to me on a work day?


Call in sick. I faked a stomach ache when my parents got me Apple //c


Shhh! Don't give away our secrets.


> But holy how, that chess game really pushed the limits!

Having 4 kilobytes of ROM made it relatively easy, I think. If one ignores draws by repeated positions (which I think all small chess programs do), board state can easily be stored in 67 bytes or so (64 for the board proper plus four bits for whether various castlings are allowed, plus a byte indicating the square for en passant opportunity, plus a byte for counting for the 50-move rule), leaving 61 for the search algorithm.

I think https://en.wikipedia.org/wiki/1K_ZX_Chess, with 672 bytes of code and therefore 352 bytes of RAM for storing game state _and_ screen data might be more impressive.

I would have to see at what level both programs play to be sure, though.


Each square of the board doesn't even need a byte, only a nibble. There are 6 piece types so that fits in 3 bits, plus one for color. If you pack two of those nibbles per byte then you can do it in 32 bytes plus the castling and en passant bits.

Video Chess uses the lower nibble of 64 bytes for the board, and the upper bits of those for various other purposes.

You actually could condense further than that. Build the storage as an array of pieces rather than squares, so 32 storage locations. Each of those locations needs only 6 bits to indicate that piece's current square. (And even only 5 bits per bishop, although pawns need three more bits each to handle promotion and to which piece type.) Of course, writing an engine for that data structure is much more cumbersome, since it can't look at a square to find what piece is on it, instead it has to iterate over all the pieces to find out what's on a square.


> writing an engine for that data structure is much more cumbersome, since it can't look at a square to find what piece is on it, instead it has to iterate over all the pieces to find out what's on a square.

Ignoring the move generation, this is also way harder to render the output with. The Atari 2600 has very few cpu cyles per line, and I don't know if you'd have enough time to iterate through all the pieces to see if and where they are on the current line, even given that several video lines are spent on each chess rank.


Yeah, of course. I wasn't speaking of the Atari or any computer in any practical sense, just hypothetically about the minimum storage.

Of course it's possible to go even more than that - enumerate every reachable position (so remove situations like a pawn on the first or eighth rank, kings adjacent or in some check situation that couldn't have happened, pawns in positions that would require more captures than there are enemy pieces missing from the board, etc) and assign them all an index number, which requires log N storage size. And of course it would be practically impossible to do anything computationally with positions stored like that.

This link estimates an upper bound of around 10^46 reachable positions, so log2 of that would be only about 150 bits to enumerate them all.

https://math.stackexchange.com/questions/1406919/how-many-le...


You won't. You have 76 CPU cycles per line. Searching through a 32 byte array for a match - 4-5 cycles just to load the byte from the array if you use zero page indirect addressing.


1K ZX chess actually used the screen data as the game state.

As a neat side effect, this means the user can see all the moves that the engine is considering.


> That's what we all did back then.

I think you might be "suffering" from the Dunning-Kruger Effect. Even if I accept that "all" the coders at that time were able to do this, that would just mean that the pool of people who were capable coders then is vastly more skilled than the typical coder these days.


Anyone who wanted to push the limits of hardware at that time had to do tricks like this. There just wasn't any other choice. Sure, there were plenty of programmers back then who weren't capable. But it's not magic, it's careful thinking and hard work.


Well, for some perspective tricks of the kind being discussed here were frequently published in magazines found in the local grocery store! Getting an extra sprite on a horizontal line, or putting more colors on the screen, various math tricks, how to pack multiple flags into a byte, and many other discussions were common.

When others say, "we had to", they are in part, communicating how thinking was done. The lower level details were often published down to the bits in registers and doing even simple things took one into tricks land more often than we may realize today.

Now that said, the VCS (2600) is a difficult machine. But, it was not above an average programmer to be successful as much as it did demand one go through the careful steps and thinking as told today. In fact, just doing what the machine was designed for; namely, "Combat" required that thinking.

I think you may be getting hung up on scope and scale. What I mean is the field was not as broad in some ways, and most everyone was a lot closer to the hardware than we mostly are today too.

Kids were jumping onto micros and literally figuring it out with the info they could glean from one another, manuals shipped with the machines (maybe), magazines and the like were the basic communications happening daily. I was one of those kids and in my peer group it was expected to understand the hardware well enough to follow code.

So yeah, overall it was just different in ways that may well prove confusing today, depending on what any of us may have chosen as our start point.


Keep in mind is these kind of things have a flat memory space between RAM and ROM. It's not like today where disk is some thing way over there that you need to buffer in RAM. All of your code and resources are in ROM which means they're just an address away. Your RAM is scratch space for dynamic values but you can pull everything else from ROM.

For instance your bitmaps for pieces/spaces are stored in ROM, the code called when a scanline is drawn goes and accessed the values in ROM (it's just a memory address) without needing it to ever living in RAM. The only part that needs to live in RAM is the board state that the drawing code uses to find the bitmaps to draw. There's no need to draw to an off-screen buffer in RAM that then needs to be drawn to the screen.

None of that is to say 128 bytes of RAM is somehow easy but the code and data in ROM can do a lot of heavy lifting without taking much RAM.


    Keep in mind is these kind of things have a flat 
    memory space between RAM and ROM
This is true and is very central to how these miracles were pulled off.

While you know this, for the benefit of those reading: you could also use ROM for time/space tradeoff purposes -- for example instead of doing a bunch of trig calculations in real time for character movement purposes you could just have lookup tables in ROM.

    the code called when a scanline is drawn goes and 
    accessed the values in ROM (it's just a memory address) 
    without needing it to ever living in RAM
This went away at some point in console history, and I'm not sure exactly when.

For example the Sega Genesis/Megadrive has a scanline based video chip. No framebuffer, etc. But you've got to copy graphics data from ROM into the 64KB of video RAM before it can be drawn onscreen -- no rendering directly from ROM.

You can do this during the VBLANK interval (the time between frames) but there is not enough time to rewrite all 64KB so you've got to be crafty. Typical procedure was to stream animation data for the main character into VBLANK because you would like your main character to be animated as smoothly as possible. [1]

Contemporary systems like the Neo-Geo did not have that restriction and could render ROM data directly,[2] although it was several times more expensive than the Genesis and had many more address lines physically exposed on the ROM cartridge connector. May have used faster ROM chips too; dunno.

[1] https://rasterscroll.com/mdgraphics/graphical-effects/animat...

[2] https://wiki.neogeodev.org/index.php?title=VRAM


Addendum:

    Contemporary systems like the Neo-Geo did not have that 
    restriction and could render ROM data directly
...which was obviously advantageous for typical use cases! However, it came at a cost in flexibility. The NeoGeo could draw from ROM, but could not draw from video RAM.

Games on the Genesis and SNES could therefore use video RAM as sort of a makeshift kludgy framebuffer for polygonal games or for fancier raster effects. The Neo-Geo could do no such thing.

The Neo-Geo might have been able to given some custom hardware on the cartridge itself, with a cartridge-based CPU rendering data to RAM on the cart that was seamlessly mapped into the cartridge's address space. I have seen some wacky emulator projects that work this way, like DOOM on NES: https://hackaday.com/2019/06/06/doom-on-the-nes/


Wait, the Genesis, which came out in '88, had to race the beam like an old Atari?


The races are quite a bit different. On an Atari, pretty much all games spend the screen on time feeding the TIA chip; with most or all of the game logic happening during vblank.

On a Genesis, games spend most of vblank feeding the VDP chip with sprite and other graphics data, and then can run their game logic during the screen on time. With some effects requiring frobbing the VDP chip during the drawing phase.

However, the genesis and contemperaries still don't have a frame buffer, so their graphics chips are processing background tiles and sprites pretty much just in time. I see some notes that the Genesis VDP does some of this work earlier than most others, so something like changing sprite position data in vram won't be effective until the next frame, whereas contemporaries can reuse a sprite on the top and bottom of a frame by repositioning it in the middle. Otoh, the Atari 2600 doesn't really have sprites, you have to program the TIA with patterns each line.


"Frobbing" -- added to my goofy nerd lexicon


It's beam-chasing, BUT in a much more powerful and programmer-friendly (declarative, really) tile based way. All of the 8- and 16-bit home and arcade systems essentially worked this way.

For the Genesis you've got two independent background layers. They are made of tiles. You also have 80 sprites which are also tiles.

For each sprite, you give the system an X,Y position and a pointer to VRAM for the graphics data.

For each background layer, you give it a scroll offset and a list of pointers to VRAM for the graphics data. Very similar to sprites, but you move the whole layer at once so each tile doesn't get its own X,Y.

You don't have to do this on every frame. It will just keep drawing the same thing over and over until you change something. You don't have to do steps 1 through 3 on every single frame - you just make the changes you need. If the game is paused or the player hasn't moved, for example, you might not do anything at all.

You have a limited amount of time during VBLANK (the time between frames) to do graphics work. This is where you update sprite/background positions, transfer data, change palettes, whatever.

You have an even more limited amount of time during HBLANK (the time between individual lines in a frame) to do things as well. A common use for this is updating the background scroll position in the middle of the frame. This is how you get "line scrolling" effects like the cool pseudo-3D floor effect in Street Fighter II. You can also change palette colors. This is how you get the transparent water effect in Sonic the Hedgehog even through the Genesis doesn't have true alpha transparency - you change the palette part of the way down the screen during VBLANK.


> All of your code and resources are in ROM which means they're just an address away.

Only until you need more rom than address space available to the cartridge slot, and you need to add bank switching. Then you've got to keep track of what bank resources are in.


A fully working chess game in 4K of ROM, using each of the 128 bytes of RAM

MicroChess says hold my 924 byte beer: http://www.benlo.com/microchess/index.html

MicroChess is also believed to be the first home computer software that was available for sale to the public.

I have this on my KIM-1, and I find watching it play itself kind of soothing.


I'm featured on that page, for porting it to C for running on modern PCs. But by far my favourite retro chess software project is Sargon 1978 https://github.com/billforsternz/retro-sargon


I've worked through all the 2600 programming books, including the authors. I've yet to start on anything of my own worth while. At one point I thought maybe I could start reverse engineering some games for some inspiration. Didn't get more than a few hours into it before I decided I'd just rather watch other people do great things with the 2600.


You may want to give Batari Basic a look.

In the early 00's one enthusiast had the idea to make a simple BASIC compiler for the 2600. I was one of the early testers and the first thing I did was make a functional "BREAKOUT", bounce the ball off the bat to eliminate bricks in a wall game.

It took a couple hours and was, I think, the first (and buggy) playable program authored by someone other than the language author. And it was fun!

What the author did was kind of brilliant in that he packaged up several of the common tricks used to make games and presented them along with a simple BASIC.

Variables, for example, were just bytes and or individual bits, essentially addressing bits in a byte like an array.

     J = %01101101
     If J[2] = 0 then X = X +1
Here is a sample program to move a sprite around on the screen:

    x=50
    y=50

   main2

    COLUP0=28
    COLUBK=02

   player0:
     %00011100
     %00011000
     %00011000
     %00100000
     %01011010
     %01111100
     %00100100
     %00010000
     %00011000
     %00111100
     %00011000
    end

     player0x=x
     player0y=y

     drawscreen

     if joy0right then x=x+1
     if joy0left then x=x-1
     if joy0up then y=y-1
     if joy0down then y=y+1

     goto main2
On the VCS, a game is a loop that renders the display and in-between display frames, during VBLANK, one computes the next game state to be displayed, and it all runs at 50 or 60Hz, locked to the display refresh rate.

The BASIC exposes the little bit of hardware to a programmer in a simple way that I found to be a lot of fun because the really hard parts of assembly language are not necessary, but can be written in-line just as easily as defining a sprite is:

    asm
     LDA J
     ASL
     ASL
     .
     .
     .
    end
Any hoo, it was fun and surprisingly productive. I found I could knock out simple concepts in an hour or two.


I just now recalled the author of Batari Basic, and he is Fred Quimby. That bothered me, and now it does not.

Fred did a lot of great work along with some others to create both Batari Basic and the Harmony cartridge, both of which mean mere mortals can have a little fun programming for the 2600.


I've considered doing something in this vein, and my thought is don't try to learn two things at once. Game design and game implementation are two different things; better to clone a game for your first several implementations, IMHO. OTOH, I'm not motivated enough to do that, either. :D


Also if you have some acquired 6502 skills consider doing some c64 and appleii or even atari x00 projects. Then maybe go back to 2600. All are very constrained in their own beautiful ways but not as perverse as the 2600 :)


99% of people "writing software" could not write anything worth calling a chess engine by themselves.


Well, there's also this https://en.wikipedia.org/wiki/1K_ZX_Chess

1K ZX Chess's code takes up only 672 bytes in memory,[2] but implements chess rules except for castling, promotion, and en passant, including a computer opponent.[3] It was the smallest implementation of chess on any computer at the time.

672 bytes RAM, No ROM. No promotion is a killer though


The podcast advent of computing just had a deep dive into the intricacies and limitations of the Atari 2600. Having just listened to that episode, it makes this achievement even more impressive.

https://overcast.fm/+Ri2NFzoG8


I highly recommend this podcast if you have any interest in our computing history


Now get it hooked up to lichess and see where its rating ends up.

I love the 2600 breakdowns and the hacks to get things working.


I just finished training a neural network to play a trick taking card game and just the neural network is 4,250 times the size of the ROM for Video Chess and the executable including the UI is even bigger.

True story: One time I was at a party and my parents introduced me to some friends of theirs and they said their son worked on graphics drivers. I said I couldn't handle low-level work like that. My parents glared at me and told me I was being rude. Once I figured out the disconnect I explained that low-level means closer to the CPU and it's much more complicated - not that it's less desirable work.


Smallest chess engine in 288 bytes: https://leanchess.github.io/


You can play this game here free80sarcade.com/atari2600_VideoChess.php


I have Oscar's 'Programming Boot Sector Games' and I worked through it last summer. Great fun! And having programmed from 1978 on a Commodore PET 2001 in PET Basic and later on Vic-20 and Amigas in assembler and C, I learned a few new tricks from Oscar's book. Constraints hone your skills, so when you are buried in large frameworks and IDEs you can take some solace that not everything needs to be so complicated. It's also why I enjoy APL/J/BQN and April (APL Re-Imagined in Lisp).


don't settle for chess on the Atari 2600 when you can be playing Myst on the Atari 2600: http://deater.net/weave/vmwprod/myst_vcs/

I'm trying to get it to the speed-runnable state before Mysterium next week, we'll see how that goes


What's its ELO score?

Edit: Really? Downvotes? Certainly it must have one. You could play someone on chess.com and find out?


This seems to have an estimate: https://chess.stackexchange.com/questions/24952/how-strong-i...

I was wondering the same thing. It is interesting to see the exact feature in past engines with limited resources that overcomes the threshold and moves an engine from the bottom to the higher ranks, and how historically what ideas were in place and which were culled that moved the engines in that direction.


Oscar Toledo is awesome. I have his book Programming Boot Sector Games, and it's wonderful.


This is off topic (I owned an Atari 2600, and loved Atari, but never owned Video Chess), but I've come up with a chess variant I call "Transposition Chess" (not to be confused with Transcendental Chess).

The rules are the same with three major differences.

I) Black has the option of transposing any two pieces on their back rank before play begins. If they do they must transpose the parallel pieces on their other side to make their setup symmetrical (unless they transpose the king and queen). White must adjust their setup to mirror Black's, and then play can commence.

This creates 11 possible setups:

1)b>RNBQKBNR w>RNBQKBNR

2)b>RNBKQBNR w>RNBKQBNR

3)b>RBNQKNBR w>RBNQKNBR

4)b>BNRQKRNB w>BNRQKRNB

5)b>NRBQKBRN w>NRBQKBRN

6)b>RNQBBKNR w>RNQBBKNR

7)b>RNKBBQNR w>RNKBBQNR

8)b>RQBNNBKR w>RQBNNBKR

9)b>RKBNNBQR w>RKBNNBQR

10)b>QNBRRBNK w>QNBRRBNK

11)b>KNBRRBNQ w>KNBRRBNQ

Castleling is only possible in the first five set-ups, and will always be with the piece that occupies the Rook's original square.

II) Pieces can be demoted by moving them to an original pawn square (on one's own side of the board) and declaring a demotion. However, there must be at least one of their pawns captured that they can "exchange" it for. Note that the new pawn could deliver checkmate/stalemate.

III) Pawns can only be promoted to pieces currently captured. If no pieces are currently captured, the pawn may not move to the promotion rank (as if a piece was blocking it). Note that it may still deliver checkmate/stalemate in this position, as normal.

The thinking behind the first rule is to make book opening memorization less of an issue (without making things too complicated or unusual) as well as giving Black compensation for moving second. The thinking behind demotions is to make games that would normally be draws due to lack of material (or skill) winning. The thinking behind the last rule is to make over-the-board play easier (e.g. no confusion about whether a piece is a pawn or queen), though it also potentially makes "two for one" exchanges more attractive for the person capturing two pieces (I recently saw somewhere that there's some evidence that when the queen is exchanged for two rooks or the rook for two minor pieces, the person losing two pieces typically does better, though nominally they have less in material value).


What is an “alone white king”?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: