Although I'm also not very familiar with the history of such schemes, I do recognize the (not quite) 12-bit one, which I believe was the first scheme proposed, by Claude Shannon in his 1950 article "Programming a Computer for Playing Chess" [1]. Perhaps that's because it's naturally the first scheme anyone would come up with; in any case, here's what he had to say about it:
> A move (apart from castling and pawn promotion) can be specified by giving the original and final squares occupied by the moved piece. Each of these squares is a choice from 64, thus 6 binary digits each is sufficient, a total of 12 for the move. Thus the initial move P-K4 would be represented by 1, 4; 3, 4. To represent pawn promotion a set of three binary digits can be added specifying the piece that the pawn becomes. Castling is described by the king move (this being the only way the king can move two squares). Thus, a move is represented by (a, b, c) where a and b are squares and c specifies a piece in case of promotion.
I'm actually slightly surprised Shannon didn't propose a more compact scheme exploiting the lower entropy of legal chess moves, but I guess his purpose in this article was more to make a ballpark estimate of the feasibility of computer chess playing in general.
Very cool. I suspect that much better compression is possible in principle (not that I'd want to implement it) using an openings book or game database and an engine. The idea would be to first record the opening played in the game and the move number at which the game deviates from it. A lot of work would need to go into figuring out the optimal opening-book size. Then, run a deterministic chess engine with predefined parameters at each move and record where the played move ranks on its suggested list (the top move, second move, third move, etc.), with a fallback that encodes the move explicitly if none of the top 8 or so suggestions was played.
A more sophisticated version would use arithmetic coding, with the predictions of the next move initially coming from an opening book / game database, then coming from the engine. The idea being that most games you want to compress are at a high enough level that the engine gives good predictions ... perhaps one could even tune the engine's parameters for better results. But again, like I said, it doesn't sound like fun to code.
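For what it's worth, here's a rough sketch of the simpler rank-based version, assuming python-chess, with `rank_moves` as a crude stand-in heuristic where a real deterministic engine would go (the 3-bit packing and escape code are only described in comments):

```python
import chess

def rank_moves(board: chess.Board) -> list[chess.Move]:
    # Stand-in for a deterministic engine: captures first, then checks,
    # then everything else, each group in UCI order for determinism.
    def key(move: chess.Move):
        return (not board.is_capture(move), not board.gives_check(move), move.uci())
    return sorted(board.legal_moves, key=key)

def encode_game(moves: list[chess.Move]) -> list[int]:
    # Each move becomes its rank in the predictor's list (0 = top choice).
    # A real encoder would pack ranks 0-7 into 3 bits and hide a full move
    # encoding behind an escape code for anything ranked lower.
    board, ranks = chess.Board(), []
    for move in moves:
        ranks.append(rank_moves(board).index(move))
        board.push(move)
    return ranks

def decode_game(ranks: list[int]) -> list[chess.Move]:
    # The decoder recomputes the same deterministic ranking at every ply.
    board, moves = chess.Board(), []
    for rank in ranks:
        move = rank_moves(board)[rank]
        moves.append(move)
        board.push(move)
    return moves
```

The ranks (or better, the probabilities behind them) are also exactly what you'd feed to the arithmetic coder in the more sophisticated version.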
A separate comment: I wonder if the time efficiency issues mentioned are really that severe? Since the problem is so small/finite.
Article author here: I do mention the idea of using a chess engine to improve compression. I propose a simple scheme and estimate it would compress the moves of a reasonably played game to about 3.9 bits each on average. I am sure it's possible to do better, but I suspect you'd hit diminishing returns before you get close to 3 bits.
I really should have included something about adding an opening book as I thought about that quite a lot. My conclusion was that an opening book is not going to be a really dramatic win. A simple scheme might encode say 64K opening sequences using 2 bytes, and save an average of perhaps 10 (half) moves. So a saving of 10*4 - 16 = 24 bits, spread over an average of 80 (half) moves. So about 0.3 bits per move.
You might question my estimate of 10 half moves max, but it's an educated guess. One thing I've discovered whilst working on my chess database is that the standard tabiya positions are reached by huge numbers of different transposition possibilities. See my blog post at
This means that the standard canonical way of reaching a well known position doesn't serve as a good proxy for the start of all the games that include that position.
In the "Checkers is solved" paper, they state:

> The complete 10-piece databases contain 39 trillion positions (Table 1). They are compressed into 237 gigabytes, an average of 154 positions per byte!
Do you have any idea how this would be done? It seems crazy.
Probably just standard compression techniques - the linked article is mostly discussing a standalone move, but there are a lot more options available when working with large strings of text (dictionaries, BWT, arithmetic coding, ...).
The downside is a lack of individual byte access without a lot of surrounding decompression work, but it'd be appropriate for stream processing.
In fact the best compressed size is probably found by reducing some of the clever tricks in the article in order to expose more structure to a general compressor. Similar to running `precomp` or `antiX` before solid-packing multiple already-compressed files.
That's interesting, as you say it sounds like magic. I will have a look at the paper when I get a chance. No doubt there are some amazing tree data structures involved. That's what chess endgame tablebases use, but it's not something I pretend to understand.
> A separate comment: I wonder if the time efficiency issues mentioned are really that severe? Since the problem is so small/finite.
Just noticed this. The beauty of my "sweet spot" scheme is that processing these 8 bit moves is almost as quick as processing a straightforward normal move representation (32 bits in my code because I store extra information to make moves "undo-able" - although 16 bits is reasonable). This is important for some of the things I want to do. For example, I am experimenting with the idea of implementing a query to search a database of games for a position by literally playing through every game looking for the position (actually the hash of the position). The standard approach is (I think) to index all the positions in the database - which swells the database a lot. If you do a little math you'll see that you need the move processing to be lightning fast for this to be responsive in a multi-million game database. Move processing that involves generating move lists in every position would leave the user waiting minutes (maybe hours!).
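Just to illustrate the idea, here's a toy version in Python using python-chess and its polyglot Zobrist hash (the real thing would decode the 8 bit moves and update the hash incrementally rather than recomputing it every ply):

```python
import chess
import chess.polyglot

def games_containing(target_board: chess.Board, games: list[list[chess.Move]]) -> list[int]:
    # Replay every game and compare position hashes against the target.
    # Recomputing the Zobrist hash from scratch each ply is the slow, lazy
    # version; the point of a fast move format is that the replay loop and
    # an incrementally updated hash stay cheap.
    target = chess.polyglot.zobrist_hash(target_board)
    hits = []
    for i, game in enumerate(games):
        board = chess.Board()
        for move in game:
            board.push(move)
            if chess.polyglot.zobrist_hash(board) == target:
                hits.append(i)
                break
    return hits
```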
That sounds a lot like how LZ77 works, with a predefined dictionary. Probably the only way to do it given the potential symbol size at any given moment.
Chess move compression was an interesting topic back when the games were stored on 360k floppy disks. Nowadays every master chess game ever played in the history of chess fits easily on one DVD, uncompressed.
So it's not clear what the point of compressing the moves is, especially since the article is sometimes concerned about size and sometimes about speed. If it's just an intellectual exercise then consider the following scheme:
Generate the legal moves for a position, then sort them. However, don't sort them using a naive method like alphabetical order. Instead sort them in order of likeliness of being played; for example, moves that capture the last moved piece go at the top of the list. So after, say, 1.e4 d5, the first move in the list would be exd5, capturing the last moved piece, and exd5 can be encoded in 1 bit. Now imagine a 40 move game where every move played was the first one on the sorted list: it takes 80 bits to store the entire game. Of course moves farther down the list take more bits to encode.
This is similar to one of the schemes in the article, but the article gets hung up on fixed bit sizes rather than just using the exact number of bits required for each move, which results in variable bit lengths.
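One concrete way to get exactly that behaviour (my reading, not necessarily the only one) is a unary prefix code over the index of the played move in the likelihood-sorted list, so the top move costs 1 bit, the second 2 bits, and so on. A rough Python sketch assuming python-chess, with a placeholder ordering heuristic that a real scheme would tune against a game database:

```python
import chess

def likelihood_sorted(board: chess.Board, last_to) -> list[chess.Move]:
    # Placeholder ordering: recaptures on the last-moved-to square first,
    # then other captures, then the rest in UCI order (for determinism).
    def key(move: chess.Move):
        return (move.to_square != last_to, not board.is_capture(move), move.uci())
    return sorted(board.legal_moves, key=key)

def encode(moves: list[chess.Move]) -> str:
    board, bits, last_to = chess.Board(), [], None
    for move in moves:
        index = likelihood_sorted(board, last_to).index(move)
        bits.append("0" * index + "1")   # move ranked k costs k+1 bits
        last_to = move.to_square
        board.push(move)
    return "".join(bits)                 # the decoder replays the same sort
```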
This is, more or less, the scheme Chessbase first used for their data files almost 30 years ago.
It's not quite that simple. I discuss variable bit schemes like the one you describe in the article. Your suggested scheme is an extreme variant, and it's by no means obvious it's optimal. I'd be prepared to be convinced if some statistical evidence were presented. As I say, designing these schemes is "a lot of fun". Under your scheme:
1st move in the list takes one bit (great)
2nd move in the list takes two bits (good)
3rd move in the list takes three bits (okay)
4th move in the list takes four bits (average)
5th move in the list takes five bits (worse than average)
...etc
There are many positions with lots of plausible moves, and in such positions your scheme could use many bits. I would estimate your scheme would take around 4 bits per move on average, much like the similar scheme I describe.
You are correct that a good modern database of 5 million plus games stored uncompressed occupies about 5 gigabytes or 1 DVD. Clearly compression is very useful for online distribution, which is what people expect these days. Chessbase compression reduces the 5 gigs to 0.5 gigs, about 10:1.
The scheme I describe is a nice compromise between performance and maximum compression. In my database program I achieve much faster position search than Chessbase. I am trying to achieve similar results without massive position indexes, and my performant compressed move scheme is a key ingredient in my work.
Edit: But as I state in the intro to my article, I am just an amateur playing around - I am not expecting to 'beat' Chessbase. I'll settle for having some innovative aspects to my program.
Too late to edit my earlier reply, so I'll reply again instead. I've subsequently learned more, mainly from a discussion with "mxtppy" in the programming subreddit. I've updated my blog post with a new section "One Move in 2 Bits, Maybe?" to reflect my improved understanding.
At http://stackoverflow.com/questions/1831386/programmer-puzzle... I suggested combining the sort-the-moves-by-their-evaluation scheme with arithmetic coding according to statistics from a database of games -- how often do people choose the move the engine picks as best? Etc. (It's always easy to propose work for someone else.) (The third paragraph of that answer is irrelevant to real games, which always start from the same configuration.)
I've been doing some spare-time chess programming on a GUI myself (non-intensive on-off work for 1 year so far) and I decided to stay with the naive method (12 bits, straightforward src and dest squares) for compressing moves. It's obviously easier and faster to implement, I'm not hostage to some wacky bugs I might have introduced, and it's straightforward to parse and to format as long algebraic form (e2e4), which is the format UCI engines (Stockfish, for instance) like to receive, and to store to a file.
Having said that, I might take a look into this 8 bit format :)
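For anyone curious, the naive scheme really is only a few lines. A sketch in plain Python, packing the two 6-bit square indices into one integer and converting to and from UCI long algebraic form (promotions would still need Shannon's extra piece field, omitted here):

```python
def square_index(sq: str) -> int:          # "e2" -> 12
    return (ord(sq[0]) - ord("a")) + 8 * (int(sq[1]) - 1)

def square_name(i: int) -> str:            # 12 -> "e2"
    return "abcdefgh"[i % 8] + str(i // 8 + 1)

def encode_move(uci: str) -> int:          # "e2e4" -> 12-bit integer
    return (square_index(uci[:2]) << 6) | square_index(uci[2:4])

def decode_move(code: int) -> str:         # 12-bit integer -> "e2e4"
    return square_name(code >> 6) + square_name(code & 0x3F)

assert decode_move(encode_move("e2e4")) == "e2e4"
```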
Very interesting! But I don't quite see why you need to bother with tracking pieces and swapping them. Can't you just go by the convention that pawn 0 is the first pawn you find when scanning across from a1 to h8, pawn 1 is the second pawn, etc.? Similarly for the knights, bishops, and rooks? (obviously still using the fallback when necessary)
That would eliminate the need for computing swaps while still producing the same move code for a given move in a given position.
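Something like this is what I have in mind (a rough python-chess sketch, just for illustration):

```python
import chess

def piece_number(board: chess.Board, square: int) -> int:
    # Count same-type, same-colour pieces encountered before this one
    # when scanning a1, b1, ..., h8. Note this costs a full-board scan
    # on every such move.
    piece = board.piece_at(square)
    if piece is None:
        raise ValueError("no piece on that square")
    ordinal = 0
    for sq in chess.SQUARES:               # chess.SQUARES runs a1..h8
        if sq == square:
            return ordinal
        other = board.piece_at(sq)
        if other and other.piece_type == piece.piece_type and other.color == piece.color:
            ordinal += 1
    return ordinal
```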
This would work but it requires a scan of the whole board for pawn (and knight and rook) moves. I wanted (and eventually got - after a lot of mistakes along the way) a system which would use negligible CPU cycles for almost all moves.
I'd suggest a slight change. Code all the pawns into 2 'pieces' and have a special piece for promoting. The 16 moves for the special piece can be an index from a list of possible promotions which is easier to generate and canonicalise than the total move list. This frees up 5 pieces for promoted queens. You could special case the king into the spare entropy in the rooks and bishops to save one more piece.
I don't think that would work (but possibly I misunderstand!). The system cannot cope with up to 8 more pieces (if all 8 pawns stood on the 7th rank)! The fact that the queen needs 1 more 'piece' already requires a certain amount of ingenuity to work around.
"An amusing point is that some moves really would require zero bits – this happens when there is only one legal move in the position, there’s no need to store anything at all in that case."
What if a player decides to resign before making that move?
Why does every move have to occupy the same number of bits? That looks inefficient to me. Why not have pawn moves occupy fewer bits and queen moves more?
In the worst case pawns can be quite demanding - because of underpromotion there can be up to 12 moves available to a pawn (three destination squares times four promotion choices) - more than a knight or a king. Of course it is possible to design a scheme where moves take a variable number of bits - I discuss some promising methods in the article. But you will always need 8 bits for some moves (information theory says so) and the beauty of my "sweet spot" scheme is that it is simple and quick, whilst still offering reasonable compression.
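To make the pawn worst case concrete, here's a quick check with python-chess and a made-up position (white pawn on e7, black rooks on d8 and f8):

```python
import chess

# The pawn has three destinations (e8, xd8, xf8), each with four promotion choices.
board = chess.Board("3r1r2/4P3/8/8/8/8/8/K6k w - - 0 1")
pawn_moves = [m for m in board.legal_moves if m.from_square == chess.E7]
print(len(pawn_moves))   # 12
```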
No, chess engines use the simple highly performant "native" move representations. My 8 bit scheme approaches the performance of native representation, but is only (potentially) useful for other types of applications, particularly chess databases.
[1] http://www.pi.infn.it/~carosi/chess/shannon.txt