Hacker News
Blowing up my compile times for dubious benefits (claytonwramsey.github.io)
94 points by Tyrubias on June 22, 2023 | 27 comments



While this post has a good amount of substance, I found it rather hard to understand due to small mistakes (or hidden assumptions, or something else?). All of these seem easily fixed or clarified, but this sort of thing is unnecessary friction for readers.

Edit: Thanks to Jasper for stating the big hidden assumption made by the author: that he is only considering the case of being blocked by enemy pieces, not one's own pieces. This is motivated by the fact that one can use the same-side occupancy board to trivially remove all moves that are illegal due to self-blocking. This resolves many of the technical issues, although again, it should be made explicit. The typos/silly errors and pedagogical mistakes remain.

Some examples:

- According to the first bitboard (and the convention stated in the OP), A2 has index 8, not 1.

- In the second bitboard, the rook on e4 should be able to move to e1,a4,e8,h4.

- max of 9 relevant occupancy bits for a bishop - how? A bishop on e4, for example, would have to check b1,c2,d3,f5,g6,h7 and a8,b7,c6,d5,f3,g2,h1 for a total of 13 bits. A similar mistake is made for the rook.

- The so-called "original" masked occupancy bitboard was not mentioned previously.

- That same bitboard has variables b1,b2,b3,b4,b5 on it, a notational collision with the squares b1-b5.

- The variables b1-b5 are in the wrong place (presumably they were meant to be on c3,d4,e5,f6,g7 since the bishop is on b2, and again there is the question of why not h8 and the other missing squares).

- Magic number comes out of nowhere. What is it doing? Presumably the purpose is to extract the relevant positional bits to be the most significant ones while keeping the order, done through the multiplication + bitmasking, but this should be explicitly stated. We also have no idea how it is derived.

- What are the "collisions" actually collisions of? It would be nice to see an example of this.


Hi, I'm the original author! This is my first post. I might talk more about chess engines in my blog in the future...

Some background: I taught a class on chess engines this spring (at some point, I'll write up a postmortem on it, maybe). This post was originally derived from my lecture notes for the class, and was (if I remember correctly) the 5th lecture, so I could make more assumptions about the students being steeped in chess-engine lingo.

Josh Triplett is entirely correct here on the masking technique. There are two steps to sliding-piece move generation: first, creating the set of squares a piece can "see," and then the set that it can attack. Magic bitboards are used for generating the set of squares a piece can see.

In terms of where magic bitboards come from: there's not that much in terms of logical derivation. You just have to kind of squint and say, "yeah, I believe that'll work." Magic numbers are found by brute force trial-and-error.

Collisions are caused when the magic multiply doesn't actually yield a perfect bit extraction. Imagine if O * M >> 59 was instead some gross expression of all five bits. The magic number for B1 here is actually an exception - things are not usually so easy. However, if two different positions have the same attack set, and they get sent to the same index by a magic multiply, it doesn't matter that they collided because they map to the same value.
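The magic multiply and the brute-force search described above can be sketched in a few lines. This is a hedged toy, not the author's code: the 4-bit mask, the `find_magic`/`magic_index` names, the PRNG, and the spare index bit are all illustrative choices; real rook and bishop masks work the same way, just with more bits and bigger tables.

```rust
fn magic_index(occupancy: u64, mask: u64, magic: u64, shift: u32) -> usize {
    // the multiply gathers the masked bits into the top of the word;
    // the shift brings them down to form a table index
    ((occupancy & mask).wrapping_mul(magic) >> shift) as usize
}

// tiny xorshift PRNG so the sketch has no dependencies
fn next_rand(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn find_magic(mask: u64) -> u64 {
    // one spare index bit keeps this toy search fast; real searches often
    // instead tolerate "constructive" collisions (positions sharing an attack set)
    let shift = 64 - mask.count_ones() - 1;
    let mut rng = 0x9E37_79B9_7F4A_7C15u64;
    'search: loop {
        // sparse candidates (AND of three draws) tend to succeed more often
        let magic = next_rand(&mut rng) & next_rand(&mut rng) & next_rand(&mut rng);
        let mut seen = vec![false; 1usize << (64 - shift)];
        let mut subset = 0u64; // Carry-Rippler: enumerate every subset of the mask
        loop {
            let idx = magic_index(subset, mask, magic, shift);
            if seen[idx] {
                continue 'search; // two occupancies collided: try another magic
            }
            seen[idx] = true;
            subset = subset.wrapping_sub(mask) & mask;
            if subset == 0 {
                return magic;
            }
        }
    }
}

fn main() {
    // toy mask: 4 scattered bits (the a1-d4 diagonal)
    let mask = (1u64 << 0) | (1 << 9) | (1 << 18) | (1 << 27);
    let magic = find_magic(mask);
    println!("found magic {magic:#018x}");
}
```

The "perfect extraction" case from the comment above is the degenerate one: if the mask is the five contiguous low bits, `magic = 1 << 59` already sends them to the top of the word, and `O * M >> 59` reads them back out unchanged.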


> - max of 9 relevant occupancy bits for a bishop - how? A bishop on e4 for example would have to check b1,c2,d3,f5,g6,h7 and a8,b7,c6,d5,f3,g2,h1 for a total of 13 bits. A similar mistake is made for the rook.

If I'm understanding correctly: it's trivial to use a mask with the occupancy bitmap for white/black to eliminate moves atop your own pieces, and generate captures for moves atop your opponent's pieces. The sliding move handling needs to look up whether a piece is in the way of your destination. So, the sliding move handling doesn't need to handle spaces on the edge of the board, because they'll never be in the way of another square.
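This edge-exclusion argument is easy to check mechanically. A hedged sketch (the function and names are mine, not the article's; squares indexed A1 = 0 through H8 = 63, rank-major) that counts the relevant occupancy bits per square:

```rust
// Build the "relevant occupancy" mask for a sliding piece by walking each
// ray and stopping short of the board edge, since an edge square can never
// block a square beyond it.
fn relevant_mask(square: i8, deltas: &[(i8, i8)]) -> u64 {
    let (f0, r0) = (square % 8, square / 8);
    let mut mask = 0u64;
    for &(df, dr) in deltas {
        let (mut f, mut r) = (f0 + df, r0 + dr);
        // include (f, r) only while the *next* square along the ray is still
        // on the board, i.e. (f, r) is not this ray's edge square
        while (0..8).contains(&(f + df)) && (0..8).contains(&(r + dr)) {
            mask |= 1u64 << (r * 8 + f);
            f += df;
            r += dr;
        }
    }
    mask
}

fn main() {
    let rook = [(0i8, 1i8), (0, -1), (1, 0), (-1, 0)];
    let bishop = [(1i8, 1i8), (1, -1), (-1, 1), (-1, -1)];
    let e4 = 3 * 8 + 4;
    // rook on e4: e2-e3, e5-e7, b4-d4, f4-g4 -> 10 relevant bits
    println!("rook e4: {}", relevant_mask(e4, &rook).count_ones());
    // bishop on e4: only 9 of the 13 diagonal squares survive once edges drop out
    println!("bishop e4: {}", relevant_mask(e4, &bishop).count_ones());
    // maxima over all squares: 9 for a bishop, 12 for a rook (on a corner)
    let max_b = (0i8..64).map(|s| relevant_mask(s, &bishop).count_ones()).max().unwrap();
    let max_r = (0i8..64).map(|s| relevant_mask(s, &rook).count_ones()).max().unwrap();
    println!("max bishop: {max_b}, max rook: {max_r}");
}
```

So the bishop on e4 from the parent comment does touch 13 diagonal squares, but only 9 of them can ever hold a blocker that matters, which is where the "max of 9 relevant occupancy bits" figure comes from.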


This makes a lot of sense and gives context to the assumption that we only need to handle the case where an opponent's piece is in the way. It would be great if the author took the two sentences to explain this near the beginning of the article.


> - In the second bitboard, the rook on e4 should be able to move to e1,a4,e8,h4.

It's trying to find the set of legal moves. Or, rather, it's trying to find the set of illegal moves that it can exclude from the subset of possible moves. The only thing determining whether a move is legal is whether there's a piece in the way of that move, i.e. whether a piece is occupying a square; that's what the bitboard check is about. If E1 is occupied, the set of legal moves is the same as if it is unoccupied, so it is irrelevant to check the bitboard for E1. I agree the article could be clearer about this.


I'm clearly misunderstanding something here. Assuming the rest of the board is empty, if e1 is occupied, the rook can't move to e1, whereas if it isn't occupied, the rook can move to e1, so there is at least one position in which we must check e1. What am I misunderstanding here?


Oh, I was assuming it was occupied by an enemy piece, and would capture it. You're right that if it was occupied by an allied piece, it's not a legal move.


> to make all the constants known at compile time, everything has to be written in a const function. This means: No allocations [...] No for loops

These constants are relatively unchanging relative to the rest of your program. Why not write a code generator (without the above weird restrictions) to compute them once and save the result?


I thought about that. The big benefit of using `const` here is that I can lean on my existing abstractions (e.g. Bitboard, Square, Direction) which I was already using in the engine. In order to use those abstractions in `build.rs`, I would have to split out those base data structures into a separate crate. I found empirically that splitting out code into a separate crate resulted in an observable Elo reduction (though this was about a year ago, so I forget how much), even with `lto=true`.


It is better to write obscure compile-time code in the metalanguage than to write a separate program which generates the same result, even if the separate program would generate something that can then be spliced in at compile time anyway.

Well, "better". "Industry practice" might be more accurate. Also "necessary" if your build system is garbage and/or your team is frightened of code generators but not meta-programming.


You can write the compile time generation as a separate Rust program. It has the added advantage that you only need to run it if the program itself changes and otherwise check in the generated code.

The main advantage is that you don’t have to generate Rust code as a string. You’re just generating Rust data structures in Rust and that is convenient. Not sure if there’s any tricks for that.
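A minimal sketch of that separate-generator approach, assuming a `build.rs` (all names and the stand-in values are illustrative, not the engine's):

```rust
use std::{env, fs, path::PathBuf};

// render a table of magics as Rust source text
fn emit_table(magics: &[u64]) -> String {
    let mut src = format!("pub static MAGICS: [u64; {}] = [\n", magics.len());
    for m in magics {
        src.push_str(&format!("    {m:#018x},\n"));
    }
    src.push_str("];\n");
    src
}

fn main() {
    // stand-in values; a real generator would run the magic search here,
    // with ordinary runtime Rust and no `const fn` restrictions
    let magics: Vec<u64> = (0u32..64).map(|sq| 1u64.rotate_left(sq)).collect();
    // fall back to a temp dir so the sketch also runs outside of Cargo
    let out_dir = env::var_os("OUT_DIR").map(PathBuf::from).unwrap_or_else(env::temp_dir);
    fs::write(out_dir.join("magics.rs"), emit_table(&magics)).unwrap();
    // tell Cargo when to rerun the generator
    println!("cargo:rerun-if-changed=build.rs");
}
```

The crate then pulls the generated file in with `include!(concat!(env!("OUT_DIR"), "/magics.rs"));`, so the values stay visible to the optimizer. The string-building step is the inconvenience the parent comment is pointing at.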


I was also confused by this. I was wondering if he was going to use #embed or something of the like.


I’m not a Rustacean, but it seems much easier to precompute, store in a file, and then load the file at startup.


In Rust you can also pre-compute (optionally at build time in the build.rs) and load the file at compile time. Either use std::include_str to get a string with the file content (and then parse it at startup without needing the separate file), or generate a valid Rust file and use std::include, which still gets you all the potential compiler optimizations from knowing the values of these constants.

But honestly, the const code for generating the values doesn't look thaaat bad. It's bad by rust standards, but not far from what equivalent C code would look like.
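For the `include_str` route, the startup parse is only a few lines. A hedged sketch, with a literal (made-up values) standing in for the embedded file so it runs on its own:

```rust
fn main() {
    // in a real crate this would be: let raw = include_str!("magics.txt");
    // a literal stands in here so the sketch is self-contained
    let raw = "0080001020400080\n0040001000200040\n";
    let magics: Vec<u64> = raw
        .lines()
        .map(|line| u64::from_str_radix(line, 16).unwrap())
        .collect();
    println!("parsed {} magics, first = {:#018x}", magics.len(), magics[0]);
    // the include! alternative keeps even this parse out of the runtime:
    // include!(concat!(env!("OUT_DIR"), "/magics.rs"));
}
```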


Write Rust syntax into that file and feed it into the build, to remove miscellaneous fears about the costs or failure modes of loading files.


My first thought would be to store the magic numbers in either a db or a config file.


The bitboard approach is similar to how graphics cards used to organize their memory back in the day. It wasn't one big buffer per screen with one or more consecutive bytes representing each pixel, one pixel after another. Instead there were as many buffers as required by the bit depth, e.g. one for black and white, two for 4 colors, and so on. Every bit was a pixel, so the top-left corner was the MSB of the first byte of each buffer. We called these bitplanes.


I enjoyed the whole article, but the title and opening sentence really deserve credit for being genuinely fun to read.


I'm the original author. Thank you!!!!


An excellent explanation of Magic Bitboards can be found here: https://analog-hors.github.io/site/magic-bitboards/


Wow, this is a really cool project. Just one thing I was curious about: at the end there was talk about Elo ratings and their uncertainties, but I wasn't sure how those were computed or whether they were determined to be statistically significant. The dubious benefits of the compile times seem independent of an Elo rating, so I think a statistical test to ensure the significance of the results would further drive the point home.

Apart from the Elo claim, the runtime speedup is surely beneficial at the very least! Cheers to the author.


I wonder whether using `const` over `static` was intentional here. The former gets instantiated at each use site, while the latter exists (approximately) once in the binary.


It is! I tried using `static` as a test, and benchmarked the engines against each other.

The final result was that a `const` lookup table yielded +19 Elo over dynamic magic generation, while a `static` one yielded only +10 in a tournament between all three. Elo is non-linear and dependent on the set of competitors, so that's why it has no relation to the +6.6 from the end of my post.

My suspicion is that moving to `static` removes the ability to optimize against the magic table when the query is known at compile time, since I often use magic lookups from fixed squares and occupancies in places other than move generation.

I also found that it yielded no change in compile time or binary size, so it was basically all disadvantages to use `static`.
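A toy illustration of the distinction being discussed (the table contents are made up, not the engine's):

```rust
// const: the value is inlined at each use site, so a lookup with a
// compile-time-known index can fold straight to a literal
const ATTACKS_C: [u64; 4] = [0x81, 0x42, 0x24, 0x18];

// static: a single object in the binary; a use is a load from its address,
// which the optimizer must first prove foldable before doing the same
static ATTACKS_S: [u64; 4] = [0x81, 0x42, 0x24, 0x18];

fn main() {
    let from_const = ATTACKS_C[2]; // morally the same as writing 0x24
    let from_static = ATTACKS_S[2];
    assert_eq!(from_const, from_static);
}
```

The values are identical either way; the suspicion above is about which form the optimizer can fold when the query is known at compile time.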


Mathematical representation of A-bar / A' - this is the complement of the set A? That should probably be `~A` and not `!A` for the bitboard operation. Logical negation would not give you the desired result; instead you want the bitwise negation (`~A`).


The article should have clarified it, but this is Rust so !A is correct (having a real boolean type, and no implicit casts to int, you don’t need a separate bitwise and logical not. Conceptually, ! always does a bitwise not, which in the case of a single-bit value is also the logical not.)
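A quick check of this point:

```rust
fn main() {
    let a: u64 = 0x00FF_0000_0000_0000;
    assert_eq!(!a, 0xFF00_FFFF_FFFF_FFFF); // bitwise NOT on a bitboard-sized integer (C's ~)
    assert_eq!(!true, false);              // logical NOT on bool
    println!("ok");
}
```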


> I suspect that the constant evaluator in Rust is just plain slow.

IIRC, it's using the miri interpreter, which is indeed very slow.


Not exactly: IIRC it is part of Miri somehow, but it doesn't do all the fancy checks for UB (only part of them).



