I made some tweaks (and a bug fix or two) and left it running overnight (four instances, one per CPU core). They all get down to a score of 12 to 14, but then they seem to stop progressing. Haven't seen 11 yet.
It's not pretty, but I did a few things. Instead of taking the top n candidates and breeding them randomly, I keep all of them (NUM_STATES 200 and KEPT_STATES 200) but weight them according to score. So the best scores have the highest chance of passing their DNA down to the next generation, but the occasional loser gets lucky too.
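For anyone following along, score-weighted selection can be sketched roughly like this. It is not the actual code from my program: the function name `select_parents` and the weighting formula `1 / (1 + score)` are assumptions made for illustration, the only requirement being that a lower score gets a larger weight.

```python
import random

def select_parents(states, scores, k, rng=random):
    """Draw k parents with replacement, favouring low scores.

    The weight 1 / (1 + score) is an assumed formula: a score of 0 gets
    weight 1, and worse (higher) scores get progressively smaller weights,
    so a poor state can still be picked now and then.
    """
    weights = [1.0 / (1.0 + s) for s in scores]
    return rng.choices(states, weights=weights, k=k)
```

Because the draw is with replacement, a very good state can be a parent several times in one generation, which is the point of weighting instead of truncating.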
Mutation is also weighted, so the most likely number of mutations is zero, but it's possible to have up to 4. Increasing this value made the performance go down, but it's also possible that there is no way to get to a solution by incrementally tweaking a decent attempt. Which could explain why I'm stuck at 12.
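A weighted mutation count could look something like the sketch below. The halving weights in `MUTATION_WEIGHTS`, the list-of-lists grid layout, and the names `mutation_count` and `mutate` are all assumptions for the example; the only parts taken from the description above are that zero mutations is the most likely outcome and that four is the maximum.

```python
import random

MAX_MUTATIONS = 4
# Assumed weights for 0, 1, 2, 3, 4 mutations: zero is the most likely outcome.
MUTATION_WEIGHTS = [16, 8, 4, 2, 1]

def mutation_count(rng=random):
    """Pick how many cells to mutate, between 0 and MAX_MUTATIONS."""
    return rng.choices(range(MAX_MUTATIONS + 1), weights=MUTATION_WEIGHTS, k=1)[0]

def mutate(grid, num_colors, rng=random):
    """Return a copy of grid with a weighted-random number of cells recoloured."""
    child = [row[:] for row in grid]
    for _ in range(mutation_count(rng)):
        r = rng.randrange(len(child))
        c = rng.randrange(len(child[0]))
        # The new colour may equal the old one, so this can be a no-op.
        child[r][c] = rng.randrange(num_colors)
    return child
```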
There are a few optimizations that may or may not make a difference, for instance in the scoring function. I think the pick_best_states function was not actually picking the best states. If two states had the same score, the first one was kept, but the next one was not. I fixed this. The stupid thing is, with NUM_STATES and KEPT_STATES both at 200, it's basically an n^2 sort now. I never bothered to improve that.
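If someone does want to get rid of the n^2 behaviour, one option is to let the built-in sort do the work. This is only a sketch and not the real pick_best_states: the name `pick_best` and its arguments are made up for the example, and it assumes that a lower score is better. Python's sort is stable, so states with equal scores are all kept, in their original order.

```python
def pick_best(states, scores, kept):
    """Return the `kept` states with the lowest scores, ties included.

    Sorting indices by score is O(n log n), and the stable sort keeps
    equal-scoring states in their original relative order.
    """
    order = sorted(range(len(states)), key=lambda i: scores[i])
    return [states[i] for i in order[:kept]]
```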
I'm not sure minimizing the score (at least, until you get to 0) is interesting, but anyway, I wrote a simple hill-climbing-with-random-perturbation routine and let it run over and over for the last couple of days. So far my best has a score of 10:
Something that might help would be to put the grids into some sort of canonical form. Since swapping rows and columns doesn't change the score, every configuration has a reordering for which the value of the whole grid is minimal (i.e. when the grid is read out as a 289-digit number). Intuition says that if grids were brought into such a canonical form, recombination might work better.
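Finding the true minimum over all row and column orderings is expensive, so here is a cheaper approximation as a sketch: sort the rows, then sort the columns, and repeat until nothing changes. The function name `sort_rows_and_columns` and the list-of-lists layout are assumptions. Each pass can only lower the row-by-row reading of the grid, so the loop terminates, but it stops at a local minimum. Two grids that are reorderings of each other are not guaranteed to end up identical.

```python
def sort_rows_and_columns(grid):
    """Alternately sort rows and columns until the grid stops changing.

    This is a heuristic normal form, not a true canonical form: the result
    is stable under both sorts but need not be the global minimum.
    """
    g = [list(row) for row in grid]
    while True:
        rows_sorted = sorted(g)                  # order rows lexicographically
        cols_sorted = sorted(zip(*rows_sorted))  # order columns lexicographically
        nxt = [list(row) for row in zip(*cols_sorted)]
        if nxt == g:
            return g
        g = nxt
```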
I've got 67 just using random placements with heuristics, programmed in Python. Apparently they've got 73 (which is 289/4 rounded up), and that's why they think this might be possible.