They’re my go-to problem when learning a new programming language because they’re just complex enough to exercise a whole bunch of different language features.
I know someone who likes to create a fractal generator when learning a new language. He says it gives you a bit of exposure to the strengths and quirks of the language: implementing algorithms, working with data structures, and displaying a GUI to render an image to the screen.
My go-to is the first fifty problems on Project Euler. It ramps up quite nicely from "can you write a for loop?" to problems with some nice complexity. Great for learning a language!
I do the same thing! The first program I ever wrote was a very simple sudoku solver on a TI-84 calculator (it performed the constraint propagation logic that Norvig describes), and for every programming language I have learned since I have re-written that code.
The first one I'd seen was implemented in a Ruby book, and I agree it's a good, moderately challenging problem. Once you've solved the base case, you can add improvements like a more efficient backtracking algorithm or a GUI that draws the board.
Edit: The book was 'The Ruby Programming Language' by O'Reilly Publishing, not 'Programming Python' as I originally thought, also published by O'Reilly.
Seems this is a common theme... I wrote a (no search) solver in Perl for a class a while back. Mostly wanted to see how far my own solution finding rules would get me without having to follow through on solving hundreds of puzzles myself.
I’ve always used an authoritative DNS server as my go-to. Almost all code I write has to do some kind of network IO, so learning the nuances of how TCP and UDP sockets work is important to me. It also happens to touch on bitwise operations, data structures, and parsing, so it’s a great project to quickly become competent in a language (assuming you already know DNS well at the protocol layer).
The other day we were at a restaurant and they gave my four year old a kids menu. One of the activities was a modified sudoku that went up to 6. My first thought was that I'd never realized you could do sudokus with other values of N.
But the really cool part was that once I taught her the rules (the constraints) she was actually able to figure it out without much help at all!
Turns out sudoku is a good way to teach kids logic, even at four years old.
You might want to check out KenKen puzzles. They are similar to Sudoku and were created by a math teacher as a fun learning tool. IMO, they are quite a bit more interesting than Sudoku puzzles.
As well as KenKen, I also like Killer Sudoku as a more interesting option than vanilla Sudoku. It's basically 9x9 KenKen with addition as the only operator, and it uses a bunch of the same thought processes as Kakuro, which I like too.
A long time ago I did high school math instruction. I used Sudoku for an initial introduction to proofs - something where the student knew they were implicitly going through a "proof" process.
Aha, yes! When I wrote my first sudoku solver a decade ago I was having fun and decided to stress test it with bigger N. I found a site dedicated to these, with puzzles as large as 64x64.
Wrote a solver using basically the same idea back in college. Was kind of fun. Probably some stupid things in there that I would do differently now with more experience.
Same. Didn't save the code; wish I had just so I could see if I got a better execution time than his 1439 second example.
Amusingly, my rationale was much like the author's; I freakin' hate Sudoku. So the fact it's such an obvious candidate to have an automated solver for meant I wrote one (it was also an excuse to do something more meaningful than the intro to CS exercises that class had me doing).
I left my solver running for 28 minutes and it never found a solution to that puzzle. Guess whatever corner case his program is hitting is the same one my program runs into.
It's easy to see why this puzzle has no solution. Looking at columns 4-6 we see that none of the digits 1,5,6 can occur in cells G4,H4,I4 or G6,H6,I6. So some permutation of these digits occupies G5,H5,I5 and all digits other than 1,5,6 can be eliminated from these cells. But we also see that all the digits 1,5,6 have already been placed in row G, leaving no candidates for G5.
It's also easy to see why this presents a problem for the Norvig solver (and many solvers like it). If you squint at the algorithm you'll see that it behaves like DPLL (https://en.wikipedia.org/wiki/DPLL_algorithm) given a CNF encoding of the Sudoku exactly-one constraints. In this scheme constraint propagation only occurs via unit resolution. i.e., when a positive clause is reduced to a single literal because the domain of a cell is reduced to a single digit or a unit/group is reduced to having a single place for a given digit. In Sudoku land these are known as naked and hidden singles.
Unfortunately, the conclusion that none of 2,4,7,8,9 can occupy G5 arises from the interaction of multiple non-unit clauses. It is simply not reachable by unit resolution. As a result, the conflict won't be discovered until the algorithm actually attempts to assign G5, and, since there are many cells that initially have fewer possibilities than G5, this is not likely to happen early in the search.
Some of the faster solvers can handle this by adding checks for "locked candidates", or by representing the logic in an equisatisfiable way that makes these inferences available to unit resolution. In such cases this puzzle is recognized as unsatisfiable in microseconds.
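For concreteness, the two inferences that unit resolution buys you (naked and hidden singles) look roughly like this in Python. This is only a sketch under assumed representations, not Norvig's code: `grid` maps each cell to a set of candidate digits, and `peers`/`units` are the usual Sudoku peer and unit maps.

    def propagate_singles(grid, peers, units):
        """Apply naked and hidden singles to a fixpoint; False means conflict."""
        changed = True
        while changed:
            changed = False
            for cell in grid:
                cands = grid[cell]
                if not cands:
                    return False                    # empty domain: contradiction
                if len(cands) == 1:                 # naked single: the cell's
                    d = next(iter(cands))           # digit leaves its peers
                    for p in peers[cell]:
                        if d in grid[p]:
                            grid[p].discard(d)
                            changed = True
                else:
                    for unit in units[cell]:
                        for d in list(cands):
                            # hidden single: d fits nowhere else in this unit
                            if all(d not in grid[c] for c in unit if c != cell):
                                grid[cell] = {d}
                                changed = True
                                break
                        if len(grid[cell]) == 1:
                            break
        return True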
Hmm, that's weird. I tried my solver on it and it tells me that it can't find a solution (in 13 seconds). My solver is pretty well tested (on some very difficult sudokus as well). Maybe this guy's solver is bugged?
Yeah; the one I implemented used a different algorithm, and would have noted that there was no solution.
I basically just kept a "possibility space" for each empty cell (a list of all possible values it could be), started with the one with the smallest size > 0, tried a value as 'true', removed that value from all related cells (same row, column, 3x3 box), and recursed. If an empty cell ever had a possibility space of 0, it would pop back off the stack, removing that prior 'possibility', and resume from there. If I ever got back to an empty stack, there was no solution.
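From memory it boiled down to something like this (a from-memory Python sketch, not the original code; `peers` is an assumed map from each cell to the cells sharing its row, column, or 3x3 box):

    import copy

    def solve(grid, peers):
        """grid maps cell -> set of possible values; singletons are decided."""
        open_cells = [c for c in grid if len(grid[c]) > 1]
        if not open_cells:
            return grid                          # nothing left to decide
        cell = min(open_cells, key=lambda c: len(grid[c]))   # smallest space
        for value in sorted(grid[cell]):
            trial = copy.deepcopy(grid)
            trial[cell] = {value}                # try this value as 'true'
            dead_end = False
            for p in peers[cell]:
                trial[p].discard(value)          # remove from related cells
                if not trial[p]:                 # a possibility space hit 0:
                    dead_end = True              # pop back, try the next value
                    break
            if not dead_end:
                result = solve(trial, peers)
                if result is not None:
                    return result
        return None                              # every choice failed: no solution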
I threw a few dozen puzzles at it, trying to pull ones that were hard both for humans and, ostensibly, for machines, but I never saw it take more than a couple seconds, and that in the most degenerate "had to try every possible permutation" cases. Hence why I really wish I'd kept the code to throw that one at.
That doesn't seem any different from the algorithm my solver uses. From what a sibling poster commented, it looks like the trouble is that the square that makes it unsolvable lets you solve the board down to just that square in many different ways, so you end up enumerating an absolutely enormous state space because the solver doesn't have enough tricks built in to recognize the board as unsolvable from the start.
Right; I am saying it's different than the original post's because it -could- handle unsolvable boards (maybe I misread; it sounded like that was an unhandled case in it).
I knew that there could be degenerate cases like that which would cause it to take O(n!) time, but the ones I ran into still didn't take -that- long, which was what surprised me.
I guess you should be able to verify pretty easily that the solution he posted is valid, right? But I guess he didn't actually post the solution his code came up with, so who knows...
Oh, that explains everything. That's what I get for being too lazy and not reading the fine print. I feel happy now that my solver is about 2 orders of magnitude faster than his!
It's trivial to see that it has more than one possible solution: fewer than 8 of the 9 digits appear on the initial board, so any valid solution that is found has a mirrored solution with all of one absent number swapped for all of another.
Sure. My solver is designed to return the first solution it finds (it doesn't check for solution uniqueness, just that there is one). However currently my solver is saying that it can't find _any_ solution at all.
I love how, with the Z3Py solver, you just write down a few constraints (rows, columns, 3x3 boxes, and the given values) and you immediately get an answer.
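For anyone who hasn't seen it, the whole thing fits in roughly a dozen lines of Z3Py. A sketch -- the `puzzle` string is just a sample easy grid I'm using for illustration, with 0 meaning an empty cell:

    from z3 import Solver, Int, Distinct, And, sat

    puzzle = ("003020600" "900305001" "001806400"
              "008102900" "700000008" "006708200"
              "002609500" "800203009" "005010300")    # sample grid, 0 = empty
    givens = [[int(puzzle[9 * r + c]) for c in range(9)] for r in range(9)]

    X = [[Int(f"x_{r}_{c}") for c in range(9)] for r in range(9)]
    s = Solver()
    s.add([And(1 <= X[r][c], X[r][c] <= 9) for r in range(9) for c in range(9)])
    s.add([Distinct(X[r]) for r in range(9)])                           # rows
    s.add([Distinct([X[r][c] for r in range(9)]) for c in range(9)])    # columns
    s.add([Distinct([X[3*br + i][3*bc + j] for i in range(3) for j in range(3)])
           for br in range(3) for bc in range(3)])                      # 3x3 boxes
    s.add([X[r][c] == givens[r][c]                                      # the givens
           for r in range(9) for c in range(9) if givens[r][c] != 0])

    if s.check() == sat:
        m = s.model()
        print([[m.evaluate(X[r][c]).as_long() for c in range(9)]
               for r in range(9)])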
The key insight when writing a sudoku solver is that it is just depth-first search. Once someone told me that, it just sort of clicked. Just guess each possible number at each open square, and if you ever get to an unsolvable state, undo the last change you made and try the next one. I was extremely satisfied with myself after writing my first really clean solver with backtracking.
I'm sure there are other, more efficient ways to write sudoku solvers, but the simplicity and clarity the DFS solver offers is pretty cool IMO.
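For reference, the whole guess-and-undo idea really is only a handful of lines. A minimal sketch (the board as a 9x9 list of lists with 0 for empty; nothing clever, just DFS plus a validity check):

    def solve(board):
        """Plain depth-first search with backtracking; mutates board in place."""
        for r in range(9):
            for c in range(9):
                if board[r][c] == 0:
                    for d in range(1, 10):
                        if fits(board, r, c, d):
                            board[r][c] = d          # guess
                            if solve(board):
                                return True
                            board[r][c] = 0          # undo the last change
                    return False                     # unsolvable state: back up
        return True                                  # no empty squares left

    def fits(board, r, c, d):
        """True if digit d doesn't clash with the row, column, or 3x3 box."""
        if any(board[r][j] == d for j in range(9)):
            return False
        if any(board[i][c] == d for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(board[br + i][bc + j] != d
                   for i in range(3) for j in range(3))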
It can be done with just depth first search, but if you want it to be fast you can go pretty far down the rabbit hole. The details of your representation, your heuristics, and the kinds of constraint propagation you support all matter a lot.
I don't think there are, actually. Some algorithms might run faster in practice, but since the generalized NxN game is NP-complete, brute force is pretty much as good as it gets for exact solutions.
The constraint propagation bit is the main key. It still ends up being a DFS, but at most steps along the way, finding one number leads causally to several other spaces being found.
In most easy / medium puzzles, there are actually no points where you need to guess (i.e. recurse down the tree).
In many hard puzzles, there are only one or two guesses, which is where you would recurse down the decision tree.
Things get a bit more intense once you move into the puzzles specifically designed to be a pain in the neck for programs, but often those inputs aren't actually valid Sudoku puzzles.
I've only ever written one solver, in Python, about 10 years ago. But it wasn't a search and never made guesses; it was an enumeration of all the analog elimination methods I'd ever figured out when solving puzzles on paper.
It was fast, but there were a few difficult puzzles it couldn't solve - obviously there were some real-world techniques/algorithms I hadn't managed to figure out.
There are sudoku puzzles which have a unique solution but which also force a “guess” in the solving process, so your approach doesn’t work in the general case.
Do you mean the series of posts linked to from the one you give, beginning with [1]? They are an excellent demonstration of two of the pitfalls of naive TDD:
1) The things you can test first (before you have figured out the central problem) are not necessarily the things that matter most. The author should have addressed the algorithm for solving the puzzle first; everything else is secondary.
2) This is an example of those cases where having tests for what you are going to write does not provide much guidance on how to solve the problem.
When making this comparison, it's worth noting that Jeffries is showing his work process. Norvig is showing a cleaned up version of his final result after throwing away some earlier attempts. (Source: I spoke with Peter about Sudoku solvers shortly after he'd published.)
I think that's understood. But what's telling is that no single step of Jeffries' series of articles seems to be approaching any kind of sensible solution. It reads like an amateur -- e.g. me -- trying to write a game and getting stuck in the low-hanging but inconsequential fruit.
Because Jeffries' series is about TDD, and not really Sudoku, the process is important. And the process failed him spectacularly.
> They are an excellent demonstration of two of the pitfalls of naive TDD
I think it's particularly interesting because Ron Jeffries is not a novice but one of the creators of Extreme Programming, an expert TDD practitioner and evangelist. And to my knowledge, he never acknowledged clearly that he failed in this exercise because he approached it the wrong way, or that he was naive about TDD.
I think he acknowledged it pretty clearly; he claimed it as one of his most public failures, didn't he? But also, where's the TDD naivete? The problem isn't TDD, the problem is knowing how to apply constraint propagation. You could TDD constraint propagation and it'd work fine.
> I think he acknowledged it pretty clearly; he claimed it as one of his most public failures, didn't he?
He did? All I see is a series of articles where he randomly flails towards no clear goal, with no clear understanding of Sudoku and gaining no insight either. At the end it just peters out. The last article claims "Sudoku will not ship", and that's it for a post-mortem.
Unless he wrote another article elsewhere explaining how TDD failed him, or how he employed it wrong, or owning up to any mistakes, I think he acknowledged nothing.
Watching Norvig vs Jeffries writing a Sudoku solver is watching someone who understands the field vs a novice who doesn't know what he doesn't know. Except this novice is a self-proclaimed expert. It's embarrassing, really.
> But also, where's the TDD naivete? The problem isn't TDD, the problem is knowing how to apply constraint propagation. You could TDD constraint propagation and it'd work fine.
Jeffries "sold" TDD as a design and problem exploration technique (aka "TDD is not about testing"), which it really is not suitable for, or at least not for algorithms (I've no doubt you can TDD a CRUD application, but how uninteresting is that?). It's a lot less sexy if he had said "yeah, all the TDD in the world won't save you researching the problem, maybe actually reading the literature on the subject or learning about specific techniques (like constraint propagation) will help". TDD as a design/discovery method for algorithms doesn't work -- you won't learn constraint propagation by doing TDD. Did Jeffries acknowledge anything of the sort?
You know why TDD is often demonstrated in toy examples with the Fibonacci sequence or similar algorithms? It's because everyone already understands them. It's easy to "arrive" at a solution you already know beforehand.
Be wary of anyone who proposes some procedure X, and when it fails claims "yeah, well, this is not a failure of X because <reasons>".
He deserves some credit, but he would have deserved full credit had he acknowledged the limitations of TDD -- the ones you mentioned in your first post.
I think Norvig reveals the difference in his opening paragraph. He jumps straight to constraint propagation.
He builds a data type that encodes the constraint space: any value this cell contains cannot be present in these other 20 cells, so let's build upon that.
And then he builds the process-of-elimination structure that sudoku puzzlers use so often that it started showing up in many of the implementations.
> It turns out that the fundamental operation is not assigning a value, but rather eliminating one of the possible values for a square
Much faster if constraints are represented by single bits in an integer mask and binary masking operations are used - even the hardest sudoku solves instantly.
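Roughly this sort of thing -- a sketch of the representation, not any particular solver's code:

    # One 9-bit mask per row/column/box records the digits already used;
    # the candidate set for a cell is then a couple of OR/AND operations.
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9

    def place(r, c, d):
        bit = 1 << (d - 1)
        rows[r] |= bit
        cols[c] |= bit
        boxes[3 * (r // 3) + c // 3] |= bit

    def candidates(r, c):
        used = rows[r] | cols[c] | boxes[3 * (r // 3) + c // 3]
        free = ~used & 0x1FF                  # low 9 bits = digits 1..9
        return [d for d in range(1, 10) if free & (1 << (d - 1))]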
My first ever CS assignment was to build a sudoku solver using MIPS. I remember reading this then and being inspired by the beauty of computer science!
This is pretty much exactly the algorithm that https://www.ioccc.org/2005/aidan/aidan.c uses, except obviously much more clearly expressed. There's also a reasonably straightforward extension of this technique that can test for solution uniqueness, and so generate random valid puzzles; I can't remember exactly how it was implemented, but it involves continuing the search after finding the first solution and seeing if it succeeds.
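Something like this, probably -- a sketch of the idea rather than aidan.c's actual method; `fits(board, r, c, d)` is an assumed row/column/box validity check like the one in the plain DFS sketch upthread:

    def count_solutions(board, limit=2):
        """Keep searching past the first solution; stopping at limit=2 is
        enough to decide uniqueness (useful when generating puzzles).
        `fits` is the assumed row/column/box validity check."""
        for r in range(9):
            for c in range(9):
                if board[r][c] == 0:
                    total = 0
                    for d in range(1, 10):
                        if fits(board, r, c, d):
                            board[r][c] = d
                            total += count_solutions(board, limit - total)
                            board[r][c] = 0
                            if total >= limit:
                                return total
                    return total
        return 1                              # board full: exactly one solution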
I always thought one of the requirements for published Sudoku puzzles was that no trial and error should be required; it should be possible to deduce the single unique solution from the clues given. This would mean the "search" part of Norvig's algorithm would not be required; constraint propagation alone would be enough.
I think technically, if constraint propagation can't solve the puzzle, that means it's either contradictory (zero solutions) or ambiguous (multiple solutions).
However, it seems Norvig was feeding his program with all three types of puzzles -- zero, one, and multiple solutions -- so perhaps that led him to implement a more robust search.
If there is one unique solution, then constraint propagation always works, yes.
But a propagated constraint isn't always just a single value at a time. There are cases where a solver (human or machine) needs to identify that a group of two or three or more cells mutually creates a constraint ("naked twins" and other such configurations, in Sudoku jargon), which can in turn give rise to yet another such mutual constraint, and so on. Identifying every possible way such a constraint can arise isn't a simple operation, either to implement or to execute, computationally speaking.
For both Norvig's programmatic solver and a human, it's possible and common for trial and error to be faster and more expedient than trying to identify and execute every possible means by which a constraint can arise. Norvig's aim was implementational simplicity more than computational elegance: he decided to care only about the simplest possible type of constraint and let trial and error handle everything else.
The search part is definitely considered “bad form” by human solvers. The author apparently lacked the insight that elimination generalizes to “if any set of N peers contains exactly N distinct values, you can eliminate those values from all peers not in the set”, rather than just “if 1 square contains 1 value ...”. That alone would solve all but the hardest puzzles with just a few more lines of code.
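In code the generalized rule really is short. A sketch, assuming the common cell -> candidate-set representation (`unit_cells` is one row, column, or box; n=1 recovers the plain single-value elimination):

    from itertools import combinations

    def naked_subsets(unit_cells, grid, n):
        """If n cells of a unit jointly contain exactly n distinct candidates,
        eliminate those candidates from every other cell in the unit."""
        changed = False
        unsolved = [c for c in unit_cells if len(grid[c]) > 1]
        for group in combinations(unsolved, n):
            digits = set().union(*(grid[c] for c in group))
            if len(digits) == n:                  # N values locked into N cells
                for c in unit_cells:
                    if c not in group and grid[c] & digits:
                        grid[c] -= digits
                        changed = True
        return changed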
I think the author had that insight, just chose not to use that route because it wasn't necessary. For either a human or machine solver, the trial and error route may well be simpler / less code / faster at runtime than implementing and executing every way to recognize a set of N peers for N values.
A while back I wrote a solver that uses no backtracking, just constraint propagation, and it wasn't substantially more complex than this.
One of the key insights comes from noticing a feature of groups of numbers in cells. For example, suppose two cells in a given row are the only cells in that row where the numbers 1 and 2 can be placed: then one of them is going to have 1 and the other will have 2, and while you don't know which, you can eliminate any other possibilities in those two cells. This generalizes to groups of any size in rows/columns/squares. I think that, maybe one other heuristic, and the basic rules about number placement are enough to solve everything.
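That's the "hidden subset" rule, and it's compact to implement too. A sketch under the same assumed representation (grid maps each cell to a set of candidate digits; `unit_cells` is one row, column, or box):

    from itertools import combinations

    def hidden_subsets(unit_cells, grid, n):
        """If n digits can only go in the same n cells of a unit, those cells
        can hold nothing but these digits; strip their other candidates."""
        changed = False
        for digits in map(set, combinations(range(1, 10), n)):
            places = [c for c in unit_cells if grid[c] & digits]
            # every one of the n digits must actually be live somewhere here
            if (len(places) == n and
                    all(any(d in grid[c] for c in places) for d in digits)):
                for c in places:
                    if grid[c] - digits:
                        grid[c] &= digits     # nothing but these digits fits
                        changed = True
        return changed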
This sounds like Crook's Algorithm (https://www.ams.org/notices/200904/tx090400460p.pdf), which is a perfectly reasonable thing to do, especially for pencil-and-paper puzzle solving since pigeonhole inferences are easy for people to spot. But this inference rule does not suffice on its own for many hard puzzles. Crook's algorithm still requires backtracking.
> I can copy values with values.copy() which is simple and efficient. If I implemented a possibility as a Python set or list I would need to use copy.deepcopy(values), which is less efficient.
The array deepcopy isn't too bad, at least in terms of writing the code: it compresses into a neat slice [:].
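To make the trade-off concrete, a toy illustration (not code from the essay):

    import copy

    values = {"A1": "123456789", "A2": "4"}   # candidates as strings (immutable)
    fast = values.copy()                      # shallow copy is safe here, since
                                              # strings are never mutated in place

    cells = {"A1": {1, 2, 3}, "A2": {4}}      # candidates as mutable sets
    safe = {k: v.copy() for k, v in cells.items()}  # need per-set copies here
    also_safe = copy.deepcopy(cells)                # ...or deepcopy, but slower

    board = [[0] * 9 for _ in range(9)]       # list-of-lists board
    snapshot = [row[:] for row in board]      # the neat [:] slice, per row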
I wrote a hinter[2] which was more fun than the actual solver too, because it shows the "minimum remaining values" as a colour coded chart.
On a related note, Knuth has a cool algorithm for solving exact cover problems like sudoku (similar logic to Norvig's DFS but implemented very differently). It's more work to code than Norvig's, but worth it for the education :) (and it solves faster)
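If anyone wants the flavor without the pointer surgery, there's a well-known compact dict-based way to write Algorithm X (the search underlying dancing links). A sketch with an illustrative toy instance -- `X` maps each constraint to the rows that cover it, `Y` maps each row to its constraints:

    def algorithm_x(X, Y, solution=None):
        """Yield exact covers: sets of rows covering every constraint once."""
        if solution is None:
            solution = []
        if not X:
            yield list(solution)               # every constraint is covered
            return
        col = min(X, key=lambda c: len(X[c]))  # most-constrained column first
        for row in list(X[col]):
            solution.append(row)
            removed = select(X, Y, row)
            yield from algorithm_x(X, Y, solution)
            deselect(X, Y, removed)
            solution.pop()

    def select(X, Y, row):
        """Remove row's columns, and all rows that clash with it."""
        removed = []
        for c in Y[row]:
            for r in X[c]:
                for c2 in Y[r]:
                    if c2 != c:
                        X[c2].discard(r)
            removed.append((c, X.pop(c)))
        return removed

    def deselect(X, Y, removed):
        """Undo select, restoring columns in reverse order."""
        for c, rows in reversed(removed):
            X[c] = rows
            for r in rows:
                for c2 in Y[r]:
                    if c2 != c:
                        X[c2].add(r)

    # Toy instance: constraints 1..4; rows A=[1,4], B=[2,3], C=[1,2].
    X = {1: {"A", "C"}, 2: {"B", "C"}, 3: {"B"}, 4: {"A"}}
    Y = {"A": [1, 4], "B": [2, 3], "C": [1, 2]}
    print(next(algorithm_x(X, Y)))             # ['B', 'A'] covers everything

(The reverse-order restore in deselect is what dancing links does with pointer relinking; here plain dicts and sets stand in for it.)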
While dancing links is cool, benchmarking shows that using bits to represent values, and just copying, ends up being faster. This is because on modern CPUs about the worst thing you can do is lots of pointer chasing.
How fast they are depends on the difficulty of the sudoku and on your machine, but it's typically a few thousand puzzles/s on the hardest known and several hundred thousand to a million/s on very easy ones.
I'm the author of Tdoku and maintainer of those benchmarks.
If anyone has a well optimized DLX implementation I'd love to add it to the comparison. The few I've sampled from github were not competitive, but I have no idea how much effort went into them.
Seems pretty competitive to me if the third fastest in your benchmark uses dancing links.
I do think it's probably true that dancing links won't be literally the fastest, but the neat thing is that it's a generic algorithm that solves ALL exact cover problems, not just Sudoku. Of course a tuned Sudoku solver could probably beat it (by optimizing the Sudoku-specific parts), but it's nice to see that it's at least in the running.
Bear in mind that JSolve was 5x faster than the fastest DLX solver in the attractivechaos 2011 benchmarks, and today there are several solvers that are 2x faster than JSolve (https://github.com/t-dillon/tdoku/tree/master/benchmarks). If the difference is really an order of magnitude, that's pretty non-competitive.
I certainly don't mean to disparage DLX. It's a powerful and general algorithm. But if you need a fast Sudoku solver (for puzzle mining or studying Sudoku research questions) then it's not the tool I'd reach for. The backtracking solvers just get a lot of mileage out of tuning their representation to play nicely with hardware. They also blow general purpose CDCL SAT solvers out of the water, at least for 9x9 Sudoku.
> As computer security expert Ben Laurie has stated, Sudoku is "a denial of service attack on human intellect". Several people I know (including my wife) were infected by the virus, and I thought maybe this would demonstrate that they didn't need to spend any more time on Sudoku.
That's hilarious! But what puzzles, or even activities, wouldn't Ben classify as a DoS on humans?
The algorithm described in the 'constraint propagation' section was basically how I implemented my sudoku solver, and it can do 99% of all puzzles thrown at it. And it wasn't particularly complicated.
> They’re my go-to problem when learning a new programming language because they’re just complex enough to exercise a whole bunch of different language features.
I built this when I wanted to learn Swift and dip my toe into machine learning: https://twitter.com/braddwyer/status/910030265006923776?s=21