I wrote a Python script that uses regular expressions to winnow out words based on the constraints imposed by previous guesses. There are 3 parts to each regex. The first uses positive lookaheads to verify that the test word includes each letter that has been marked as correct but in the wrong location. These are followed by negative groups listing all the wrong letters. Interspersed are the correct letters in their correct positions. This filters my word list from 5300 words down to a handful with each guess. It works pretty well, but it still requires guessing. I always run it after I’ve solved the word using just my brain. It gives me a good sense of how good or bad my logic was.
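For anyone curious, the three-part regex described above can be sketched roughly like this. It is a minimal sketch, not the script itself: build_pattern and its arguments are hypothetical names, and it assumes "negative groups" means negated character classes and that the word list holds lowercase five-letter words.

```python
import re

def build_pattern(greens, yellows, grays, length=5):
    """Build a regex for one Wordle state (hypothetical helper).

    greens:  dict position -> letter known to be at that position
    yellows: dict position -> set of letters in the word but not at that position
    grays:   set of letters known to be absent
    """
    # Part 1: one positive lookahead per misplaced letter, so it must appear somewhere.
    present = sorted(set().union(*yellows.values()))
    lookaheads = "".join(f"(?=.*{re.escape(c)})" for c in present)

    # Parts 2 and 3: per position, either the known letter or a negated class.
    body = []
    for i in range(length):
        if i in greens:
            body.append(re.escape(greens[i]))
        else:
            banned = set(grays) | yellows.get(i, set())
            body.append(f"[^{''.join(sorted(banned))}]" if banned else ".")
    return re.compile("^" + lookaheads + "".join(body) + "$")

# Example: the guess "tails" came back with t, i, l misplaced and a, s absent.
words = ["light", "limit", "tails", "built", "guilt", "flint", "lofty"]
pattern = build_pattern({}, {0: {"t"}, 2: {"i"}, 3: {"l"}}, {"a", "s"})
print(pattern.pattern)  # ^(?=.*i)(?=.*l)(?=.*t)[^ast][^as][^ais][^als][^as]$
print([w for w in words if pattern.match(w)])  # ['light', 'limit']
```

Repeated letters are not handled carefully here: a letter that comes back yellow in one position and gray in another would need extra bookkeeping.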
What I find interesting about word games is that they are very rarely games about words. E.g. Scrabble is adversarial optimization, Wordle is constraint satisfaction, etc.
To this day the puzzles I enjoy the most are crosswords and rebuses, mostly because I can switch off the part of my brain that screams "oh yeah, just grep /usr/dict/words".
> word games […] are very rarely games about words
> Scrabble is adversarial optimization
A friend of mine used to joke that Scrabble is an area control game with 40,000 rules for tile placement. Like many good jokes, there is a kernel of truth in there.
I have bad news for you.
Here's a potential algorithm: 1) Fine-tune a language model to come up with words that fit the crossword's descriptive sentence. 2) The problem is then reduced to a constraint satisfaction problem. Implementation is left as an exercise for the reader :)
When you see a combination with a bunch of possible letters, you have to try to eliminate the letters instead of going for the win. Lots of people keep going for the win and losing when they could have eliminated a bunch of possibilities with a completely different search.
Often it's more helpful to use one guess you know will not match, just to eliminate or split numerous cases at once. If trapped with _IGHT which could start with many letters, a guess like FARM would reduce your guesses in a single move.
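To put a number on that idea, here is a small sketch that scores a guess by the largest group of remaining candidates it cannot tell apart (smaller is better). The function names, the candidate list, and the five-letter probe "forms" are my own illustrative choices, not anything from the game's code.

```python
from collections import Counter

def feedback(guess, answer):
    """Return Wordle-style marks: 'g' green, 'y' yellow, '-' gray."""
    marks = ["-"] * len(guess)
    unused = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "g"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if marks[i] == "-" and unused[g] > 0:
            marks[i] = "y"
            unused[g] -= 1
    return "".join(marks)

def worst_case(guess, candidates):
    """Size of the largest group of candidates sharing the same feedback."""
    return max(Counter(feedback(guess, c) for c in candidates).values())

candidates = ["fight", "light", "might", "night", "right", "sight", "tight"]

print(worst_case("fight", candidates))  # 6: a miss leaves six words
print(worst_case("forms", candidates))  # 3: light, night, tight remain at worst

best = min(["fight", "forms"], key=lambda p: worst_case(p, candidates))
print(best)  # forms
```

Going for the win with "fight" can leave six candidates, while the throwaway probe leaves at most three.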
This also helps you avoid, or escape from, fixating on an expected pattern that may be wrong (premature convergence).
It can be hard to realize that's the case though. (_IGHT is pretty obvious of course.) And if you can only hit 2 or 3 letters it can be somewhat of a gamble vs. just going for a valid answer. (Of course if there are a ton of options you may be in a tough position anyway.)
Hard mode doesn't change the optimal strategy much. It just makes a small fraction of words "insta-lose", because if you play one and get unlucky, you'll have to switch to eliminating one letter at a time and will probably run out of turns before you've covered all the possibilities.
Dots allow any character though. I don't have a scrabble5.txt so I'm using /usr/share/dict/words and using the dots resulted in the capitalized words (proper names) and words with apostrophes being included.
If you use the wrong dictionary you will get the wrong results. No use of grep will solve that. You could replace the dots with [a-z] but that is unnecessarily verbose if you have the right dictionary.
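The difference between dots and [a-z] is easy to see in a quick sketch. The word list below is a made-up stand-in for a system dictionary, and Python's re module stands in for grep.

```python
import re

# Hypothetical sample of what a general-purpose dictionary might contain.
words = ["light", "Alice", "don't", "can't", "right", "lights"]

loose = re.compile(r"^.....$")       # five of any character
strict = re.compile(r"^[a-z]{5}$")   # five lowercase letters only

loose_hits = [w for w in words if loose.match(w)]
strict_hits = [w for w in words if strict.match(w)]

print(loose_hits)   # ['light', 'Alice', "don't", "can't", 'right']
print(strict_hits)  # ['light', 'right']
```

With a word list that is already restricted to lowercase five-letter words, both patterns return the same thing, which is the point being made above.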
And if you use a specific enough dictionary for your problem, you can even just use cat! Just joking, I get what you're saying, the blog post is grepping a particular file, it's silly for people to grep a different file and use that as evidence your regex is bad.
But the dictionary is still wrong in one sense - there's an actual word list Wordle uses. I think from the way the blog post is written (e.g. "I refused to believe Wordle would be so evil as to..."), they're purposely avoiding looking up the actual Wordle word list, so they're purposely using the wrong dictionary.
I kind of relate to that - looking at the source code to devise strategies kind of feels like cheating.
That seems to be a list of the valid words to guess, not the valid answers, though. If I were trying to guess an answer from that list, I'd probably go with light followed by limit, with the others ranging from less likely to very much less likely.
Where does this scrabble5.txt come from? I’ve been looking at the system words file for this type of exploration, but that has an awful lot of words that make no sense to put in a word game.
There are two arrays in the Wordle JavaScript. One is a list of valid answers, the other is a list of other words you can guess but aren't valid answers. You can view source and pull them both out. My code does this for you and allows you to use regular expressions:
https://news.ycombinator.com/item?id=30653322
There are also lists of Scrabble words; I use one called enable.
The targets list is the one you want, it's the valid answers. The dictionary is the valid guesses.
In the targets list, some words are starred out, just ignore them, those are words that got removed. The list is ordered and you can pick a spot in the list and ignore everything below it; for Xordle (and again inherited from hello-wordl) "mulch" is the last word used in the list.
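A rough sketch of that filtering, under two assumptions of mine: that a "starred out" entry is any entry containing an asterisk, and that "ignore everything below it" means keeping the entries up to and including the cutoff word. The function name and the sample list are hypothetical.

```python
def usable_targets(targets, last_word):
    """Keep entries up to and including last_word, skipping starred-out ones."""
    cutoff = targets.index(last_word) + 1
    # Assumption: a removed word shows up as an entry containing '*'.
    return [w for w in targets[:cutoff] if "*" not in w]

# Made-up miniature list, only to show the shape of the data.
targets = ["about", "*****", "light", "mulch", "zesty"]
print(usable_targets(targets, "mulch"))  # ['about', 'light', 'mulch']
```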
I LOVE Quordle. Found it on an HN post. I enjoy it because it's much more (in my opinion) about reducing unknown information about all the letters than it is about guessing at each possible word.
Quordle, for me, quickly turned relatively stale. I'm quite convinced it's optimal (in terms of human play) to have a fixed set of 4 words to cover 20 letters, at which point you have 5 guesses to solve 4 words based on that information. Fun for a while, but not very deep.
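Checking whether a fixed opening really covers 20 distinct letters is a one-liner. The four words below are just an illustration I put together to show the check, not a recommended or optimal opening.

```python
def distinct_letters(words):
    """Return the set of distinct letters used across all the words."""
    return set("".join(words))

opening = ["quick", "brown", "depth", "flags"]  # illustrative picks only
covered = distinct_letters(opening)
print(len(covered))  # 20

unseen = sorted(set("abcdefghijklmnopqrstuvwxyz") - covered)
print("".join(unseen))  # jmvxyz
```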
I just choose not to play it like that, and it remains fresh. I try to get all vowel information fairly quickly, but as far as possible I am using existing information to drive me forward rather than going fishing for arbitrary new information.
Octordle is definitely like that. I think you want to do a 3 to 4 word opening sequence of some sort unless you get very lucky earlier. That said, going beyond 3 words I found turned it into more of an anagram solving game but not really improving my typical score. (I find it's very hard to have more than one or two guesses left given you pretty much have to use 8/13 for the final guesses unless you luck onto an answer you weren't shooting for.)
I’m at my worst in Octordle when I get my first word on the 3rd guess. Then I feel pressured to get 2 + 8 guesses. However, that usually doesn’t work out and I end up wasting the 4th guess on suboptimal letter coverage.
Can egrep do lookaheads or behinds? I find it necessary to use positive lookaheads to verify the test words include all the letters marked present but in the wrong location.
I was also curious about this as I use grep -P (for PCRE regexes) if I need to use them. Looking at the manual, lookaheads and lookbehinds don't seem to be supported by regular gnu grep or grep -E, so you do need to use grep -P for them.
Is there a pattern language that would allow for the search in question being done in a single pattern instead of glueing two patterns together with an OR-operator?