Wordle and grep (leancrew.com)
84 points by tosh on April 16, 2022 | 52 comments


I wrote a Python script that uses regular expressions to winnow out words based on the constraints imposed by previous guesses. There are 3 parts to the regex. The first uses positive lookaheads to verify that the test word includes each letter that has been marked as correct but in the wrong location. These are followed by negated character groups listing all the wrong letters. Interspersed among them are the correct letters in their correct positions. This filters my word list from 5300 words down to a handful with each guess. It works pretty well, but it still requires guessing. I always run it after I’ve solved the word using just my brain. It gives me a good sense of how good or bad my logic was.
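A minimal sketch of what such a three-part regex could look like in Python. This is not the commenter's script: the function name `build_pattern` and its three parameters are hypothetical, and it assumes no letter is both gray and green/yellow (so repeated-letter feedback is not handled).

```python
import re

def build_pattern(greens, yellows, grays):
    """Build a regex from Wordle feedback (hypothetical helper).

    greens:  dict of position (0-4) -> letter known to be at that position
    yellows: dict of letter -> set of positions where that letter is NOT
    grays:   set of letters known to be absent from the word
    """
    # Part 1: one positive lookahead per letter that is present but misplaced.
    lookaheads = "".join(f"(?=.*{letter})" for letter in sorted(yellows))

    # Parts 2 and 3: per position, either the known letter or a negated class.
    body = []
    for pos in range(5):
        if pos in greens:
            body.append(greens[pos])
        else:
            banned = set(grays) | {c for c, ps in yellows.items() if pos in ps}
            body.append("[^" + "".join(sorted(banned)) + "]" if banned else ".")
    return re.compile("^" + lookaheads + "".join(body) + "$")

# Example: guess "later" scored as L green, T yellow (position 2), A/E/R gray.
pattern = build_pattern({0: "l"}, {"t": {2}}, {"a", "e", "r"})
words = ["light", "limit", "lofty", "later", "lions", "lotus"]
print([w for w in words if pattern.match(w)])
```

The lookahead only asserts that the misplaced letter occurs somewhere; the negated class at its old position is what rules out words like "lotus".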


What I find interesting about word games is that they are very rarely games about words. E.g. Scrabble is adversarial optimization, Wordle is constraint satisfaction, etc.

To this day the puzzles I enjoy the most are crosswords and rebuses, mostly because I can switch off the part of my brain that screams "oh yeah, just grep /usr/dict/words".


I really enjoy Codenames because it’s a word-meanings game instead of a word-spellings game.


> word games […] are very rarely games about words

> Scrabble is adversarial optimization

A friend of mine used to joke that Scrabble is an area control game with 40,000 rules for tile placement. Like many good jokes, there is a kernel of truth in there.


I have bad news for you. Here's a potential algorithm: 1) Fine-tune a language model to come up with words that fit the crossword descriptive sentence. 2) Problem is reduced to a constraint satisfaction problem. Implementation is left as exercise for the reader :)


> I refused to believe Wordle would be so evil as to have two repeated letters.

I remember VIVID which was pretty tough.


Worgle had SYZYGY the other day.

I got it in three; I had remarked to someone the previous day what a good word it would be for worgle!


When you see a combination with a bunch of possible letters, you have to try to eliminate the letters instead of going for the win. Lots of people keep going for the win and losing when they could have eliminated a bunch of possibilities with a completely different search.


Often it's more helpful to use one guess you know will not match, just to eliminate or split numerous cases at once. If trapped with _IGHT which could start with many letters, a guess like FARM would reduce your guesses in a single move.

This also helps avoid or escape focus on an expected pattern that may be wrong (premature convergence.)
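A rough sketch of why an "elimination" guess helps, assuming standard Wordle scoring. The helpers `feedback` and `split_candidates` are made up for illustration, and the guess "forms" is just an example five-letter word, not one suggested above.

```python
from collections import Counter, defaultdict

def feedback(guess, answer):
    """Wordle-style feedback: 'g' green, 'y' yellow, '-' gray."""
    result = ["-"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] != "g" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)

def split_candidates(guess, candidates):
    """Group the remaining candidates by the feedback the guess would produce."""
    groups = defaultdict(list)
    for word in candidates:
        groups[feedback(guess, word)].append(word)
    return dict(groups)

ight = ["bight", "eight", "fight", "light", "might",
        "night", "right", "sight", "tight", "wight"]
for pattern, words in split_candidates("forms", ight).items():
    print(pattern, words)
```

Guessing one of the candidates directly (say "fight") splits the ten words into just two groups, the guess itself and the other nine, whereas a non-matching guess that carries several of the possible first letters splits them into more, smaller groups.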


Bight, eight, fight, light, might, night, right, sight, tight, wight.

By letter frequency I think rents/stern/terns are good guesses. I don't think there are any two words that together cover 9 of the 10 initial letters.

Blent and frents/strew/terms/wefts/wrest cover 8.
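The coverage counts can be checked with a few lines of Python. This is only a sketch: `coverage` and `best_pairs` are hypothetical names, the guess list is limited to five-letter words mentioned in this comment, and the count is plain letter presence. It ignores position feedback, so a letter like T, which every candidate already contains, is counted as covered even when the guess would not actually separate "tight" from the rest.

```python
from itertools import combinations

IGHT_WORDS = ["bight", "eight", "fight", "light", "might",
              "night", "right", "sight", "tight", "wight"]
INITIALS = {w[0] for w in IGHT_WORDS}

def coverage(*guesses):
    """Initial letters of the _IGHT words that appear in any of the guesses."""
    return INITIALS & set("".join(guesses))

def best_pairs(guess_list):
    """Return (best coverage size, list of pairs that reach it)."""
    scored = [(len(coverage(a, b)), (a, b))
              for a, b in combinations(guess_list, 2)]
    best = max(score for score, _ in scored)
    return best, [pair for score, pair in scored if score == best]

print(sorted(coverage("rents")))
print(best_pairs(["blent", "strew", "terms", "wefts", "wrest", "rents"]))
```

Running `best_pairs` over a full guess dictionary instead of this short list would be the way to test whether any pair reaches 9.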


It can be hard to realize that's the case though. (_IGHT is pretty obvious of course.) And if you can only hit 2 or 3 letters it can be somewhat of a gamble vs. just going for a valid answer. (Of course if there are a ton of options you may be in a tough position anyway.)


Or they are playing in hard mode, which forces this style of guessing.


Hard mode doesn't change the optimal strategy much. It just makes a small fraction of words "insta-lose" because if you play one and get unlucky, you'll have to switch to eliminating one letter at a time and will probably run out of turns before you've covered all possibilities.


Seems like the pattern should be 'l.[^t]..' and then just add in a '| grep t'. That method reduces the number of assumptions.
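For comparison, a small Python equivalent of that two-stage filter, assuming a list that contains only five-letter words. The function name `two_stage` and the sample words are illustrative.

```python
import re

def two_stage(words, positional=r"l.[^t]..", required="t"):
    """Positional regex first, then a plain 'contains' check."""
    rx = re.compile(positional)
    # Stage 1 plays the role of grep 'l.[^t]..' on a five-letter list;
    # stage 2 plays the role of the trailing '| grep t'.
    return [w for w in words if rx.fullmatch(w) and required in w]

print(two_stage(["light", "limit", "lofty", "lotus", "lemon", "alter"]))
```

Unlike the article's pattern, this encodes only what is known about L and T and says nothing about the letters already ruled out.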


Dots allow any character though. I don't have a scrabble5.txt so I'm using /usr/share/dict/words and using the dots resulted in the capitalized words (proper names) and words with apostrophes being included.


If you use the wrong dictionary you will get the wrong results. No use of grep will solve that. You could replace the dots with [a-z] but that is unnecessarily verbose if you have the right dictionary.
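One way to get closer to "the right dictionary" from a general word file is to pre-filter it once, so the later patterns can stay short. A sketch, with a hypothetical helper name and made-up sample entries:

```python
import re

def five_letter_lowercase(lines):
    """Keep entries made of exactly five lowercase ASCII letters."""
    words = (line.strip() for line in lines)
    return [w for w in words if re.fullmatch(r"[a-z]{5}", w)]

# Proper names, possessives and longer words are dropped.
sample = ["light\n", "Linda\n", "let's\n", "lights\n", "limit\n"]
print(five_letter_lowercase(sample))
```

The same function can be handed an open file object, for example the system words file, since it only iterates over lines.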


And if you use a specific enough dictionary for your problem, you can even just use cat! Just joking, I get what you're saying, the blog post is grepping a particular file, it's silly for people to grep a different file and use that as evidence your regex is bad.

But the dictionary is still wrong in one sense - there's an actual word list Wordle uses. I think from the way the blog post is written (e.g. "I refused to believe Wordle would be so evil as to..."), they're purposely avoiding looking up the actual Wordle word list, so they're purposely using the wrong dictionary.

I kind of relate to that - looking at the source code to devise strategies kind of feels like cheating.


The word list is indeed right there in the JavaScript, but what fun is that? (Find today's word, and the next one in the list is tomorrow's word.)


https://scoredle.com calculates this for you using the actual word list. In this case it shows 17 answers:

lifts lints lists lilts lofts louts lowts lunts loots lusty lofty lusts linty light limit licht licit


That seems to be a list of the valid words to guess, not the valid answers though. If I were trying to guess an answer from that list, I'd probably go with light followed by limit with others less to very much less likely.


Yeah you must be right. Maybe they considered it too spoilery to show you the target list.


Where does this scrabble5.txt come from? I’ve been looking at the system words file for this type of exploration, but that has an awful lot of words that make no sense to put in a word game.


There are two arrays in the Wordle JavaScript. One is a list of valid answers, the other is a list of other words you can guess but aren't valid answers. You can view source and pull them both out. My code does this for you and allows you to use regular expressions: https://news.ycombinator.com/item?id=30653322

There are also lists of Scrabble words; I use one called enable.


I was surprised when NYT's wordle bot linked me to the solution list (alphabetically sorted) https://static.nytimes.com/newsgraphics/2022/01/25/wordle-so...


Is this still the case after the NYT purchase?


Yes, my code still works.


Here's the target list I use in Xordle, inherited from hello-wordl: https://raw.githubusercontent.com/6zs/xordle/main/src/fives_...

And here's the dictionary: https://raw.githubusercontent.com/6zs/xordle/main/src/fives_...

The targets list is the one you want, it's the valid answers. The dictionary is the valid guesses.

In the targets list, some words are starred out, just ignore them, those are words that got removed. The list is ordered and you can pick a spot in the list and ignore everything below it; for Xordle (and again inherited from hello-wordl) "mulch" is the last word used in the list.


This is why I like Quordle and Waffle. Both are evil and definitely do repeated letters and rare words.


Here's 32 boards: https://duotrigordle.com/


I LOVE Quordle. Found it on an HN post. I enjoy it because it's much more (in my opinion) about reducing unknown information about all the letters than it is about guessing for each possible word.


Quordle, for me, quickly turned relatively stale. I'm quite convinced it's optimal (in terms of human play) to have a fixed set of 4 words to cover 20 letters, at which point you have 5 guesses to solve 4 words based on that information. Fun for a while, but not very deep.


I just choose not to play it like that, and it remains fresh. I try to get all vowel information fairly quickly, but as far as possible I am using existing information to drive me forward rather than going fishing for arbitrary new information.


Octordle is definitely like that. I think you want to do a 3 to 4 word opening sequence of some sort unless you get very lucky earlier. That said, going beyond 3 words I found turned it into more of an anagram solving game but not really improving my typical score. (I find it's very hard to have more than one or two guesses left given you pretty much have to use 8/13 for the final guesses unless you luck onto an answer you weren't shooting for.)


I aim to solve Octordle in 3+8 guesses. I did so yesterday (#82), but today's (#83) needed a 50-50 guess which I got wrong, so took 12.

If it takes more than 12, it usually means I've been too casual.


I’m at my worst in Octordle when I get my first word on the 3rd guess. Then I feel pressured to finish in 2 + 8 guesses. However, that usually doesn’t work out and I end up wasting the 4th guess on suboptimal letter coverage.


Thanks for Quordle. That's another chunk of my life taken up every day!


There is a wordle bot that analyzes your play. It’s a bit much for me, but it has some insights.

https://www.nytimes.com/interactive/2022/upshot/wordle-bot.h...


I built a real-time helper that takes a somewhat simpler approach: https://github.com/everythingishacked/Hackle

I find it useful for trying out new words/strategies after a first round of play.


I embraced grep as my preferred cheating method for Wordle, then I whipped up a quick Docker image with a few scripts to make the deed easier:

https://hub.docker.com/r/jasonincanada/greple


I made a little program that assists with Wordle too. It's neither elegant nor perfect but seems like it works pretty well.

https://github.com/peteryates/dotfiles/blob/master/bin/bin/w...


This is an illustration that grep is a great tool, quick and easy to use.


rather egrep, though


Both grep and egrep are the same executable.

  bash-3.2$ diff `which grep` `which egrep`
  bash-3.2$


Can egrep do lookaheads or behinds? I find it necessary to use positive lookaheads to verify the test words include all the letters marked present but in the wrong location.


I was also curious about this as I use grep -P (for PCRE regexes) if I need to use them. Looking at the manual, lookaheads and lookbehinds don't seem to be supported by regular gnu grep or grep -E, so you do need to use grep -P for them.

https://www.gnu.org/software/grep/manual/grep.pdf


egrep is grep -E.


Is there a pattern language that would allow for the search in question being done in a single pattern instead of glueing two patterns together with an OR-operator?


It would be fairly straightforward to write a Wordle DSL if you wanted to, which would be much faster than any generic pattern matcher.

That's the risk of an overly complicated language, you can easily wind up slower than a series of regex filters.


I don't see an or operator in the post anywhere, so I'm confused about which search you mean.


They used two separate searches for where the T was. Didn't bother with an OR in code, just listed the results manually.


    (l[iouy][^taer]t[^aers])|(l[iouy][^taer][^aers]t)
Is in effect what the author is doing in regex (and how it is explained).
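The combined pattern can be tried out in Python as below. The sample word list is made up, and the factored variant `FACTORED` is my own rewrite of the same alternation, not something from the article.

```python
import re

# The two searches joined with an OR, exactly as written above.
COMBINED = re.compile(r"(l[iouy][^taer]t[^aers])|(l[iouy][^taer][^aers]t)")

# Same constraints with the shared prefix factored out of the alternation.
FACTORED = re.compile(r"l[iouy][^taer](t[^aers]|[^aers]t)")

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]

words = ["light", "limit", "lofty", "lusty", "lifts", "later", "lotus"]
print(matches(COMBINED, words))
print(matches(FACTORED, words))
```

Both patterns accept the same words here; `fullmatch` stands in for the implicit anchoring you get when grepping a file of five-letter words.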


nutrimatic.org is a tool built for this kind of query. Also super useful for puzzle hunts :)

Here's a query replicating the article's search: https://nutrimatic.org/?q=l%3C_t__%3E&go=Go



