
I did some experimenting with this a little while back and was disappointed in how poorly LLMs played games.

I made some AI tools (https://github.com/DougHaber/lair) and added a tmux tool so that LLMs could interact with terminals. First, I tried NetHack. As expected, the model isn't good at understanding text "screenshots" of the screen and failed miserably.

https://x.com/LeshyLabs/status/1895842345376944454

After that I tried a bunch of the "bsdgames" text games.

Here is a video of it playing a few minutes of Colossal Cave Adventure:

https://www.youtube.com/watch?v=7BMxkWUON70

With this, it could play, but not very well; it gets confused a lot. I was using gpt-4o-mini, and the smaller models I can run at home do much worse. It would be interesting to try one of the bigger state-of-the-art models to see how much that helps.
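The driving loop itself is pretty simple. Roughly something like this (a simplified sketch, not the actual lair code; it assumes tmux, the bsdgames "adventure" binary, and the OpenAI Python client are available):

  import subprocess
  import time
  from openai import OpenAI

  client = OpenAI()

  def tmux(*args):
      # thin wrapper around the tmux CLI
      return subprocess.run(["tmux", *args], capture_output=True, text=True).stdout

  # run the game in a detached tmux session
  tmux("new-session", "-d", "-s", "game", "adventure")

  history = []
  for _ in range(100):
      time.sleep(1.0)                                    # let the game redraw
      screen = tmux("capture-pane", "-p", "-t", "game")  # text "screenshot" of the pane
      resp = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[
              {"role": "system", "content": "You are playing a text adventure. "
               "Reply with exactly one game command and nothing else."},
              *history,
              {"role": "user", "content": screen},
          ],
      )
      command = resp.choices[0].message.content.strip()
      history += [{"role": "user", "content": screen},
                  {"role": "assistant", "content": command}]
      tmux("send-keys", "-t", "game", "-l", command)     # type the command literally
      tmux("send-keys", "-t", "game", "Enter")

In practice, most of the tuning ends up being in the prompt and in how much of the screen and history gets fed back each turn.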

To give it an easier one, I also had it hunt the Wumpus:

https://x.com/LeshyLabs/status/1896443294005317701

I didn't try improving this much, so there may be some low-hanging fruit even in providing better instructions and tuning what gets sent to the LLM. For these, I was hoping I could simply hand it a terminal with a game in it and have it play decently. We'll probably get there, but so far it's not that simple.



Try the game 9:05 by Adam Cadre [0]. It's one of the easiest (and best) non-trivial text adventures. Some models are able to reach the first or even second ending.

[0] https://en.wikipedia.org/wiki/9:05


What do you suppose would happen if you tried it on a game that doesn't have 25 years of walkthroughs written for it?


That’s a good point. For 9:05, I expect it would work just as well, since the game helps the user in many ways. The puzzles are of the type “The door is closed”, and you solve them with “open door.”

My suggestion concerns the poor performance DougHaber mentioned: if 9:05 can’t be solved, something else must be wrong with his experiments.

I’ve tried three dozen games, and it’s still hard to find ones suitable for LLM benchmarks. With complex, non-linear text-adventure games, my guess is that the models get stuck in an endless loop at some point. Hence, I just test progress over the first hundred steps.



