
Strongly implied but not explicitly stated here - all these LLMs were able to consistently generate legal moves? All the way to the end of a game?

Seems noteworthy enough in itself, before we discuss their performance as chess players.




Towards the end of the blog post the author explains that he constrained the generation to only tokens that would be legal. For the OpenAI models he generated up to 10 different outputs until he got one that was legal, or just randomly chose a move if it failed.
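Concretely, the two strategies described there might look something like this sketch. The `scores`/`generate` interfaces and function names are hypothetical stand-ins for illustration, not the post's actual code:

```python
import random

def constrained_move(scores, legal_moves):
    """Open-weights models: restrict decoding to legal moves and
    take the highest-scoring one (moves absent from scores rank lowest)."""
    return max(legal_moves, key=lambda m: scores.get(m, float("-inf")))

def sampled_move(generate, legal_moves, max_attempts=10):
    """API-only models: sample up to max_attempts completions;
    fall back to a uniformly random legal move if none is legal."""
    for _ in range(max_attempts):
        move = generate()
        if move in legal_moves:
            return move
    return random.choice(sorted(legal_moves))
```

Note that the random fallback means an API model never plays an outright illegal move under this setup; it just occasionally plays a (likely bad) random one, which makes the two evaluation regimes not quite comparable.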


> For the OpenAI models he generated up to 10 different outputs until he got one that was legal, or just randomly chose a move if it failed.

I wonder how often they failed to generate a move. That feels like it could be a meaningful difference.


gpt-3.5-turbo-instruct had something like 5 (or fewer) illegal moves in 8,205.

https://github.com/adamkarvonen/chess_gpt_eval

I expect the rest to be much worse, if gpt-4's performance is any indication.


And the most notable part of that:

> Most of gpt-4's losses were due to illegal moves

gpt-3.5-turbo-instruct clearly has stronger chess skills.



