Towards the end of the blog post, the author explains that he constrained generation to only those tokens that would be legal. For the OpenAI models, he instead generated up to 10 different outputs until he got one that was legal, and just chose a move at random if all of them failed.
That seems noteworthy enough in itself, before we even discuss how well the models perform as chess players.
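
For anyone who wants to see what the retry-then-fallback scheme for the OpenAI models amounts to, here is a rough Python sketch. It is not the author's code: `pick_move`, `propose` and `legal_moves` are hypothetical names, the model call is abstracted into a plain callable, and I am assuming the random fallback draws from the legal moves.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def pick_move(
    propose: Callable[[], str],
    legal_moves: Sequence[str],
    max_attempts: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, came_from_model).

    propose     -- hypothetical stand-in for one sampled model output
    legal_moves -- legal moves in the current position, e.g. in SAN
    """
    if rng is None:
        rng = random.Random()
    legal = set(legal_moves)

    # Resample until the model produces something legal, up to the limit.
    for _ in range(max_attempts):
        candidate = propose().strip()
        if candidate in legal:
            return candidate, True

    # Every attempt was illegal: fall back to a random move.
    # Assumption: the fallback is drawn from the legal moves.
    return rng.choice(list(legal_moves)), False
```

Returning the flag alongside the move makes it easy to count how often the fallback fired, which is the number you would want before reading too much into the resulting win rates.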