https://github.com/adamkarvonen/chess_gpt_eval
I expect the rest to be much worse, if GPT-4's performance is any indication.
> Most of gpt-4's losses were due to illegal moves
3.5-turbo-instruct definitely has better chess skills.