Hacker News

Would be a shame, because chess is an excellent metric for testing logical thought and internal modeling. An LLM that can pick up a unique chess game halfway through and play it well to completion is clearly doing more than "predicting the next token based on the previous one".
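For what it's worth, a rough sketch of how such a midgame probe might be set up. Everything here is illustrative: the prompt format mimics PGN movetext (which is what the completion models in question seem to respond to), and the regex is only a surface check that a reply looks like a SAN move token; real legality checking would need a chess engine or a library like python-chess.

```python
import re

# Illustrative surface check: does a token look like a SAN move
# (castling, or optional piece letter + disambiguation + capture +
# destination square + optional promotion/check)? This does NOT
# verify legality against the actual position.
SAN = re.compile(r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$")

def make_prompt(moves):
    """Render a half-finished game as PGN movetext, leaving the prompt
    dangling after the next move number when it is White to move, so a
    completion model's most natural continuation is a single move."""
    parts = []
    for i, mv in enumerate(moves):
        if i % 2 == 0:
            parts.append(f"{i // 2 + 1}.")
        parts.append(mv)
    if len(moves) % 2 == 0:  # White to move: dangle the move number
        parts.append(f"{len(moves) // 2 + 1}.")
    return " ".join(parts)

def looks_like_san(token):
    return bool(SAN.match(token.strip()))

prompt = make_prompt(["e4", "e5", "Nf3", "Nc6", "Bb5"])
# prompt == "1. e4 e5 2. Nf3 Nc6 3. Bb5" — Black to move next
```

The point of the dangling-move-number trick is that instruct/chat formatting tends to break this behavior, which is consistent with the 3.5-turbo-instruct vs. chat-model gap people report.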



> chess is an excellent metric for testing logical thought and internal modeling

Is it, though? Apparently nobody else cared to use it to benchmark LLMs until this article.


People noticed this exact same discrepancy between gpt-3.5-turbo-instruct and GPT-4 a year ago: https://x.com/GrantSlatton/status/1703913578036904431



