Hacker News
selcuka 54 days ago | on: Something weird is happening with LLMs and Chess
> chess is an excellent metric for testing logical thought and internal modeling
Is it, though? Apparently nobody else cared to use it to benchmark LLMs until this article.
gs17 54 days ago
People had noticed this exact same discrepancy between gpt-3.5-turbo-instruct and GPT-4 a year ago:
https://x.com/GrantSlatton/status/1703913578036904431