Hacker News new | past | comments | ask | show | jobs | submit login

That seems the most likely scenario to me.

Helping that along is that it's an obvious scenario to optimize for, for all kinds of reasons. One of them is that it's a fairly good "middle of the road" test for integrating with such systems: not as trivial as feeding "1 + 1" to a calculator, and nowhere near as complicated as simulating an entire web page and pretending to click on a thing.
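To make the speculation concrete, the kind of integration being discussed could be as simple as a heuristic dispatcher that sniffs chess-looking prompts and hands them to a dedicated engine. Everything below (the function name, the regexes, the backend labels) is a hypothetical sketch, not anything the vendor is known to do:

```python
import re

def route_prompt(prompt: str) -> str:
    """Hypothetical router: send chess-looking prompts to a
    dedicated engine, everything else to the base model."""
    # Crude heuristics: a FEN-like board string, or two or more
    # numbered SAN moves (e.g. "1. e4 e5 2. Nf3"), suggest chess.
    fen_like = re.search(r"\b([rnbqkpRNBQKP1-8]+/){7}[rnbqkpRNBQKP1-8]+\b", prompt)
    san_moves = re.findall(r"\b\d+\.\s*[NBRQKa-h][a-hx1-8+#=O-]*", prompt)
    if fen_like or len(san_moves) >= 2:
        return "chess_engine"  # delegate to e.g. a Stockfish wrapper
    return "llm"               # fall back to the base model
```

The point of the sketch is how little machinery this takes relative to, say, browser automation: one regex pass per prompt, no state, and the engine call itself is a solved problem.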




Why would they only incorporate a chess engine into (seemingly) exactly one very old, dated model? The author tests o1-mini and gpt-4o. They both fail at chess.


Because they decided it wasn't worth the effort. I can point to any number of similar situations over the many years I've been working on things. Bullet-point features that aren't pulling their weight, or are no longer attracting hype, often don't survive the transition to the next version.

A common myth is that these companies have so much money they can do everything, which leaves people mystified by things like bugs in Apple or Microsoft products that survive for years. But from any given codebase, the space of "things we could do next" grows exponentially; no amount of money defeats that. Porting bespoke chess-engine code up to the next model absolutely requires non-trivial testing and may require non-trivial work, so even for the richest companies in the world it's an opportunity cost, and they may choose not to spend their time there.

I'm not saying this is the situation for sure; I'm saying this explanation is sufficient that I'm not left thinking "this situation just isn't possible." It's entirely possible and believable.



