
I think at this point it's an absurd take to say they aren't reasoning. I don't think you can reach such high scores on competitive coding and the IMO without reasoning about code (and math).

AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules. Theoretically the same should be possible in research math. Less so in programming, because we care about less rigid things like style. But if you rigorously defined the objective, training data shouldn't be necessary there either.
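
To sketch what I mean, here's a toy version of the data-generation loop: tic-tac-toe, with a random policy standing in for the real network, so this is only an illustration of the idea, nothing like AlphaZero's actual training. The only hand-written input is the rules; every training example falls out of self-play:

    import random

    # Tic-tac-toe rules: the only hand-authored input to the whole loop.
    WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def legal_moves(board):
        return [i for i, c in enumerate(board) if c == "."]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def self_play_game(policy):
        """Play one game against itself; return (state, move, outcome) examples."""
        board, player, history = ["."] * 9, "X", []
        while True:
            moves = legal_moves(board)
            w = winner(board)
            if w or not moves:
                outcome = w if w else "draw"
                return [(state, move, outcome) for state, move in history]
            move = policy(board, moves)  # stand-in for the network's policy head
            history.append(("".join(board), move))
            board[move] = player
            player = "O" if player == "X" else "X"

    # Training data generated from nothing but the rules plus self-play.
    random_policy = lambda board, moves: random.choice(moves)
    data = [ex for _ in range(1000) for ex in self_play_game(random_policy)]
    print(len(data), "examples generated without any external dataset")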



> AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules

This is wrong. It wasn't just fed the rules; it was also given a search harness that tried out viable moves and hunted for the best ones.

Without that harness it would not have reached superhuman performance. Such a harness is easy to build for Go but not for more complex things. You'll find that the harder it is to build an effective harness for a topic, the harder that topic is for AI models to crack: it's relatively easy to build a good harness for well-defined problems like competitive programming, but much, much harder for general-purpose programming.
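
To give a toy picture of what such a harness looks like, here's a flat search over tic-tac-toe's legal moves that scores each one with random playouts. The real thing is a guided tree search and vastly more capable; the point is only that a complete, hand-writable rule set is what makes a harness like this cheap to build:

    import random

    WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def legal_moves(board):
        return [i for i, c in enumerate(board) if c == "."]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def rollout(board, player):
        """Finish the position with random play; return the winning symbol or None."""
        board = board[:]
        while True:
            w = winner(board)
            moves = legal_moves(board)
            if w or not moves:
                return w
            board[random.choice(moves)] = player
            player = "O" if player == "X" else "X"

    def best_move(board, player, n_rollouts=200):
        """The 'harness': try every viable move and score it by random playouts."""
        opponent = "O" if player == "X" else "X"
        scores = {}
        for m in legal_moves(board):
            nxt = board[:]
            nxt[m] = player
            wins = sum(rollout(nxt, opponent) == player for _ in range(n_rollouts))
            scores[m] = wins / n_rollouts
        return max(scores, key=scores.get)

    # X to move with a win available at square 8; the playout scores point there.
    print(best_move(list("X.O.X.O.."), "X"))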


Are you talking about Monte Carlo tree search? I consider that part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in a real-life setting than in a board-game setting.


The harness is obtained from the game rules? The "harness" is part of AlphaZero's algorithm.


> the "harness" is part of the algorithm of alphzero

Then it is not a general algorithm, and results from it don't apply to other problems.


If you mean CoT, it's mostly fake: https://www.anthropic.com/research/reasoning-models-dont-say...

If you mean symbolic reasoning, well it's pretty obvious that they aren't doing it since they fail basic arithmetic.


> If you mean CoT, it's mostly fake

If that's your takeaway from that paper, you've arrived at the wrong conclusion. It's not that CoT is "fake"; it's that it doesn't give the full picture, and if you rely only on it to catch "undesirable" behavior, you'll miss a lot. There is a lot more nuance than you suggest. From the paper itself:

> These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.


Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of their actions.


Humans can learn symbolic rules and then apply them correctly to any problem, bounded only by time and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.
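
For example, the grade-school addition procedure is a small set of symbolic rules that scales to operands of any length; roughly:

    def long_addition(a: str, b: str) -> str:
        """Grade-school addition: the same digit-plus-carry rule at every column."""
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(str(s % 10))
            carry = s // 10
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    # The same few rules scale to operands of any length.
    print(long_addition("98765432109876543210", "12345678901234567890"))
    print(int("98765432109876543210") + int("12345678901234567890"))  # cross-check

Once you know the rule, the size of the operands only costs you time.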

They can convincingly mimic human thought, but the illusion falls apart on closer inspection.


What? Do you mean like this??? https://www.reddit.com/r/OpenAI/comments/1mkrrbx/chatgpt_5_h...

Calculators have been better than humans at arithmetic for well over half a century. Calculators can reason?


It's an absurd take to actually believe they can reason. The cutting-edge "reasoning model," by the way:

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226




