
I think at this point it's an absurd take to say they aren't reasoning. I don't think you can reach such high scores on competitive coding and the IMO without reasoning about code (and math).

AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules. Theoretically the same should be possible in research math. Less so in programming, because we care about less rigid things like style. But if you rigorously defined the objective, training data shouldn't be necessary there either.
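
To sketch what I mean, here's a toy version of the data-generation loop: tic-tac-toe, with a random policy standing in for the real network, so this is only an illustration of the idea, nothing like AlphaZero's actual training. The only hand-written input is the rules; every training example falls out of self-play:

    import random

    # Tic-tac-toe rules: the only hand-authored input to the whole loop.
    WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def legal_moves(board):
        return [i for i, c in enumerate(board) if c == "."]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def self_play_game(policy):
        """Play one game against itself; return (state, move, outcome) examples."""
        board, player, history = ["."] * 9, "X", []
        while True:
            moves = legal_moves(board)
            w = winner(board)
            if w or not moves:
                outcome = w if w else "draw"
                return [(state, move, outcome) for state, move in history]
            move = policy(board, moves)  # stand-in for the network's policy head
            history.append(("".join(board), move))
            board[move] = player
            player = "O" if player == "X" else "X"

    # Training data generated from nothing but the rules plus self-play.
    random_policy = lambda board, moves: random.choice(moves)
    data = [ex for _ in range(1000) for ex in self_play_game(random_policy)]
    print(len(data), "examples generated without any external dataset")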



> AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules

This is wrong. It wasn't just fed the rules; it was also given a search harness that tried out viable moves and hunted for the best ones.

Without that harness it would not have reached superhuman performance. Such a harness is easy to build for Go but not for more complex things. You'll find that the harder it is to build an effective harness for a topic, the harder that topic is for AI models to crack: it's relatively easy to build a good harness for well-defined problems like competitive programming, but much, much harder for general-purpose programming.
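
To give a toy picture of what such a harness looks like, here's a flat search over tic-tac-toe's legal moves that scores each one with random playouts. The real thing is a guided tree search and vastly more capable; the point is only that a complete, hand-writable rule set is what makes a harness like this cheap to build:

    import random

    WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def legal_moves(board):
        return [i for i, c in enumerate(board) if c == "."]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def rollout(board, player):
        """Finish the position with random play; return the winning symbol or None."""
        board = board[:]
        while True:
            w = winner(board)
            moves = legal_moves(board)
            if w or not moves:
                return w
            board[random.choice(moves)] = player
            player = "O" if player == "X" else "X"

    def best_move(board, player, n_rollouts=200):
        """The 'harness': try every viable move and score it by random playouts."""
        opponent = "O" if player == "X" else "X"
        scores = {}
        for m in legal_moves(board):
            nxt = board[:]
            nxt[m] = player
            wins = sum(rollout(nxt, opponent) == player for _ in range(n_rollouts))
            scores[m] = wins / n_rollouts
        return max(scores, key=scores.get)

    # X to move with a win available at square 8; the playout scores point there.
    print(best_move(list("X.O.X.O.."), "X"))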


Are you talking about Monte Carlo tree search? I consider that part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in a real-life setting than in a board-game setting.


The harness is obtained from the game rules? The "harness" is part of AlphaZero's algorithm.


> the "harness" is part of the algorithm of alphzero

Then it is not a general algorithm, and results from it don't apply to other problems.


If you mean CoT, it's mostly fake: https://www.anthropic.com/research/reasoning-models-dont-say...

If you mean symbolic reasoning, well it's pretty obvious that they aren't doing it since they fail basic arithmetic.


> If you mean CoT, it's mostly fake

If that's your takeaway from that paper, you've arrived at the wrong conclusion. It's not that CoT is "fake"; it's that it doesn't give the full picture, and if you rely only on it to catch "undesirable" behavior, you'll miss a lot. There is a lot more nuance than you suggest. From the paper itself:

> These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.


Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of their actions.


Humans can learn symbolic rules and then apply them correctly to any problem, bounded only by time and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.
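
For example, the grade-school addition procedure is a small set of symbolic rules that scales to operands of any length; roughly:

    def long_addition(a: str, b: str) -> str:
        """Grade-school addition: the same digit-plus-carry rule at every column."""
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(str(s % 10))
            carry = s // 10
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    # The same few rules scale to operands of any length.
    print(long_addition("98765432109876543210", "12345678901234567890"))
    print(int("98765432109876543210") + int("12345678901234567890"))  # cross-check

Once you know the rule, the size of the operands only costs you time.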

They can convincingly mimic human thought, but the illusion falls apart on closer inspection.


What? Do you mean like this??? https://www.reddit.com/r/OpenAI/comments/1mkrrbx/chatgpt_5_h...

Calculators have been better than humans at arithmetic for well over half a century. Calculators can reason?


It's an absurd take to actually believe they can reason. The cutting-edge "reasoning model," by the way:

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226




