
If it can frame the question for the tool, it therefore has the logic (whether that comes from static recall or from deduction).

LLMs struggle with simple maths by the nature of their architecture, not due to a lack of logic. Yes, they struggle with logic questions too, but the two aren't directly related here.



Most of the failures on these simple logic questions come from an inability to simply copy data accurately. Logic is too abstract to measure directly, but this single benchmark shows something getting in its way. I have another benchmark showing that LLMs make basic mistakes that could easily be avoided with a minimum of logic and observation.


> LLM's struggle with simple maths by nature of their architecture not due to a lack of logic.

No, if it were good at logic it would have overcome that tiny architectural hurdle. Converting tokens to numbers is such a trivial process that it is ridiculous to suggest that is the reason it fails at math.

The reason it fails at math is that it fails at logic, and math is the most direct application of logic we have. It doesn't fail at converting between formats: it can convert "strawberry" to the correct Base64 encoding, meaning it does know exactly which letters are there; it just lacks the logic to actually understand what "count the letters" means.
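
As a minimal Python sketch of that point (the word and the counted letter are just for illustration): once the characters are actually visible, the Base64 conversion and the letter count are trivial operations over the same data.

    import base64

    word = "strawberry"
    # Producing the Base64 encoding requires knowing every character of the word...
    print(base64.b64encode(word.encode()).decode())  # c3RyYXdiZXJyeQ==
    # ...and counting a letter over those same characters is a one-liner.
    print(word.count("r"))  # 3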


It can't see that data, so how could it convert it? It only sees the token input.

An analogy (probably a poor one): it's like asking a human to see UV light. We can do so, but only with tools or by removing our lens.
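
A minimal sketch of what "it can only see the token input" means in practice, assuming the tiktoken library and its cl100k_base encoding (used by several OpenAI models); the exact split varies by tokenizer:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    # The model receives a short list of integer subword ids,
    # not ten individual characters.
    print(ids)
    print([enc.decode([i]) for i in ids])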

The fact that SOTA models (not yet publicly available) can achieve gold at the IMO implies otherwise.


It's because math problems make it easy to check that a solution is correct, which allows a lot of 'search': https://yellow-apartment-148.notion.site/AI-Search-The-Bitte...
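
A rough sketch of why cheap verification enables search, with sample_solution and verify as hypothetical placeholders (not a real API): sample many candidates and keep the first one the checker accepts.

    from typing import Callable, Optional

    def search(problem: str,
               sample_solution: Callable[[str], str],
               verify: Callable[[str, str], bool],
               budget: int = 1000) -> Optional[str]:
        # Cheap, reliable verification (e.g. a proof checker or numeric check)
        # is what makes spending a large sampling budget worthwhile.
        for _ in range(budget):
            candidate = sample_solution(problem)  # one model attempt
            if verify(problem, candidate):
                return candidate
        return None  # budget exhausted without a verified answer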



