
If it can frame the question for the tool, it therefore has the logic (whether that comes from static recall or from deduction).

LLMs struggle with simple maths by the nature of their architecture, not due to a lack of logic. Yes, they struggle with logic questions too, but the two aren't directly related here.



Most of the failures on these simple logic questions come from an inability to simply copy data accurately. Logic is too abstract to measure directly, but this single benchmark shows something getting in its way. I have another benchmark showing that LLMs make basic mistakes that could easily be avoided with a minimum of logic and observation.


> LLM's struggle with simple maths by nature of their architecture not due to a lack of logic.

No, if it were good at logic it would have overcome that tiny architectural hurdle. Converting tokens to numbers is such a trivial process that it is ridiculous to suggest that is the reason it fails at math.

The reason it fails at math is that it fails at logic, and math is the most direct application of logic we have. It doesn't fail at converting between formats: it can convert "strawberry" to the correct Base64 encoding, meaning it does know exactly which letters are there; it just lacks the logic to actually understand what "count the letters" means.
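
As a minimal Python sketch of that point (the word and the counted letter are just for illustration): once the characters are actually visible, the Base64 conversion and the letter count are trivial operations over the same data.

    import base64

    word = "strawberry"
    # Producing the Base64 encoding requires knowing every character of the word...
    print(base64.b64encode(word.encode()).decode())  # c3RyYXdiZXJyeQ==
    # ...and counting a letter over those same characters is a one-liner.
    print(word.count("r"))  # 3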


It can't see that data, so how could it convert it? It only sees the token input.

An analogy (probably a poor one): it's like asking a human to see UV light. We can do so, but only with tools or by removing our lens.
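
A minimal sketch of what "it can only see the token input" means in practice, assuming the tiktoken library and its cl100k_base encoding (used by several OpenAI models); the exact split varies by tokenizer:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    # The model receives a short list of integer subword ids,
    # not ten individual characters.
    print(ids)
    print([enc.decode([i]) for i in ids])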

The fact that SOTA models (not yet publicly available) can achieve gold at the IMO implies otherwise.


It's because math problems make it easy to check that a solution is correct, which allows a lot of 'search': https://yellow-apartment-148.notion.site/AI-Search-The-Bitte...
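
A rough sketch of why cheap verification enables search, with sample_solution and verify as hypothetical placeholders (not a real API): sample many candidates and keep the first one the checker accepts.

    from typing import Callable, Optional

    def search(problem: str,
               sample_solution: Callable[[str], str],
               verify: Callable[[str, str], bool],
               budget: int = 1000) -> Optional[str]:
        # Cheap, reliable verification (e.g. a proof checker or numeric check)
        # is what makes spending a large sampling budget worthwhile.
        for _ in range(budget):
            candidate = sample_solution(problem)  # one model attempt
            if verify(problem, candidate):
                return candidate
        return None  # budget exhausted without a verified answer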



