
Another thing I tried was getting logic puzzles from the internet and giving them to 3.5 and 4. Both usually pass.

Then I alter them ever so slightly.

Then oftentimes only GPT-4 passes.

From that I reckon 3.5 is mostly regurgitating training data: it can answer things it has seen before. But 4 seems to have some ability to reason - or maybe it's just better at generalising?
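
If you want to run that kind of perturbation test yourself, here's a minimal sketch, assuming the OpenAI Python SDK (v1.x); the puzzle text is a placeholder and the model names are the public API aliases:

    # Perturbation test: ask both models the original puzzle and a
    # slightly altered variant, then compare the answers by eye.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholder puzzle; the altered variant changes a detail that a
    # memorised answer would get wrong.
    puzzles = {
        "original": "A farmer must cross a river with a wolf, a goat, and a cabbage...",
        "altered": ("A farmer must cross a river with a wolf, a goat, and a cabbage, "
                    "but this wolf eats cabbages and ignores goats..."),
    }

    def ask(model: str, question: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            temperature=0,  # reduce run-to-run variance
        )
        return response.choices[0].message.content

    for variant, text in puzzles.items():
        for model in ("gpt-3.5-turbo", "gpt-4"):
            print(f"--- {model} / {variant} ---")
            print(ask(model, text))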



Failing after the puzzle is altered slightly doesn't necessarily mean they aren't capable of solving it.

That's a human failure mode as well, one that LLMs have adopted. If you really want to know whether they can solve it, don't stop there. Either rewrite the question so it doesn't trigger common priors, or tell the model it's making a wrong assumption.
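
As a sketch of that second option (the nudge wording and placeholder variables are mine, not a tested prompt):

    # Follow-up technique: don't stop at the first failure; point out
    # the wrong assumption and let the model retry in the same conversation.
    from openai import OpenAI

    client = OpenAI()

    altered_puzzle = "..."  # the slightly altered puzzle (placeholder)
    first_answer = "..."    # the model's failed first attempt (placeholder)

    messages = [
        {"role": "user", "content": altered_puzzle},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": (
            "Careful: you're assuming this is the classic version of the "
            "puzzle. Re-read the altered conditions and try again."
        )},
    ]
    retry = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    print(retry.choices[0].message.content)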


I don't doubt that - my point, though, is that maybe 3.5 can only solve things in its training data, while 4 can figure things out.

3.5 seems more rigid: it needs babysitting to solve things, which means it can only solve things I already know how to solve. 4 is more flexible and can solve things by itself.



