Hacker News

Solving it with Prolog is neat, and a very realistic example of how LLMs with tools should be expected to handle this kind of thing.



I would've been very surprised if using Prolog to solve this wasn't something the model had already ingested.

Early AI hype cycles, after all, are where Prolog, like Lisp, shone.



I'm certain models like o3-mini are capable of writing Prolog of this quality for puzzles they haven't seen before - it feels like a very straightforward conversion operation for them.


My comment got eaten by HN, but I think LLMs should be used as the glue between logic systems like Prolog, with inductive, deductive, and abductive reasoning handed off to a tool. LLMs are great at pattern matching, but forcing them to reason seems like an out-of-envelope use.
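Something like this, as a minimal sketch - assuming SWI-Prolog is installed, and with ask_llm as a hypothetical stand-in for whatever chat API you use. The LLM only translates the puzzle into Prolog; the deduction itself happens in the solver:

    import subprocess
    import tempfile

    def ask_llm(prompt: str) -> str:
        # Hypothetical wrapper; plug in your LLM API of choice here.
        raise NotImplementedError

    def solve_with_prolog(puzzle: str) -> str:
        # Have the LLM translate the puzzle into a Prolog program.
        program = ask_llm(
            "Translate this puzzle into a SWI-Prolog program whose "
            "main/0 predicate prints the solution:\n" + puzzle
        )
        with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
            f.write(program)
            path = f.name
        # Run the generated program; the solver, not the LLM, does the reasoning.
        result = subprocess.run(
            ["swipl", "-q", "-g", "main", "-t", "halt", path],
            capture_output=True, text=True, timeout=30,
        )
        return result.stdout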

Prolog would be how I would solve puzzles like that as well. It is like calling someone weak for using a spreadsheet or a calculator.

Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations: https://arxiv.org/abs/2305.14618


I actually tried this yesterday, coincidentally, on variants of the "surgeon can't operate on boy" puzzle. It didn't help; LLMs still can't reliably solve it.

(All current commercial LLMs are badly overfit on this puzzle, so if you try changing parts of it they'll get stuck and try to give the original answer in ways that don't make sense.)


What do you mean when you say you tried it?


Generated some Prolog programs and looked at them and they were wrong.

Specifically, it usually decides it knows what the answer is (and gets it wrong), then optimizes out the part of the program that does anything.


I've been saying this ever since GPT-3 came out and I started toying with it.

It's unfortunate that, of all the people who work in AI, most barely even know what Prolog is.


It seems quite logical to me as well. An LLM is not a logical computing system, but it has the knowledge of how to do a multiplication.


I've used DeepSeek for verifying a couple of gnarly boolean conditions in Home Assistant with Z3, and it did a good job, though it didn't one-shot it.
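For anyone curious, the check itself is tiny with Z3's Python bindings. A minimal sketch - the condition names here are made-up placeholders, not my actual Home Assistant config:

    from z3 import Bools, And, Or, Not, Solver, sat

    motion, door_open, away_mode = Bools("motion door_open away_mode")

    # The original, hard-to-read condition.
    original = And(Or(motion, door_open), Not(And(away_mode, Not(motion))))

    # The simplified version the LLM proposed.
    simplified = Or(motion, And(door_open, Not(away_mode)))

    # Ask Z3 whether the two can ever disagree.
    s = Solver()
    s.add(original != simplified)
    if s.check() == sat:
        print("Not equivalent, counterexample:", s.model())
    else:
        print("Equivalent for all inputs")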


I used a Knights and Knaves puzzle generator last month to test 4o and Claude 3.5, and both failed on novel puzzles.
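For reference, this is the kind of puzzle I mean, with a tiny brute-force checker to confirm the intended answer (this particular puzzle is a stand-in, not one from the generator):

    from itertools import product

    # A says: "B is a knave."   B says: "We are of the same kind."
    def statement_a(a, b):
        return not b

    def statement_b(a, b):
        return a == b

    # True = knight (always truthful), False = knave (always lying).
    solutions = []
    for a, b in product([True, False], repeat=2):
        # A knight's statement must be true; a knave's must be false.
        if statement_a(a, b) == a and statement_b(a, b) == b:
            solutions.append((a, b))

    print(solutions)  # unique solution: A is a knight (True), B is a knave (False)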


Hey, I'm interested in the details of this. How many people were in the puzzles? Did they include nested statements, conditionals, and such?

If the puzzle generator is hosted anywhere, I'd love to have a look at it.



