Hacker News

Solving it with Prolog is neat, and a very realistic example of how LLMs with tools should be expected to handle this kind of thing.



I would've been very surprised if using Prolog to solve this wasn't something the model had already ingested.

Early AI hype cycles, after all, are where Prolog, like Lisp, shone.



I'm certain models like o3-mini are capable of writing Prolog of this quality for puzzles they haven't seen before - it feels like a very straightforward conversion operation for them.


My comment got eaten by HN, but I think LLMs should be used as the glue between logic systems like Prolog, with inductive, deductive, and abductive reasoning handed off to a tool. LLMs are great at pattern matching, but forcing them to reason seems like an out-of-envelope use.
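Something like this, as a minimal sketch - assuming SWI-Prolog is installed, and with ask_llm as a hypothetical stand-in for whatever chat API you use. The LLM only translates the puzzle into Prolog; the deduction itself happens in the solver:

    import subprocess
    import tempfile

    def ask_llm(prompt: str) -> str:
        # Hypothetical wrapper; plug in your LLM API of choice here.
        raise NotImplementedError

    def solve_with_prolog(puzzle: str) -> str:
        # Have the LLM translate the puzzle into a Prolog program.
        program = ask_llm(
            "Translate this puzzle into a SWI-Prolog program whose "
            "main/0 predicate prints the solution:\n" + puzzle
        )
        with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
            f.write(program)
            path = f.name
        # Run the generated program; the solver, not the LLM, does the reasoning.
        result = subprocess.run(
            ["swipl", "-q", "-g", "main", "-t", "halt", path],
            capture_output=True, text=True, timeout=30,
        )
        return result.stdout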

Prolog would be how I would solve puzzles like that as well. It is like calling someone weak for using a spreadsheet or a calculator.

Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations: https://arxiv.org/abs/2305.14618


I actually tried this yesterday, coincidentally, on variants of the "surgeon can't operate on boy" puzzle. It didn't help; LLMs still can't reliably solve it.

(All current commercial LLMs are badly overfit on this puzzle, so if you try changing parts of it they'll get stuck and try to give the original answer in ways that don't make sense.)


What do you mean when you say you tried it?


Generated some Prolog programs and looked at them and they were wrong.

Specifically, it usually decides it knows what the answer is (and gets it wrong), then optimizes out the part of the program that does anything.


I've been saying this ever since GPT-3 came out and I started toying with it.

It's unfortunate that, of all the people who work in AI, most barely even know what Prolog is.


It seems quite logical to me as well. An LLM is not a logical computing system, but it has the knowledge of how to do a multiplication.


I've used DeepSeek for verifying a couple of gnarly boolean conditions in Home Assistant with Z3, and it did a good job, though it didn't one-shot it.
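For anyone curious, the check itself is tiny with Z3's Python bindings. A minimal sketch - the condition names here are made-up placeholders, not my actual Home Assistant config:

    from z3 import Bools, And, Or, Not, Solver, sat

    motion, door_open, away_mode = Bools("motion door_open away_mode")

    # The original, hard-to-read condition.
    original = And(Or(motion, door_open), Not(And(away_mode, Not(motion))))

    # The simplified version the LLM proposed.
    simplified = Or(motion, And(door_open, Not(away_mode)))

    # Ask Z3 whether the two can ever disagree.
    s = Solver()
    s.add(original != simplified)
    if s.check() == sat:
        print("Not equivalent, counterexample:", s.model())
    else:
        print("Equivalent for all inputs")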


I used a Knights and Knaves puzzle generator last month to test 4o and Claude 3.5, and both failed on novel puzzles.
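For reference, this is the kind of puzzle I mean, with a tiny brute-force checker to confirm the intended answer (this particular puzzle is a stand-in, not one from the generator):

    from itertools import product

    # A says: "B is a knave."   B says: "We are of the same kind."
    def statement_a(a, b):
        return not b

    def statement_b(a, b):
        return a == b

    # True = knight (always truthful), False = knave (always lying).
    solutions = []
    for a, b in product([True, False], repeat=2):
        # A knight's statement must be true; a knave's must be false.
        if statement_a(a, b) == a and statement_b(a, b) == b:
            solutions.append((a, b))

    print(solutions)  # unique solution: A is a knight (True), B is a knave (False)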


Hey, I'm interested in the details of this. How many people were in the puzzles? Did they include nested statements, conditionals, and such?

If the puzzle generator is hosted anywhere, I'd love to have a look at it.



