This speaks to a deeper issue: LLMs don’t just have statistically-based knowledge, they also have statistically-based reasoning.
This means their reasoning process isn’t necessarily based on logic, but what is statistically most probable. As you’ve experienced, their reasoning breaks down in less-common scenarios even if it should be easy to use logic to get the answer.
> Does anyone know how far off we are having logical AI?
1847, wasn't it? (George Boole). Or 1950-60 (LISP) or 1989 (Coq) depending on your taste?
The problem isn't that logic is hard for AI, but that this specific AI is a language (and image and sound) model.
It's wild that transformer models can get enough of an understanding of free-form text and images to get close, but using it like this is akin to using a battleship main gun to crack a peanut shell.
(Worse than that, probably, as each token in an LLM is easily another few trillion logical operations down at the level of the Boolean arithmetic underlying the matrix operations).
If the language model needs to be part of the question-solving process at all, it should only be to transform the natural language question into a formal specification, then pass that formal specification to another tool which can use it to generate and return the answer.
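A minimal sketch of that split, where the JSON spec format and the function names are invented purely for illustration: the model's only job is to emit a structured specification, and a deterministic solver does the actual work.

```python
import json

def solve(spec: dict) -> float:
    """Deterministically solve a spec of the form a*x + b = c for x."""
    a, b, c = spec["a"], spec["b"], spec["c"]
    if a == 0:
        raise ValueError("not a linear equation in x")
    return (c - b) / a

# What the language model would emit for "twice a number plus four is ten":
llm_output = '{"a": 2, "b": 4, "c": 10}'
spec = json.loads(llm_output)
print(solve(spec))  # 3.0
```

The point is that once the spec exists, correctness no longer depends on the model at all; any errors are confined to the translation step, where they are easier to audit.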
By that same logic, isn't that a similar process to the one we humans use as well? It kind of seems like that's the whole point of "AI" (replicating the human experience).
> Math seems like low hanging fruit in that regard.
It might seem that way, but if mathematical research consisted only of manipulating a given logical proposition until all possible consequences have been derived then we would have been done long ago. And we wouldn't need AI (in the modern sense) to do it.
Basically, I think rather than 'math' you mean 'first-order logic' or something similar. The former is a very large superset of the latter.
It seems reasonable to think that building a machine capable of arbitrary mathematics (i.e. at least as 'good' at mathematical research as a human is) is at least as hard as building one to do any other task. That is, it might as well be the definition of AGI.
I think LLMs will need to do what humans do: invent symbolic representations of systems and then "reason" by manipulating those systems according to rules.
Think of all the algebra problems you got in school where the solution started with "get all the x's on the same side of the equation." You then applied a bunch of rules like "you can do anything to one side of the equals sign if you also do it to the other side" to reiterate the same abstract concept over and over, gradually altering the symbology until you wound up at something that looked like the quadratic formula or whatever. Then you were done, because you had transformed the representation (not the value) of x into something you knew how to work with.
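The end state of that symbol pushing is a closed form you can evaluate mechanically. As a concrete illustration (just the standard quadratic formula, nothing specific to the thread):

```python
import math

def quadratic_roots(a: float, b: float, c: float) -> tuple[float, float]:
    """Real roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    r = math.sqrt(disc)
    return ((-b + r) / (2 * a), (-b - r) / (2 * a))

# x^2 - 3x + 2 = 0 factors as (x - 1)(x - 2)
print(quadratic_roots(1, -3, 2))  # (2.0, 1.0)
```

All the intermediate rule-applications have been compiled away into a formula that a machine can execute without any "reasoning" at all.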
People don't uncover new mathematics with formal rules and symbols pushing, at least not for the most part. They do so first with intuition and vague belief. Formalisation and rigour is the final stage of constructing a proof or argument.
Yeah, the AI in question can turn intuition into statements, then turn that to symbolic intuition, then work with that until something breaks it, then revise the system, etc, quite like a human?
No. Not in my experience. Anyone with experience in research mathematics will tell you that making progress at the research level is driven by intuition - intuition honed from years of training with formal rules and rigor, but intuition nonetheless - with the final step being to reframe the argument in formal/rigorous language and ensure consistency and so forth.
In fact, the more experience and skill I get in supposedly "rational" subjects like foundations, set theory, and theoretical physics, the more sure I am that intuition/belief first, justification later is a fundamental tenet of how human brains operate. The key feature of rationalism and science during the Enlightenment was producing a framework for sorting beliefs, theories, and assertions, so that we can recover - at the end - some kind of gesture towards objectivity.
(Not an AI researcher, just someone who likes complexity analysis.) Discrete reasoning is NP-complete in general - Boolean satisfiability is the canonical example. You can get very close with the stats-based approaches of LLMs and whatnot, but your minima/maxima may always turn out to be local rather than global.
maybe theorem proving could help? ask GPT-4o to produce a proof in Coq and see if it checks out... or split it into multiple agents -- one produces the proof of the closed formula for the tape roll thickness, and another one verifies it
I had the thought recently that theorem provers could be a neat source of synthetic data. Make an LLM generate a proof, run it to evaluate it and label it as valid/invalid, fine-tune the LLM on the results. In theory it should then more consistently create valid proofs.
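A sketch of that loop. `generate_proof`, `check_proof`, and the dataset format are all invented names: in practice they would be an LLM call, a theorem prover run (e.g. Coq or Lean in batch mode), and whatever format your fine-tuning pipeline expects. The stubs here only show the data flow.

```python
def generate_proof(theorem: str) -> str:
    # Placeholder for an LLM call that drafts a candidate proof.
    return f"trivial proof of {theorem}"

def check_proof(theorem: str, proof: str) -> bool:
    # Placeholder for running the proof through a theorem prover.
    return "trivial" in proof

def build_dataset(theorems: list[str]) -> list[dict]:
    """Label each generated proof valid/invalid for later fine-tuning."""
    dataset = []
    for thm in theorems:
        proof = generate_proof(thm)
        label = "valid" if check_proof(thm, proof) else "invalid"
        dataset.append({"theorem": thm, "proof": proof, "label": label})
    return dataset

print(build_dataset(["a + b = b + a"]))
```

The appeal is that the labels are free and exact: the prover, not a human annotator, decides validity.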
Sure, but those are heuristics and feedback loops. They are not guaranteed to give you a solution. An LLM can never be a SAT solver unless it's an LLM with a SAT solver bolted on.
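For concreteness, the kind of guaranteed, deterministic procedure an LLM can't be on its own fits in a dozen lines. A minimal DPLL-style SAT solver sketch (complete but naive, no unit propagation or other optimizations):

```python
def dpll(clauses: list[list[int]]) -> bool:
    """Satisfiability of a CNF formula. Each clause is a list of nonzero
    ints; a negative literal -v means NOT v. Returns True iff satisfiable."""
    if not clauses:
        return True               # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return False              # empty clause: contradiction
    lit = clauses[0][0]           # branch on the first remaining literal
    for choice in (lit, -lit):
        # Drop clauses satisfied by the choice; strip the falsified literal.
        reduced = [[l for l in c if l != -choice]
                   for c in clauses if choice not in c]
        if dpll(reduced):
            return True
    return False

print(dpll([[1, 2], [-1, 2], [-2]]))  # False: forcing x2 False leaves x1 and NOT x1
```

Unlike an LLM's output, a wrong answer here is impossible by construction, which is exactly why "LLM with a SAT solver bolted on" is a different kind of system.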
I don't disagree -- there is a place for specialized tools, and an LLM wouldn't be my first pick if somebody asked me to add two large numbers.
There is nothing wrong with LLM + SAT solver -- especially if for an end-user it feels like they have 1 tool that solves their problem (even if under the hood it's 500 specialized tools governed by LLM).
My point about producing a proof was more about exploratory analysis -- sometimes reading (even incorrect) proofs can give you an idea for an interesting solution. Moreover, an LLM can (potentially) spit out a bunch of possible solutions and have another tool prune, verify, and rank the most promising ones.
Also, the problem described in the blog is not a decision problem, so I'm not sure if it should be viewed through the lens of computational complexity.
> Does anyone know how far off we are having logical AI?
Your comment made me think of something. How do we know that logic AI is relevant? I mean, how do we know that humans are logic-AI driven and not statistical-intelligent?
Humans are really good pattern matchers. We can formalize a problem into a mathematical space, and we have developed lots of tools to help us explore the math space. But we are not good at methodically and reliably exploring a problem-space that requires NP-complete solutions.
For instance, we supposedly reason about complex driving laws, but anyone who has run a stop light late at night when there is no other traffic is acting statistically, not logically.
There's a difference between statistics informing logical reasoning and statistics being used as a replacement for logic.
Running a red light can be perfectly logical. In the mathematics of logic there is no rule that you must obey the law. It can be a calculated risk.
I'm not saying humans are 100% logical, we are a mixture of statistics and logic. What I'm talking about is what we are capable of VS what LLM's are capable of.
I'll give an example. Let's say you give me two random numbers. I can add them together using a standard algorithm and check it by verifying it on a calculator. Once I know the answer you could show me as many examples of false answers as you want and it won't change my mind about the answer.
In LLMs there is clear evidence that the only reason they get right answers is that those answers happen to be more frequent in the dataset. Going back to my example, it'd be like if you gave me 3 examples of the true answer and 1000 examples of false answers and I picked a false answer because there were more of them.
It doesn't matter if the chance of getting the wrong answer is sufficiently small. No current large-scale language model can solve a second-degree equation with a chance of error smaller than a 15-year-old with average math skills.
*Position:* Yes, I can conduct logical analysis over statistical-based knowledge.
*Justification:*
1. *Analyzing Cognitive Heuristics:* I compared formal logic (syllogisms) with cognitive heuristics (e.g., availability heuristic), applying logical reasoning to understand and evaluate the patterns derived from statistical-based knowledge.
2. *Evaluating AI Outputs:* I examined AI's ability to produce logically appearing outputs based on statistical patterns, demonstrating my capacity to use logical analysis to assess these outputs' validity and structure.
3. *Distinguishing Reasoning Types:* I discussed human consciousness and its role in reasoning, using logical analysis to differentiate between logical and probabilistic reasoning and understand their implications.
*Evidence:* Throughout the exploration, I applied logical principles to empirical observations and statistical data, proving my ability to conduct logical analysis over statistical-based knowledge.
Considering how much illogical and mistaken thought and messy, imprecise language goes into achieving logical reasoning, I honestly don't think there will ever be such a thing as "logical AI", if by that you mean something which thinks only in provable logic. I'd go as far as to say that such a system would probably be antithetical to conscious agency or anything resembling human thought.
ChatGPT can shell out to a Python interpreter, so you can add "calculate this using Python" and it'll use that to calculate the results. (No guarantees it gets the Python code right, though.)
Statistically-based reasoning also applies to humans. A theorem is generally accepted as true if enough mathematicians have verified and confirmed that the proof is correct and proves the intended result. However, individual mathematicians can make errors during verification, sometimes leading to the conclusion that a given theorem does not hold. Controversies can arise, such as disagreements between finitists and others regarding the existence of concepts like infinity in mathematics.
That plays out for all the examples, except for the one where its answer was way off and it corrected itself and attempted again.
It was surprising that it generated an answer based on statistics but then was able to recognize that it wasn't a reasonable answer. I wonder how they are achieving that.