
> Probably not Transformers specially, but LLMs show that intelligence is remarkably easy.

LLMs show that language is remarkably easy. Ever since GPT-3 was released, I've been convinced that language comprehension isn't nearly as big a component of general intelligence as people are making it out to be. This makes some intuitive sense: I recall a tabloid writer saying that they simply turn off their brain and start spinning out paragraphs.

But so far, I haven't seen any of these models perform logical reasoning, beyond basic memorization and reasoning by analogy. They can tell you all day what their "reasoning process" is, but the actual content of any step is simply something that looks like it would fit in that step. Where do you derive this confidence that advanced logical reasoning is a natural capability of transformer models? (Being capable of emulating finite Turing machines is hardly impressive: any sufficiently large finite circuit can do that.)
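To make that last parenthetical concrete, here's a toy sketch (mine, invented purely for illustration; the machine and its transition table are made up): running a Turing machine for a bounded number of steps is just a fixed composition of table lookups, which is exactly what a sufficiently large finite circuit, or a fixed-depth network, can hard-code.

    # Toy illustration: bounded-step TM emulation is a fixed chain of lookups.
    # Hypothetical 2-state machine; (state, symbol) -> (new_state, write, move)
    DELTA = {
        ("A", 0): ("A", 1, +1),
        ("A", 1): ("B", 0, +1),
        ("B", 0): ("A", 1, +1),
        ("B", 1): ("B", 0, -1),
    }

    def run_bounded(tape, steps=8):
        """Unroll exactly `steps` transitions: fixed depth, no loop-until-halt."""
        tape = list(tape)
        state, head = "A", 0
        for _ in range(steps):  # each iteration is one "layer" of lookups
            state, tape[head], move = DELTA[(state, tape[head])]
            head = (head + move) % len(tape)
        return state, tape

    print(run_bounded([0, 1, 1, 0]))

Nothing here requires anything cleverer than a big enough lookup structure, which is why bounded emulation alone doesn't impress me.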




>Ever since GPT-3 was released, I've been convinced that language comprehension isn't nearly as big a component of general intelligence as people are making it out to be

"X is the key to intelligence"

computers do X

"Well actually, X isn't that hard..."

rinse and repeat 100x

At some point you have to stop and reflect on whether your concept of intelligence is faulty. All the milestones that came and went (arithmetic, simulations, chess, image recognition, language, etc) are all facets of intelligence. It's not that we're discovering intelligence isn't this or that computational feat, but that intelligence is just made up of many computational feats. Eventually we will have them all covered, much sooner than the naysayers think.


> All the milestones that came and went (arithmetic, simulations, chess, image recognition, language, etc) are all facets of intelligence.

Why should I have to care about those weird milestones that some other randos came up with once upon a time? I've never espoused any of those myself, so how is this supposed to prove anything about my thought process?

> It's not that we're discovering intelligence isn't this or that computational feat, but that intelligence is just made up of many computational feats. Eventually we will have them all covered, much sooner than the naysayers think.

Well, it certainly appears to me like there's a big qualitative difference between the capabilities you mentioned (arithmetic and simulations are just applications of predefined algorithms; chess, image recognition, and language are memorization, association, and analogy on a massive scale) and the kind of ad-hoc multi-step logical reasoning that I'd expect from any AGI. You can argue that the difference is purely illusory, but I'll have a very hard time believing that until I see it with my own eyes.


>so how is this supposed to prove anything about my thought process?

Because it's the same thought process that animated theorists of the past. Unless you have some novel argument for why language isn't a feature of intelligence, despite its wide acceptance as one pre-LLMs, the claim can be dismissed as an instance of this pernicious pattern. Just because computers can do it, and it isn't incomprehensibly complex, doesn't mean it's not a feature of intelligence.

>Well, it certainly appears to me like there's a big qualitative difference between the capabilities you mentioned... and the kind of ad-hoc multi-step logical reasoning that I'd expect from any AGI.

I don't know exactly what "qualitative" means here, but I agree there is a difference in the kind of computation. Still, I expect multistep reasoning to be a variation on the kinds of computation we already know how to do: it is essentially a search problem over semantic space. LLMs handle mapping the semantic space, and our experience with solving games can inform a kind of heuristic search, so multistep reasoning should fall to a meta-computational search through that space. ChatGPT can already do passable multistep reasoning when guided by the user; an architecture with a meta-computational control mechanism could learn to do the same through self-supervision. The current limitations of LLMs are not fundamental limits of Transformers but architectural ones, that is, the kinds of information-flow paths that are allowed. In fact, I will be so bold as to say that such a meta-computational architecture will be conscious.
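To make "heuristic search over semantic space" a bit more concrete, here is a rough sketch of the control loop I have in mind. It is only an illustration: propose_steps, score_state and is_goal are hypothetical stand-ins for an LLM that suggests candidate next reasoning steps, a learned heuristic that scores partial chains, and a check that the chain has reached a verified answer.

    import heapq

    def propose_steps(chain):
        """Hypothetical: ask an LLM for candidate next reasoning steps."""
        raise NotImplementedError

    def score_state(chain):
        """Hypothetical: learned heuristic scoring how promising a partial chain is."""
        raise NotImplementedError

    def is_goal(chain):
        """Hypothetical: does the chain end in a checked answer?"""
        raise NotImplementedError

    def reason(question, beam_width=5, max_depth=10):
        # Best-first (beam) search over chains of natural-language reasoning steps.
        frontier = [(-score_state([question]), [question])]
        for _ in range(max_depth):
            candidates = []
            for _, chain in frontier:
                for step in propose_steps(chain):
                    new_chain = chain + [step]
                    if is_goal(new_chain):
                        return new_chain
                    candidates.append((-score_state(new_chain), new_chain))
            # keep only the most promising chains, as in game-tree search
            frontier = heapq.nsmallest(beam_width, candidates)
        return None

The point isn't this particular loop, just that the "meta-computation" can be something as familiar as beam search once the LLM supplies the semantic moves.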


I think that's more representative of tabloid writers than anything, haha. Understanding text is difficult, and scales with g. GPT-3 can make us believe that it comprehends text that falls in the median of internet content, and I guess there would be some edge cases addressed by the devs, but it can't convince humans that it understands more difficult content, or even content that isn't in its db.


I totally agree with your comments on language. I was stretching it to cover "intelligence" too; what I should have said is "many components of intelligence". It really isn't one thing. But I think analogical reasoning is one of the most important components, maybe the most important. I'm not alone in this. [1]

> Where do you derive this confidence that advanced logical reasoning is a natural capability of transformer models?

("Advanced logical reasoning" is asking a lot, more than I wanted to claim.) I was going off papers like [2] which showed very high accuracy for multi-hop reasoning by fine tuning RoBERTa-large on a synthetic dataset, including for more hops than seen in training (although experiments "suggests that our results are not specific to RoBERTa or transformers, although transformers learn the tasks more easily"). While [3] found "that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems of substantially increased difficulty" but "transformer models’ limited scale-invariance suggests they are far from learning robust deductive reasoning algorithms". I think that low scalability is to be expected, transformers don't have a working memory on which they can iterate learnt algorithmic steps, only a fixed number of steps can be learnt (as I was saying).

Unfortunately, looking for other papers, I found [4], which pours a lot of cold water on [2], saying "a deeper analysis reveals that they appear to overfit to superficial patterns in the data rather than acquiring the logical principles governing the reasoning in these fragments". I suppose you were more correct. I still think there's more than just memorisation happening here, and it isn't necessarily dissimilar to intuitive (rapid) 'reasoning' in humans, but as with everything in LLMs, the picture is muddied because capability seems to be a continuum.

[1] Hofstadter, 2001, Analogy as the core of cognition, http://worrydream.com/refs/Hofstadter%20-%20Analogy%20as%20t...

[2] AI2, 2020, RuleTaker: Transformers as Soft Reasoners over Language, https://allenai.org/data/ruletaker

[3] Richardson et al., 2021, Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability, https://arxiv.org/abs/2112.09054

[4] Schlegel et al., 2022, Can Transformers Reason in Fragments of Natural Language? https://arxiv.org/abs/2211.05417



