
Sam and Ilya have recently made public statements about AGI that appear to highlight a fundamental disagreement between them.

Sam claims LLMs aren't sufficient for AGI (rightfully so).

Ilya claims the transformer architecture, with some modification for efficiency, is actually sufficient for AGI.

Obviously transformers are the core component of LLMs today, and the devil is in the details (a future model may resemble the transformers of today, while also being dynamic in terms of training data/experience), but the jury is still out.

In either case, publicly disagreeing on the future direction of OpenAI may be indicative of deeper problems internally.




That is just disagreement on technical matters, which any well-functioning company should always have a healthy amount of within its leadership. The press release literally said he was fired because he was lying. I haven't seen anything like that from the board of a big company for a very long time.


Additionally, OpenAI can just put resources towards both approaches in order to settle this dispute. The whole point of research is that you don't know the conclusions ahead of time.


Seemingly, OpenAI's priorities shifted after the public ChatGPT release, and the company seems more and more geared towards selling to consumers, rather than being the research lab they initially appeared to be aiming for.

I'm sure this was part of the disagreement, as Sam is "capitalism incarnate" while Ilya gives off much different vibes.


Maybe some promise was made by Sam to MS for the funding that the board didn't approve. He may have expected the board to accept the terms he agreed to but they fired him instead.


That might be part of it. They announced that they were dedicating compute to researching superintelligence alignment. When they launched the new stuff on Dev Day, there was not enough compute and the service was disrupted. It may have also interfered with Ilya's team's allocation and stopped their research.

If that happened (speculation) then those resources weren't really dedicated to the research team.


The question has enormous implications for OpenAI because of the specifics of their nonprofit charter. If Altman left out facts to keep the board from deciding they were at the AGI phase of OpenAI, or even to prevent them from doing a fair evaluation, then he absolutely materially misled them and prevented them from doing their jobs.


If it turns out that the ouster was over a difference of opinion re: focusing on open research vs commercial success, then I don't think their current Rube Goldberg corporate structure of a non-profit with a for-profit subsidiary will survive. They will split up into two separate companies. Once that happens, Microsoft will find someone to sell them a 1.1% ownership stake and then immediately commence a hostile takeover.


Interesting, is this how they're currently structured? It sounds a lot like Mozilla with the Mozilla Foundation and Corporation.


Rightfully so?

No one knows. But I sure would trust the scientist leading the endeavor more than a businessperson who has an interest in saying the opposite to avoid immediate regulation.


>Ilya claims the transformer architecture, with some modification for efficiency, is actually sufficient for AGI.

I thought this guy was supposed to know what he's talking about? There was a paper showing that LLMs cannot generalise [0]. Anybody who's used ChatGPT can see there are imperfections.

[0] https://arxiv.org/abs/2309.12288


Humans don't work this way either. You don't need the LLM to do the logic, you just need the LLM to prepare the information so it can be fed into a logic engine. Just like humans do when they shut down their system 1 brain and go into system 2 slow mode.

I'm in the "definitely ready for AGI" camp. But it's not going to be a single model that does the AGI magic trick; it's going to be an engineered system consisting of multiple communicating models hooked up using traditional engineering techniques.


> You don't need the LLM to do the logic, you just need the LLM to prepare the information so it can be fed into a logic engine.

This is my view!

Expert Systems went nowhere, because you have to sit a domain expert down with a knowledge engineer for months, encoding the expertise. And then you get a system that is expert in only that specific domain. So if you could get an LLM to distil a corpus (a library, or whatever) into a collection of "facts" attributed to specific authors, you could stream those facts into an expert system that could make deductions and explain its reasoning.

So I don't think these LLMs lead directly to AGI (or any kind of AI). They are text-retrieval systems, a bit like search engines but cleverer. But used as an input-filter for a reasoning engine such as an expert system, you could end up with a system that starts to approach what I'd call "intelligence".

If someone is trying to develop such a system, I'd like to know.
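
For illustration, here's a minimal sketch of that kind of pipeline, assuming a placeholder llm() call: the model distils passages into attributed fact triples, and a toy forward-chaining rule engine makes deductions and records why. The prompt wording, the subject|relation|object format, and the example rule are all made-up assumptions, not any real API.

    # Sketch: LLM as a fact extractor feeding a tiny forward-chaining rule engine.
    # llm() is a placeholder for any chat-completion client; the prompt wording,
    # the triple format, and the toy rule are assumptions for illustration only.

    def llm(prompt: str) -> str:
        raise NotImplementedError  # wire up your model of choice here

    def extract_facts(passage: str, author: str) -> set:
        """Distil a passage into (subject, relation, object) triples attributed to an author."""
        reply = llm(
            "List factual triples, one per line, as subject|relation|object, "
            f"from this text (author: {author}):\n{passage}"
        )
        return {
            tuple(part.strip() for part in line.split("|"))
            for line in reply.splitlines()
            if line.count("|") == 2
        }

    # A ground rule: if every premise is a known fact, add the conclusion.
    RULES = [
        (
            {("socrates", "is_a", "man"), ("man", "is", "mortal")},
            ("socrates", "is", "mortal"),
        ),
    ]

    def forward_chain(facts: set, rules):
        """Apply rules until nothing new is derived; keep an explanation for each deduction."""
        explanations = {}
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    explanations[conclusion] = premises  # the "explain its reasoning" part
                    changed = True
        return facts, explanations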


They should fire Ilya and get you in there


> We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of 'Abyssal Melodies'" and showing that they fail to correctly answer "Who composed 'Abyssal Melodies?'". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation.

This just proves that the LLMs available to them, with the training and augmentation methods they employed, aren't able to generalize. It doesn't prove that future LLMs, or novel training and augmentation techniques, will be unable to generalize.


No, if you read this article it shows there were some issues with the way they tested.

> The claim that GPT-4 can’t make B to A generalizations is false. And not what the authors were claiming. They were talking about these kinds of generalizations from pre and post training.

> When you divide data into prompt and completion pairs and the completions never reference the prompts or even hint at it, you’ve successfully trained a prompt completion A is B model but not one that will readily go from B is A. LLMs trained on “A is B” fail to learn “B is A” when the training data is split into prompt and completion pairs.

Simple fix: put prompt and completion together, and compute gradients not just for the completion but also for the prompt. Or just make sure the model trains on data going in both directions by augmenting the pre-training data.

https://andrewmayne.com/2023/11/14/is-the-reversal-curse-rea...
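
For concreteness, a rough sketch of those two fixes in a Hugging Face-style causal LM setup. The reversal templates and the fact tuples are invented for illustration; a real dataset would need far more varied phrasings.

    # Sketch of the two fixes: (1) augment the data so facts appear in both directions,
    # (2) compute the loss over the whole sequence so prompt tokens get gradients too.

    def augment_both_directions(facts):
        """For each (composer, work) pair, emit the statement in both directions."""
        out = []
        for composer, work in facts:  # e.g. ("Uriah Hawthorne", "Abyssal Melodies")
            out.append(f"{composer} is the composer of {work}.")
            out.append(f"{work} was composed by {composer}.")
        return out

    def tokenize_full_sequence(tokenizer, text):
        """labels == input_ids means no -100 masking of the prompt,
        so gradients flow through the prompt tokens as well as the completion."""
        enc = tokenizer(text, truncation=True, return_tensors="pt")
        enc["labels"] = enc["input_ids"].clone()
        return enc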


LLMs can't generalise, no, but the meta-architecture around them 100% can.

Think about the RLHF component that trains LLMs. It's the training itself that generalises - not the final model that becomes a static component.


[flagged]


There is a lack of humility in making such an assertion about a path to AGI.


It is funny to believe we have AGI and yet resort to ad hominem to prove/defend it!

At some point in time Ilya was a nobody going against the gods of AI/ML. Just slightly over a decade ago neural networks were a joke in AI.


>rightfully so

How the hell can people be so confident about this? You're describing two smart people reasonably disagreeing about a complicated topic.


The LLMs of today are just multidimensional mirrors that contain humanity's knowledge. They don't advance that knowledge, they just regurgitate it, remix it, and expose patterns. We train them. They are very convincing, and show that the Turing test may be flawed.

Given that AGI means reaching "any intellectual task that human beings can perform", we need a system that can go beyond lexical reasoning and actually contribute (on its own) to advancing our total knowledge. Anything less isn't AGI.

Ilya may be right that a super-scaled transformer model (with additional mechanics beyond today's LLMs) will achieve AGI, or he may be wrong.

Therefore something more than an LLM is needed to reach AGI; what that is, we don't yet know!


Prediction: there isn't a difference. The apparent difference is a manifestation of the human brain's delusion about how human brains work. The Turing test is a beautiful proof of this phenomenon: such-and-such a thing is impossibly hard, only achievable via the magic capabilities of human brains... oops, no, actually it's easily achievable now, so we'd better re-define our test. This cycle will continue until the singularity. Disclosure: I've been a long-term skeptic about AI, but that writing is on the wall now.


Clearly there's a difference, because the architectures we have don't know how to persist information or further train.

Without persistence outside of the context window, they can't even maintain a dynamic, stable higher level goal.

Whether you can bolt something small to these architectures for persistence and do some small things and get AGI is an open question, but what we have is clearly insufficient by design.

I expect it's something in-between: our current approaches are a fertile ground for improving towards AGI, but it's also not a trivial further step to get there.


But context windows are at 100K now, RAG systems are everywhere, and we can cheaply fine-tune LoRAs for a price similar to inference, maybe 3x more expensive per token. A memory hierarchy made of LoRA -> Context -> RAG could be "all you need".

My beef with RAG is that it doesn't match on information that is not explicit in the text, so "the fourth word of this phrase" won't embed like the word "of", or "Bruce Willis' mother's first name" won't match with "Marlene". To fix this issue we need to draw chain-of-thought inferences from the chunks we index in the RAG system.

So my conclusion is that maybe we got the model all right but the data is too messy, we need to improve the data by studying it with the model prior to indexing. That would also fix the memory issues.
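
A small sketch of that idea: have the model draw inferences from each chunk before embedding it, so implicit facts become explicit, searchable text. llm(), embed() and the vector store are placeholders here, not any particular library's API.

    # Sketch: enrich chunks with model-drawn inferences before indexing, so implicit
    # facts (e.g. "Bruce Willis' mother's first name is Marlene") become embeddable text.
    # llm(), embed() and `store` are placeholders for whatever stack you use.

    def index_chunk(chunk: str, store) -> None:
        inferences = llm(
            "List the implicit facts a reader could infer from this passage, "
            f"one per line:\n{chunk}"
        )
        for text in [chunk, *inferences.splitlines()]:
            if text.strip():
                store.add(vector=embed(text), payload={"text": text, "source": chunk})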

Everyone is over-focusing on models to the detriment of thinking about the data. But models are just data gradients stacked up; we forget that. All the smarts the model has come from the data. We need data improvement more than model improvement.

Just consider the "textbook quality data" paper (Phi-1.5) and the Orca datasets: they show that diverse chain-of-thought synthetic data is 5x better than organic text.


I've been wondering along similar lines, although I am for all intents and purposes here a layman so apologies if the following is nonsensical.

I feel there are potential parallels between RAG and how human memory works. When we humans are prompted, I suspect we engage in some sort of relevant memory retrieval process, and the retrieved memories are packaged up and factored into our mental processing triggered by the prompt. This seems similar to RAG, where my understanding is that some sort of semantic search is conducted over a database of embeddings (essentially, "relevant memories") and the results are then shoved into the prompt as additional context. A bigger context window allows for more "memories" to contextualise/inform the model's answer.

I've been wondering three things: (1) are previous user prompts and model answers also converted to embeddings and stored in the embedding database, as new "memories", essentially making the model "smarter" as it accumulates more "experiences" (2) could these "memories" be stored alongside a salience score of some kind that increases the chance of retrieval (with the salience score probably some composite of recency and perhaps degree of positive feedback from the original user?) (3) could you take these new "memories" and use them to incrementally retrain the model for, say, 8 hours every night? :)

Edit: And if you did (3), would that mean even with a temperature set at 0 the model might output one response to a prompt today, and a different response to an identical prompt tomorrow, due to the additional "experience" it has accumulated?
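
A rough sketch of (1) and (2), assuming a plain in-memory store: past prompt/answer pairs are embedded as new "memories", each carrying a salience score that blends recency and feedback, and retrieval is biased by that score. embed() is a placeholder, and the 0.7/0.3 mix and 30-day decay are arbitrary choices, not anything standard.

    # Sketch of (1) and (2): store past exchanges as embedded "memories" with a
    # salience score (recency + feedback) that biases retrieval. embed() is a
    # placeholder returning a unit-normalised vector; the constants are arbitrary.

    import math
    import time

    import numpy as np

    class MemoryStore:
        def __init__(self):
            self.memories = []

        def add(self, prompt: str, answer: str, feedback: float = 0.0) -> None:
            text = f"Q: {prompt}\nA: {answer}"
            self.memories.append({
                "vector": embed(text),
                "text": text,
                "timestamp": time.time(),
                "feedback": feedback,   # e.g. 1.0 for a thumbs-up, 0.0 otherwise
            })

        def salience(self, memory) -> float:
            age_days = (time.time() - memory["timestamp"]) / 86400
            recency = math.exp(-age_days / 30)   # fades over roughly a month
            return 0.7 * recency + 0.3 * memory["feedback"]

        def retrieve(self, query: str, k: int = 5):
            q = embed(query)
            def score(memory):
                similarity = float(np.dot(q, memory["vector"]))
                return similarity * (0.5 + 0.5 * self.salience(memory))
            return sorted(self.memories, key=score, reverse=True)[:k]

For (3), the same store could be dumped periodically as fine-tuning examples, which would indeed mean a temperature-0 model might answer the same prompt differently tomorrow, since its weights would have changed overnight.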


> Clearly there's a difference, because the architectures we have don't know how to persist information or further train. Without persistence outside of the context window, they can't even maintain a dynamic, stable higher level goal.

Nope, and not all people can achieve this either. Would you call them less than human, then? I assume you wouldn't, as it is not only sentience of current events that maketh man. If you disagree, then we simply have fundamental disagreements on what maketh man, and thus there is no way we'd have agreed in the first place.


Isn't RAG essentially the "something small you can bolt on" to an LLM that gives it "persistence outside the context window?" There's no reason you can't take the output of an LLM and stuff it into a vector database. And, if you ask it to create a plan to do a thing, it can do that. So, there you have it: goal-oriented persistence outside of the context window.

I don't claim that RAG + LLM = AGI, but I do think it takes you a long way toward goal-oriented, autonomous agents with at least a degree of intelligence.
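
A minimal sketch of that bolt-on loop, assuming placeholder llm(), embed() and vector-store interfaces: the model writes a plan, the plan and each step's result are persisted, and every new turn is prompted with whatever gets retrieved.

    # Sketch of the bolt-on persistence described above. llm(), embed() and `db`
    # (a vector store with .add() and .search()) are placeholders, not a real API.

    def pursue_goal(goal: str, db, turns: int = 10) -> None:
        plan = llm(f"Write a short step-by-step plan for: {goal}")
        db.add(embed(plan), {"kind": "plan", "text": plan})

        for _ in range(turns):
            # Retrieve whatever was persisted in earlier turns (plan, past steps, notes).
            notes = [hit["text"] for hit in db.search(embed(goal), k=5)]
            step = llm(
                f"Goal: {goal}\nNotes so far:\n" + "\n".join(notes) +
                "\nDescribe the next concrete step and its outcome."
            )
            db.add(embed(step), {"kind": "step", "text": step})  # outlives the context window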


From my experience there's definitely context beyond the current LLM state. It's how they're able to regurgitate facts or speak at all.


> regurgitate facts or speak at all.

Most of that is encoded into weights during training, though external function call interfaces and RAG are broadening this.


> Without persistence outside of the context window, they can't even maintain a dynamic, stable higher level goal.

I mean, can't you say the same for people? We are easily confused and manipulated, for the most part.


I can remember to do something tomorrow after doing many things in-between.

I can reason about something and then combine it with something I reasoned about at a different time.

I can learn new tasks.

I can pick a goal of my own choosing and then still be working towards it intermittently weeks later.

The GPT LLM examples we have now cannot do these things. Doing those things may be a small change, or may not be tractable for these architectures at all... but it's probably in-between: hard, but can be "tacked on."


Former neuroscientist here.

Our brain actually uses many different functions for all of these things. Intelligence is incredibly complex.

But also, you don't need all of these to have real intelligence. People can problem solve without memory, since those are different things. People can intelligently problem-solve without a task.

And working towards long-term goals is something we actually take decades to learn. And many fail there as well.

I wouldn't be surprised if, just like in our brain, we'll start adding other modalities that improve memory, planning, etc etc. Seems that they started doing this with the vision update in GPT-4.

I wouldn't be surprised if these LLMs really become the backbone of AGI. But this is science: you don't really know what'll work until you do it.


> I wouldn't be surprised if these LLMs really become the backbone of AGI. But this is science: you don't really know what'll work until you do it.

Yes-- this is pretty much what I believe. And there's considerable uncertainty in how close AGI is (and how cheap it will be once it arrives).

It could be tomorrow and cheap. I hope not, because I'm really uncertain if we can deal with it (even if the AI is relatively well aligned).


That just proves we need real-time fine-tuning of the neuron weights. It is computationally intensive but not fundamentally different. A million-token context would look close to a long short-term memory, and frequent fine-tuning would be akin to long-term memory.

I most probably am anthropomorphizing completely wrong. But the point is humans may not be any more creative than an LLM, just that we have better computation and inputs. Maybe creativity is akin to LLM hallucinations.


Real-time fine tuning would be one approach that probably helps with some things (improving performance at a task based on feedback) but is probably not well suited for others (remembering analogous situations, setting goals; it's not really clear how one fine-tunes a context window into persistence in an LLM). There's also the concern that right now we seem to need many, many more examples in training data than humans get for the machine to get passably good at similar tasks.

I would also say that I believe that long-term goal oriented behavior isn't something that's well represented in the training data. We have stories about it, sometimes, but there's a need to map self-state to these stories to learn anything about what we should do next from them.

I feel like LLMs are much smarter than we are in thinking "per symbol", but we have facilities for iteration and metacognition and saving state that let us have an advantage. I think that we need to find clever, minimal ways to build these "looping" contexts.


> I most probably am anthropomorphizing completely wrong. But the point is humans may not be any more creative than an LLM, just that we have better computation and inputs.

I think creativity is made of two parts: generating novel ideas, and filtering bad ideas. For the second part we need good feedback. Humans and LLMs are just as good at novel ideation, but humans have the advantage on feedback. We have a body, access to the real world, access to other humans, and plenty of tools.

This is not something an android robot couldn't eventually have, and on top of that AIs have the advantage of learning from massive data. They surpass humans when they can leverage it; see AlphaFold, for example.


Are there theoretical models that use real-time weights? Every intro to deep learning focuses on stochastic gradient descent for neural network weights; as a layperson I'm curious about what online algorithms would be like instead.


I agree with your premise.

You're right: I haven't seen evidence of LLM novel pattern output that is basically creative.

It can find and remix patterns where there are pre-existing rules and maps that detail where they are and how to use them (i.e. grammar, phonics, or an index). But it can't, whatsoever, expose new patterns. At least public-facing LLMs can't. They can't abstract.

I think that this is an important distinction when speaking of AI pattern finding, as the language tends to imply AGI behavior.

But abstraction (as perhaps the actual marker of AGI) is so different from what they can do now that it essentially seems to be futurism whose footpath hasn't yet been found let alone traversed.

When they can find novel patterns across prior seemingly unconnected concepts, then they will be onto something. When "AI" begins to see the hidden mirrors so to speak.


If LLMs can copy the symbolic behaviors that let humans generate new knowledge, it'll be there.


> , they just regurgitate it, remix it, and expose patterns

Who cares? Sometimes the remixing of such patterns is what leads to new insights in us humans. It is dumb to think that remixing has no material benefit, especially when it clearly does.


> They are very convincing, and show that the Turing test may be flawed

The only thing flawed here is this statement. Are you even familiar with the premise of the Turing test?


Maybe "rightfully so" meant "it is totally within Sam's right to claim that LLMs aren't sufficient for AGI"?


Did Ilya give a reason why transformers are theoretically sufficient? I've watched him talk in a CS seminar and he's certainly interesting to listen to.


From the interviews with him that I have seen, Sutskever thinks that language modeling is a sufficient pretraining task because there is a great deal of reasoning involved in next-token prediction. The example he used was: suppose you fed a murder mystery novel to a language model and then prompted it with the phrase "The person who committed the murder was: ". The model would unquestionably need to reason in order to come to the right conclusion, but at the same time it is just predicting the next token.



