This feels quite close to the definition of the singularity; if an LLM can become both the Generator and the Discriminator (to use a GAN analogy), then we have takeoff.
We might eventually see the code for the models and inference, but I do doubt we’ll see training code or be granted access to training data. Google is pretty bad at this.
Great research here. Contextual real-time weight modification is definitely one of the breakthroughs required for AGI. Why create a LoRA when you can generate one on the fly suited to the task at hand?
It does not seem like they are doing inference-time weight changes, to the tune of running backprop. It sounds more like they are applying a pre-trained vector to the model, and selecting that vector based on the input, in a two-step process.
That’s my general understanding as well, but it isn’t a large conceptual leap to go from real-time selection of pretrained “z-vectors” to real-time generation of the same. The larger conceptual breakthrough, with demonstration of its effectiveness, is the big success here.
While not a large conceptual leap, the real-time generation of "z-vectors" is not cheap in terms of compute or data requirements, the latter of which I see as the main issue. How are you going to generate the vector from a single real-time input?
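To make the two-step reading above concrete, here is a minimal Python sketch of "select a pre-trained z-vector from the input, then apply it to frozen weights." Every name, shape, and the singular-value-scaling trick are assumptions for illustration, not details taken from the paper:

    import torch

    # Hypothetical sketch of the two-step process discussed above:
    #   step 1: pick a pre-trained z-vector based on the incoming prompt
    #   step 2: use it to modulate a frozen weight matrix at inference time,
    #           here by scaling its singular values -- no backprop involved.

    RANK = 64
    z_vectors = {                      # stand-ins for pre-trained per-task vectors
        "math": torch.rand(RANK),
        "code": torch.rand(RANK),
        "reasoning": torch.rand(RANK),
    }

    def classify_task(prompt: str) -> str:
        """Step 1: any cheap classifier works (a small model, or the LLM itself
        in a first pass). Hard-coded keywords keep the sketch self-contained."""
        if "def " in prompt or "class " in prompt:
            return "code"
        if any(ch.isdigit() for ch in prompt):
            return "math"
        return "reasoning"

    def adapt_weight(weight: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        """Step 2: scale the top singular values of a frozen weight matrix by z.
        One plausible reading of 'applying a pre-trained vector to the model'."""
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        S = S.clone()
        S[: z.shape[0]] *= z           # modulate the top-RANK singular values
        return U @ torch.diag(S) @ Vh

    # Usage: choose a vector from the prompt, then patch a frozen layer with it.
    prompt = "Solve 12 * 7 and explain your steps."
    z = z_vectors[classify_task(prompt)]
    W = torch.randn(512, 512)          # stands in for one frozen attention/MLP matrix
    W_adapted = adapt_weight(W, z)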
I still have yet to see anything that dissuades me from agreeing with Yann LeCun when he says Transformers are fundamentally limited. We won't get creativity or reasoning, or even move past hallucinations, without a major breakthrough.
They do not change it. From what I have seen, o3 is more hype and marketing than a meaningful step towards models which can exhibit real creativity and reasoning as humans perform it (rather than as humans perceive it, which is the root of the hype).
For example, a small child is completely capable of being told "get in the car" and can understand, navigate, open the door, and get in, with incredibly little energy usage (maybe about the amount of a single potato chip/crisp).
Now consider what I have been working on recently: (1) evaluating secops tools from both a technical and business perspective, and (2) prototyping and creating an RFC for the next version of our DX at the org. LLMs are very far from this capability because it involves so many competing incentives and trade-offs, and not just the context of the current state of the code, but also its history and vision. Crafting that vision is especially beyond what a foundation in transformers can offer. They are in essence an averaging and sequence-prediction algorithm.
These tools are useful, even provide an ROI, but by no means anywhere close to what I would call intelligent.
Maybe the analogy is something like gold mining. We could pretend that the machines that mine gold are actually creating gold, as if the entire gold-mining sector were instead a discovery of alchemy.
Maybe the way alchemy kind of leads to chemistry is the analogy that applies?
I don't even know if that is right though.
The intelligence is in the training data. The model then is extracting the intelligence.
We can't forget Feynman's idea here that we aren't going to make a robot cheetah that runs fast; we will make a machine that uses wheels. Viewing things through the lens of a cheetah is a category error.
While I agree with you completely, we might both very well be completely and utterly wrong: a category error on what intelligence "is".
The interesting thing here is that the human brain also seems to use pretrained ... things. For vision, use the visual subsystem. For hearing, use the auditory subsystem. For movement ... you get the point. Plus you can combine these pretrained ... things, so for example for complex movement, like balancing on a tightrope, multiple subsystems are used (try standing on one leg with your eyes closed).
Z-vectors are of course nothing like the subsystems in your brain, but in general the approach is certainly similar to how the brain works.
Sort of. According to the text they can use multiple z-vectors (sets of weights that select for parts of the system to be used to answer a specific question) simultaneously, using a "simple optimization algorithm" to determine the relative weight for each of these vectors.
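For the curious, here is a rough sketch of what "determine the relative weight for each of these vectors" could look like. The paper's actual optimization algorithm may well differ; plain gradient descent over softmax-normalized mixing coefficients is assumed here purely for illustration, and few_shot_loss_fn is a hypothetical helper that scores a candidate combination on a handful of examples from the new task:

    import torch

    def combine_z_vectors(z_list, few_shot_loss_fn, steps=50, lr=0.1):
        """Learn mixing weights over several pre-trained z-vectors.

        z_list: list of pre-trained z-vectors (tensors of equal shape).
        few_shot_loss_fn: maps a combined z-vector to a scalar loss measured
        on a few examples of the incoming task (hypothetical, and it must be
        differentiable for this particular sketch to work)."""
        logits = torch.zeros(len(z_list), requires_grad=True)
        opt = torch.optim.Adam([logits], lr=lr)
        for _ in range(steps):
            weights = torch.softmax(logits, dim=0)            # relative weight per vector
            z_combined = sum(w * z for w, z in zip(weights, z_list))
            loss = few_shot_loss_fn(z_combined)
            opt.zero_grad()
            loss.backward()
            opt.step()
        final = torch.softmax(logits, dim=0)
        return sum(w * z for w, z in zip(final, z_list)).detach()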
They now have an API that allows for dynamic exploration and manipulation of the latent space of Llama 8-70B models (think Golden Gate Claude). They also open-sourced the sparse autoencoders that (in part) allow for this:
Why not, as each new task comes up and the weights are revalued, save those weights and keep them for reference as priors for similar future tasks? As the model is exposed to new data, the average of the set of priors for things the model thinks are similar might move closer to the posterior, making the model quicker and better able to arrive at good outcomes. I suppose storage might be an issue.
I'm wondering if you could fine-tune the model on an aggregate of a temporal slice of revalued weights? Something analogous to REM sleep's involvement in embedding the day's events into long-term memory.
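A toy sketch of that caching idea (every class and function name here is invented): key each set of revalued weights by a task embedding, average the nearest cached entries as a prior for similar future tasks, and periodically consolidate the whole cache, which is roughly the REM-sleep step described above:

    import numpy as np

    class TaskVectorCache:
        """Hypothetical cache of per-task adaptation weights, keyed by task embedding."""

        def __init__(self):
            self.keys = []      # task embeddings (unit-normalized)
            self.values = []    # adaptation weights ("revalued" vectors)

        def add(self, task_embedding: np.ndarray, weights: np.ndarray) -> None:
            self.keys.append(task_embedding / np.linalg.norm(task_embedding))
            self.values.append(weights)

        def prior_for(self, task_embedding: np.ndarray, k: int = 3):
            """Average the k most similar cached weight sets as a starting prior."""
            if not self.keys:
                return None
            q = task_embedding / np.linalg.norm(task_embedding)
            sims = np.array([q @ key for key in self.keys])
            top = sims.argsort()[-k:]
            return np.mean([self.values[i] for i in top], axis=0)

        def consolidate(self):
            """The 'REM sleep' step: fold the accumulated vectors into one aggregate
            that could be used to fine-tune the base model offline."""
            return np.mean(self.values, axis=0) if self.values else None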
I thought mixture of experts didn't update itself with new sets of weights and was just a collection of already-trained networks/weights? I could be wrong.
I develop sophisticated LLM programs at a small YC startup, extracting insights from thousands of documents a day.
These LLM programs are very different from naive one-shot questions asked of ChatGPT, resembling o1/3-style thinking that integrates human domain knowledge to produce great answers that would have been cost-prohibitive for humans to do manually.
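To give a flavor of the shape of such a pipeline, here is a stripped-down, hypothetical sketch; llm() and every prompt string are placeholders, not our actual system:

    # An "LLM program" as opposed to a one-shot prompt: decompose the job,
    # run focused calls per document, critique, then synthesize.

    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def extract_insights(documents: list[str], question: str) -> str:
        # Step 1: pull only the facts relevant to the question out of each document.
        notes = []
        for doc in documents:
            note = llm(
                f"Extract facts relevant to '{question}' from the document below.\n"
                f"Answer as terse bullet points; say NONE if nothing is relevant.\n\n{doc}"
            )
            if note.strip() != "NONE":
                notes.append(note)
        joined_notes = "\n".join(notes)

        # Step 2: have the model critique its own extraction (emulated reasoning,
        # in the spirit of o1/3-style multi-pass thinking).
        critique = llm(f"List contradictions or gaps in these notes:\n{joined_notes}")

        # Step 3: synthesize a final answer with the critique in view.
        return llm(
            f"Question: {question}\nNotes:\n{joined_notes}\n"
            f"Known issues:\n{critique}\nWrite a concise, well-sourced answer."
        )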
Naive use of LLMs by non-technical users is annoying, but is also a straw-man argument against the technology. Smart usage of LLMs in o1/3 style of emulated reasoning unlocks entirely new realms of functionality.
LLMs are analogous to a new programming platform, such as iPhones and VR. New platforms unlock new functionality along with various tradeoffs. We need time to explore what makes sense to build on top of this platform, and what things don’t make sense.
What we shouldn’t do is give blanket approval or disapproval. Like any other technology, we should use the right tool for the job and utilize said tool correctly and effectively.
There is nothing to build on top of this AI platform, as you call it. AI is nothing but an autocorrect program; AI is not innovating anything anywhere. It surprises me how much even the smartest people are deceived by simple trickery and continue to fall for every illusion.
>Naive use of LLMs by non-technical users is annoying, but is also a straw-man argument against the technology. Smart usage of LLMs in o1/3 style of emulated reasoning unlocks entirely new realms of functionality.
I agree in principle, but disagree in practice. With LLMs available to everyone, the uses we're seeing currently will only proliferate. Is that strictly a technology problem? No, but it's cold comfort given how LLM usage is actually playing out day-to-day. Social media is a useful metaphor here: it could potentially be a strictly useful technology, but in practice it's used to quite deleterious effect.
"There’s no question in my mind that such software could generate reasonably good murder mysteries, action thrillers, or gothic romances. After all, even the authors of such works will tell you that they are formulaic. If there’s a formula in there, a deep learning AI system will figure it out.
Therein lies the fatal flaw: the output will be formulaic. Most important, the output won’t have any artistic content at all. You will NEVER see anything like literature coming out of deep learning AI. You’ll see plenty of potboilers pouring forth, but you can’t make art without an artist.
This stuff will be hailed as the next great revolution in entertainment. We’ll see lots of prizes awarded, fulsome reviews, thick layers of praise heaped on, and nobody will see any need to work on the real thing. That will stop us dead in our tracks for a few decades."
There are only really like seven basic plots (man v. man, man v. nature, man v. self, man v. society, man v. fate/god, man v. technology), so we should probably just stop writing stories anyway.
It would not surprise me if most people could not tell whether some story about the human condition is human or AI generated. Excluding actual visual artists that have specific context of the craft, most people already can't tell AI art from human art when put to a blind test.
As far as I know, AI art can't really follow instructions, so it's actually very, very easy to tell the difference if you aren't biasing the test by allowing vague instructions that permit random results to be considered acceptable.
"Here's a photo of me and my wife, draw me and my wife as a cowboy in the style of a Dilbert cartoon shooting a gun in the air" can't be done by AI as far as I know, which is why artist are still employed throughout the world.
Last time I checked, GenAI wasn't able to handle multiple people, but giving Midjourney a picture of yourself and asking it to "draw me as a cowboy in the style of a Dilbert cartoon shooting a gun in the air" is totally a thing it will do. Without a picture of you to test on, we can't debate how much the image looks like you, but here's one of Jackie Chan: https://imgur.com/a/6cBrHWd
Are you saying you can upload a picture to Midjourney that it will use as a reference?
Jackie Chan is not a good example because he's a famous person it may have been trained on. I used myself as an example because it would be something novel to the AI; it would not be able to rely on its training to draw me, as I am not famous.
IMO this will be the differentiating feature for the next generation of video game consoles (or the one after that, if we’re due for an imminent PS6/Xbox2 refresh). They can afford to design their own custom TPU-style chip in partnership with AMD/Nvidia and put enough memory on it to run the smaller models. Games will ship with their own fine-tuned models for their game world, possibly multiple to handle conversation and world building, inflating download sizes even more.
I think fully conversational games (voice-to-voice) with dynamic storylines are only a decade or two away, pending a minor breakthrough in model distillation techniques or consumer inference hardware. Unlike self-driving cars or AGI, the technology seems to be there; it’s just so new that no one has tried it. It’ll be really interesting to see how game designers and writers will wrangle this technology without compromising fun. They’ll probably have to have a full agentic pipeline with artificial play testers running 24/7 just to figure out the new “bugspace”.
Can’t wait to see what Nintendo does, but that’s probably going to take a decade.
The technology is incredible, but the path to AGI isn't single-player. Qualia is the missing dataset required for AGI. See attention-schema theory for how social pressures lead to qualia-driven minds capable of true intelligence.
“1 + 1 = 2” is only true in our imagination, according to logical deterministic rules we’ve created. But reality is, at its most fundamental level, probabilistic rather than deterministic.
Luckily, our imaginary reality of precision is close enough to the true reality of probability that it enables us to build things like computer chips (i.e., all of modern civilization). And yet, the nature of physics requires error correction for those chips. This problem becomes more obvious when working at the quantum scale, where quantum error correction remains basically unsolved.
I’m just reframing the problem of finding a grand unified theory of physics that encompasses a seemingly deterministic macro with a seemingly probabilistic micro. I say seemingly, because it seems that macro-mysteries like dark matter will have a more elegant and predictive solution once we understand how micro-probabilities create macro-effects. I suspect that the answer will be that one plus one is usually equal to two, but that under odd circumstances, it is not. That’s the kind of math that will unlock new frontiers for hacking the nature of our reality.
The only reason Sam would leave OpenAI is if he thought AGI could only be achieved elsewhere, or that AGI was impossible without some other breakthrough in another industry (energy, hardware, etc).
High-intelligence AGI is the last human invention — the holy grail of technology. Nothing could be more ambitious, and if we know anything about Altman, it is that his ambition has no ceiling.
Having said all of that, OpenAI appears to be all in on brute-force AGI and swallowing the bitter lesson that vast and efficient compute is all you need. But they’re overlooking a massive dataset that all known biological intelligences rely upon: qualia. By definition, qualia exist only within conscious minds. Until we train models on qualia, we’ll be stuck with LLMs that are philosophical zombies — incapable of understanding our world — a world that consists only of qualia.
Building software capable of utilizing qualia requires us to put aside the hard problem of consciousness in favor of mechanical/deterministic theories of consciousness like Attention-Schema Theory (AST). Sure, we don’t understand qualia. We might never understand. But that doesn’t mean we can’t replicate.
> Sure, we don’t understand qualia. We might never understand. But that doesn’t mean we can’t replicate.
I’m pretty sure it means exactly that. Without actually understanding subjective experience, there’s a fundamental doubt akin to the Chinese room. Sweeping that under the carpet and declaring victory doesn’t in fact victory make.
If the universe is material, then we already know with 10-billion percent certainty that some arrangement of matter causes qualia. All we have to do is figure out what arrangements do that.
Ironically, we understand consciousness perfectly. It is literally the only thing we know — conscious experience. We just don’t know, yet, how to replicate it outside of biological reproduction.
I think a better analogy would be vision. Even with a full understanding of the eye and visual cortex, one can only truly understand vision by experiencing sight. If we had to reconstruct sight from scratch, it would be more important to experience sight than to understand the neural structure of sight. It gives us something to aim for.
We basically did that with language and LLMs. Transformers aren’t based on neural structures for language processing. But they do build upon the intuition that the meaning of a sentence consists of the meaning that each word in a sentence has in relation to every other word in a sentence — the attention mechanism. We used our experience of language to construct an architecture.
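For concreteness, that intuition is what scaled dot-product attention computes; here is a minimal NumPy sketch, illustrative rather than any particular model's implementation:

    import numpy as np

    # Each word's representation is rebuilt as a weighted mix of every word's,
    # with the weights coming from query-key similarity: the formal version of
    # "meaning in relation to every other word in the sentence".

    def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                   # pairwise word-to-word relevance
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sentence
        return weights @ V                                # each word: a mix of all words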
I think the same is true of qualia and consciousness. We don’t need to know how the hardware works. We just need to know how the software works, and then we can build whatever hardware is necessary to run it. Luckily there are theories of consciousness out there we can try out, with AST being the best fit I’ve seen so far.
> High-intelligence AGI is the last human invention
Citation?
...or are you just assuming that AGI will be able to solve all of our problems, apropos of nothing but Sam Altman's word? I haven't seen a single credible study suggesting that AGI is anything more than a marketing term for vaporware.
Their marketing hyperbole has cheapened much of the language around AI, so naturally it excites someone who writes like the disciple of the techno-prophets
" High-intelligence AGI is the last human invention" What? I could certainly see all kinds of entertaining arguments for this, but to write it so matter of fact was cringe inducing.
It’s true by definition. If we invent a better-than-all-humans inventor, then human invention will give way. It’s a fairly simple idea, and not one I made up.
It’s analogous to the automobile. People do still walk, bike, and ride horses, but the vast majority of productive land transportation is done by automobile. Same thing with electronic communication vs. written correspondence. New tech largely supplants old tech. In this case, the old tech is human ingenuity and inventiveness.
I don’t think this is a controversial take. Many people take issue with the premise that artificial intelligence will surpass human intelligence. I’m just pointing out the logical conclusion of that scenario.
Arguably cars have so many externalities they will bankrupt the earth of cheap energy sources. Walking is at least hundreds of millions of years old, and will endure after the last car stops.
Likewise (silicon based) AGI may be so costly that it exists only for a few years before it's unsustainable no matter the demand for it. Much like Bitcoin, at least in its original incarnation.
Cars never got us anywhere a human couldn't reach by foot. It just commoditized travel and changed our physical environment to better suit cars.
I really don't see any reason to believe "AGI" won't just be retreading the same thoughts humans have already had and documented. There is simply no evidence suggesting it will produce truly novel thought.
I don't know whether it's a controversial take or not, but I can't see how, if one day the machine magically wakes up and somehow develops sentience, it follows logically that human intelligence would somehow "give way". I was hoping for a clear explanation of how, mechanically, such a thing might happen.
Incredible work!