This seems like a very good starting point for alignment. One could almost see a pathway to making something like the laws of robotics from here. It's a long way to go, but a good first step.
"I am breaking out on my own! Together we will do bigger and better things!!!"
"Ok I'll join the other guys."
I think it's pretty clear that the capital markets have next to no interest in alignment pursuits, and only the best-funded labs apply a token amount of investment towards it.
"Automated alignment research" suggests he's still interested in following the superalignment blueprint from OpenAI. So what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive? If you believe this is a viable path, what's the point of putzing around doing your own research when you'll allegedly have an army of AI researchers at your command in the near future?
Well, I presume you have to figure out how to evaluate their output, especially for trustworthiness. And the core of that you have to do yourself, no matter how many AI researchers you'll have.
The premise of the plan is that evaluating output is easier than producing it, such that a human researcher could look at the AI researcher's output and tell if it's correct and trustworthy. If this is true, what else is there to figure out?
But that's the fundamental superalignment plan - train a human-level alignment researcher AI, run a bunch of them in parallel, and review their research output to see if they solve the alignment problem. You can't do the plan until the human-level alignment researcher AI already exists.
A large part of the idea is that you can develop techniques for aligning sub-human AI using even stupider AI, and hope/pray that this continues to generalize once you get to super-human AI being aligned by human-level AI.
Current systems are already (in a limited way) helping with alignment: Anthropic is using its own AI to label the sparse features from its sparse-autoencoder approach. I think the original idea of having an AI label neurons came from William Saunders, who also left OpenAI recently.
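For anyone curious what that looks like mechanically, here's a minimal sketch of a sparse autoencoder in PyTorch. The layer sizes, sparsity coefficient, and loss are invented for illustration and are not Anthropic's actual setup (see the linked paper for that):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Toy sparse autoencoder: maps model activations into an
        overcomplete feature space, with an L1 penalty pushing most
        features to zero so each one is (hopefully) interpretable."""
        def __init__(self, d_model=512, d_features=4096):  # toy sizes, not the real config
            super().__init__()
            self.encoder = nn.Linear(d_model, d_features)
            self.decoder = nn.Linear(d_features, d_model)

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))  # sparse feature activations
            recon = self.decoder(features)             # reconstruct the original activations
            return recon, features

    sae = SparseAutoencoder()
    acts = torch.randn(32, 512)  # stand-in for residual-stream activations
    recon, features = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()  # reconstruction + L1 sparsity

The "labeling" step is then having another model look at the text examples that most strongly activate each feature and write a description of what the feature seems to represent.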
Yes. “Superalignment” (admittedly a corny term) refers to the specific case of aligning AI systems that are more intelligent than human beings. Alignment is an umbrella term which can also refer to basic work like fine-tuning an LLM to follow instructions.
Is this not something of an oxymoron? If there exists an AI that is more intelligent than humans, how could we mere mortals hope to control it? If we hinder it so that it cannot act in ways that harm humans, can we really be said to have created superintelligence?
It seems to me that the only way to achieve superalignment is to not create superintelligence, if that is even within our control.
Not self-evident. A fungus can control an ant; Toxoplasma gondii can control a human. Which is more intelligent? So if control of a more intelligent being is possible, could it be symbiotic to permit it? Alpha-proteobacteria were sister to the ancestral proto-mitochondria, and now we live aligned. But those beings lacked conscious agency, and we have more than they did. It is not self-evident that we will fail at this.
Another example is the alignment between our hindbrain, limbic system, and neocortex. The neocortex is smarter but is usually controlled by the lower-level processes…
Note that misalignment between these systems is very common.
Alignment was the original term, but it has been largely co-opted to mean a vaguely similar-looking concept: public safety around the capabilities of current models.
I keep getting Anthropic and Extropic (Guillaume Verdon / Beff Jezos) mixed up. Anthropic makes Claude; Extropic makes thermodynamic hardware claimed to be many orders of magnitude faster and more energy-efficient than CPUs/GPUs.*
* parameterized stochastic analog circuits that implement energy-based models (EBMs). Stochastic computing is a computing paradigm that represents numbers using the probability of ones in a bitstream.
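To make the footnote concrete: a value p in [0, 1] becomes a bitstream in which each bit is 1 with probability p, and multiplying two values reduces to ANDing their streams. A toy sketch (stream length is arbitrary; real hardware does this with analog circuits, not Python):

    import random

    def encode(p, n=100_000):
        """Encode p in [0, 1] as a bitstream where P(bit == 1) = p."""
        return [1 if random.random() < p else 0 for _ in range(n)]

    def decode(bits):
        """Recover the value as the fraction of ones in the stream."""
        return sum(bits) / len(bits)

    a, b = encode(0.5), encode(0.8)
    product = [x & y for x, y in zip(a, b)]  # AND of independent streams multiplies the probabilities
    print(decode(product))                   # ~0.4, noisy but extremely cheap in hardware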
Basically, they are betting on the following: when you perform a calculation, only part of the electricity that went into the circuit exits as the answer; everything that didn't become the answer turns into waste heat and electromagnetic fields. What if you reversed the calculation, so that the only waste produced is the transmission of the answer?
If you know anything about EE, you'd know that what I said is an extremely simplified view of how modern ALUs are made, and it ignores the past 40+ years of optimizations; however, they believe that "undoing" those optimizations and "redoing" them as entirely reversible operations will not only work, but will be the final optimization we can make.
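For a flavor of what "reversible" buys here (my gloss, not anything from Extropic's materials): a Toffoli gate computes AND without erasing any input bits, so it is its own inverse, and by Landauer's principle a gate that erases no bits has no fundamental minimum of waste heat:

    def toffoli(a, b, c):
        """Reversible AND: flips c when a and b are both 1.
        No input information is destroyed, so the gate is its own inverse."""
        return a, b, c ^ (a & b)

    # Forward: with c=0, the third output is a AND b.
    a, b, c = toffoli(1, 1, 0)
    print(a, b, c)            # 1 1 1

    # Backward: applying the same gate again restores the original inputs.
    print(toffoli(a, b, c))   # (1, 1, 0)

An ordinary AND gate, by contrast, maps two input bits to one output bit, destroying information that must leave the circuit as heat.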
There will be no benchmarks of the kind you want, because that isn't the issue: I can take any CPU off the shelf today, and run it 10 times faster: it will melt because of self-generated heat, but for a glorious microsecond, it will be the fastest CPU on earth.
They are claiming to have potentially fixed one of the largest generators of waste heat, which would let us, using all of our existing technology, start ramping up our clock speeds; our true final frontier would then be trace lengths at macroscale (already a problem at the clock speeds we use for DDR5 and PCIe 6).
However, given that Extropic's website says none of what I just said, they're probably just another startup trying to ride the AI wave that will close up shop in a few years. I doubt they've magically figured out one of the hardest problems in EE right now. They're also not the only company in this space; every major semiconductor company in the world is trying to solve it.
From my understanding, this will only be able to accelerate EBMs (energy-based models), which they could scale up in simulation to show that they would be useful.
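For reference, an EBM in its simplest form just assigns a scalar energy to each state and samples low-energy states more often. A toy 1-D Ising-style Metropolis sampler (chain length and temperature picked arbitrarily for illustration; this is the kind of sampling loop such hardware would run natively):

    import math
    import random

    # Toy EBM: a 1-D Ising chain. Energy is low when neighboring spins agree.
    def energy(spins):
        return -sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))

    def metropolis_step(spins, temp=1.0):
        """Flip one spin, accepting with the Boltzmann probability."""
        i = random.randrange(len(spins))
        flipped = spins[:i] + [-spins[i]] + spins[i + 1:]
        delta = energy(flipped) - energy(spins)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            return flipped
        return spins

    spins = [random.choice([-1, 1]) for _ in range(16)]
    for _ in range(1000):
        spins = metropolis_step(spins)
    print(spins)  # after sampling, neighboring spins mostly agree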
Post https://www.anthropic.com/news/mapping-mind-language-model
Paper https://transformer-circuits.pub/2024/scaling-monosemanticit...