Let's try to understand AI monosemanticity (astralcodexten.com)
350 points by bananaflag on Nov 27, 2023 | 179 comments



There's actually a somewhat reasonable analogy to human cognitive processes here, I think, in the sense that humans tend to form concepts defined by their connectivity to other concepts (c.f. Ferdinand de Saussure & structuralism).

Human brains are also a "black box" in the sense that you can't scan/dissect one to build a concept graph.

Neural nets do seem to have some sort of emergent structural concept graph; in the case of LLMs it's largely informed by human language (because that's what they're trained on). To an extent, we can observe this empirically through their output even if the first principles are opaque.


> Neural nets do seem to have some sort of emergent structural concept graph; in the case of LLMs it's largely informed by human language (because that's what they're trained on). To an extent, we can observe this empirically through their output even if the first principles are opaque.

Alternatively, what you're seeing are the structures inherent within human culture as manifested through its literature[1], with LLMs simply being a new and useful tool which makes these structures more apparent.

[1] And also its engineers' training choices


Alternatively, both are the same thing. The structures embedded in human culture, which the literature captures, are themselves manifestations of structures of our understanding on the world.


I meant to address the notion that LLMs manifest emergent structures, perhaps derived from but qualitatively distinct and independent from the indisputably human intelligence-centered structures.

Whether and to what extent human culture itself has emergent properties, including some sort of emergent intelligence, is a different question; indeed, I think a more interesting question than that of LLMs, though the dispute over the behavior and nature of LLMs certainly suggests these questions generally, and raises the question of what we mean w.r.t. emergence, intelligence, etc. (My understanding of emergence comes from reading the seminal book, Complexity: The Emerging Science at the Edge of Order and Chaos, and some related literature. As far as I understand emergence, it's not really a close question whether LLMs exhibit emergence.)


i buy this. cf the “lossy jpeg of the web” analogy; sure there’s structure: we put it there!


> lossy jpeg of web

Knowledge is compression https://en.wikipedia.org/wiki/Hutter_Prize


“if you can compress it this much it’s gotta be agi!” “no, we just made gzip better. again.”


My comment wasn’t about agi, but out of curiosity what definition are you using for it?

To be fair, I don’t see a categorical difference between human intelligence and compression either


Then why wouldn't any LLM trained on real data have structure?

I mean we don't live in the chaos realm, atoms make molecules, those molecules make structures, those structures make cells, organs, lifeforms.


the degrees to which the (non)deterministic and (non)atomistic models of the universe have predictive power aside —

yes, sure: you should expect a well trained NN to have an internal structure that approximates that of the statistical distribution of encoded features of the training data into the latent space of the model.

that’s very nearly the definition of “trained”.


This gets at the fundamental issue I have with AI: We're trying to get machines to think, but we only have the barest understanding of how we, as humans, think. The concept of a "neuron" that can activate other neurons comes straight from real neurology, but given how complex the human brain is, it's no surprise that we can only create something that is fundamentally "lesser."

However, I think that real neurology and machine-learning can be mutually reinforcing fields: structures discovered in the one could be applied to the other, and vice versa. But thinking we can create "AGI" without first increasing our understanding of "wet" neural nets is the height of hubris.


All the efforts to get computers to play chess roughly the way humans do were made to look insufficient by a large enough neural network that took zero knowledge of human position evaluation and just played an outrageous number of games against itself.

The plane is in many ways lesser than the bird, and we barely understood aerodynamics when the Wright brothers gave us a working plane: while the bird is efficient, we had a whole lot more thrust. Our planes are far better than they were back then, not because we understand birds better, but because we focused on the efficiency of the simpler designs we could build. The Formula 1 car doesn't come from a deep understanding of the efficient, graceful running movements of the cheetah.

Our results in medicine have been far ahead of our understanding of biochemistry, DNA, and, in general, how the human body works. The discovery of penicillin didn't require a lot of understanding: it required luck. Ozempic and Viagra didn't come from deep understanding of the human body: researchers were looking for one thing and ended up with something useful that was straight-out unintended.

We might be able to build AGI with what we know of brains, we might not. But either way, the deciding question isn't whether we need to understand the human mind better; it's whether we have enough compute for the very crude systems that we know how to build to overcome our relative lack of understanding.

So if you ask me, the real height of hubris is to think that working engineering solutions have to come from deep understanding of how nature solves the problem. The only reasonable bet, given the massive improvements in AI in the last decade, is to assume that the error bars of any prediction here are just huge. Will we get stuck, the way self-driving cars seem to have gotten stuck for a bit? Possibly. Will we go from a stochastic parrot to something better than humans, the same way that Go AIs went from pretty weak to better than any human in the blink of an eye? I'd not discount it either. I expect to be surprised either way.


In AI research there are different branches with different goals and motivations. You have applied research, where the goal is to engineer systems that are better at solving certain needs. You also have scientific endeavours, usually interdisciplinary, typically crossed with biology, psychology, and philosophy, where AI models are created and used as models for understanding human or animal intelligence.

The former is where, as you describe, better understanding of biology might help but is not a prerequisite for progress; in the latter it is not just needed but the goal.

Now I know this is a bit of a caricature, as both of these disciplines are in practice poorly delineated and often intermixed. It's easy to find hubristic examples to mock on both sides, but there's value and brilliance to be found in each respectively as well.


> All the efforts to get computers to play chess

You say Go later so I think you know this and it was just a thinko, but chess fell easily to pre-neural techniques that are fully logic based and understandable. It was Go that required a big neural network trained through self play.


I think what they meant was that even logic-based superhuman chess was overtaken by AlphaZero.


> height of hubris is to think that working engineering solutions have to come from deep understanding of how nature solves the problem.

It is basically understood how "nature solves the problem". It does that by evolution. And what is machine learning and neural networks but artificial evolution?


> It is basically understood how "nature solves the problem". It does that by evolution.

Also known as throwing shit at a wall to see what sticks. Evolution is dumb, but it brute-forces its way down the optimization gradients by sheer fucking scale of having turned the entire surface of this planet into tiny, imperfectly self-replicating machines.


Nothing of the sort -- it's best characterized as learning, like it says in the name.


You only save the changes that improve the loss values no? Those are the survivors whose progeny go on to the next iteration.


Something like a GAN is closer to what we'd consider evolution.


AGI and human intelligence don't necessarily have to work the same way. While many of us assume that they must be very similar, I don't think there's any basis for that assumption.


That's correct: they don't need to work the same way. I would say most people who have earnestly thought about it assume they won't.

But given the exponentially greater chemical, biological, and neurological complexity of human brains, the millions of years of evolutionary "pre-training" that they get imbued with, and the years of constant multimodal sensory input required for their culmination in what's called human intelligence, it takes an extremely bold assumption to believe that equivalent ends can be met with a stream of fmadds driving through in an array of transistors.

It's hard to overstate how big a gulf there is in both the hardware and software between living systems and what we currently see in our silicon projects. Neural network learning in general and LLM's in particular are like crude paper airplanes compared to the flight capabilities of a dextrous bird. You can kinda squint and see that they both move some distance through the air, and even marvel at it for a bit, but there's a long long long long way to go.


And yet no bird ever broke the sound barrier. Humans make things that exceed anything in nature. Thought will only be another example.


Exceed in some regards but not others right? Like, let’s not imagine the plane is better at being a bird than the bird. It’s an evocative set of associations though. Makes me think we’ll end up creating the Concorde of intelligence but be no closer to understanding how our own works…


> Exceed in some regards but not others right?

Yeah, nature does it bottom-up through evolution of self-assembling, self-replicating machines. A lot of the "design constraints" that go into living things have to do with... keeping them alive. Compare an electrical wire or silicon trace with a biological nerve or neuron: the former are just simple traces of inert metal, the latter are complex nanomachines mostly dedicated to self-maintenance, and incidentally conducting electricity.

Point being, we can't compete with nature across every dimension simultaneously, but we don't need to, and we shouldn't, because we don't need most of the features of the natural solution, not at this point. We don't need Concordes hatching from eggs - we have factories that build them.


I agree that creating something akin to an organic being exceeds our capabilities at present. But to be fair, we never tried to build a bird; instead we tried to build a machine that flies, and we excelled.

However I concur that we may end up creating an AI which thinks nothing like us, has no sense of morality, and yet far exceeds us in intelligence. And that would be somewhat unsettling.


Without realizing it you're voicing the concerns of the AI doomers.

We may build something that far exceeds us in capability, but without us understanding it, or it understanding us. This is the alignment issue.


I do wonder if such a thing can actually exist in the realm of intelligence like in the realm of, say, flight. It’s not clear that’s the case.


It'll explain it to us, at least to the best of our capability, with lots of hand-holding.


Or it won't and you'll wonder why your molecular constituents are being siphoned off to build things it finds more useful.


If it sticks my brain in a simulation to keep me occupied, I'm totally ok with this.


Reminder: a tamagotchi Workaccount2 is not you. A parallel instance of you, no matter how perfect, cannot add to your qualitative experiences or make you less dead.


The basis is our limited data set: we have only 1 example of human-level intelligence, so it's natural to assume there's something unique, or at least significant, about the way it works.


> We're trying to get machines to think, but we only have the barest understanding of how we, as humans, think... But thinking we can create "AGI" without first increasing our understanding of "wet" neural nets is the height of hubris.

That depends on the landscape of the solution space. If the techniques that work much better than anything else happen to be the same techniques used within our own brains, then we might find them, through aggressive experimentation and exploration, before we even realize that it's also how our own brains work too.


We do in fact know that whatever makes the brains tick, it must be something 1) simple enough for the extremely dumb, brute-force optimizer that is evolution to stumble on it, 2) incrementally improvable by the same mechanism, and 3) conferring some kind of advantage at every step. Those are requirements for some feature to evolve.


3 is not required. Multiple steps with no negatives is sufficient.


> thinking we can create "AGI" without first increasing our understanding of "wet" neural nets is the height of hubris

Already ~40 years ago I read in a popular science book that scientists had tried to apply genetic algorithms to finding a better nozzle for a given purpose (a jet engine? I don't remember). That was done by prefabricating a number of ring-shaped metal parts whose insides had different widths and different slopes; a stack of these could form a wide variety of shapes for the conduit. They started with an initial configuration, did a measurement, then threw the dice to determine which part should be changed; if the new measurement was better (according to their metric) than the preceding one, they kept the change, otherwise they reversed it. They did find a nozzle shape that was not only significantly better than any known configuration but also significantly weirder, seemingly completely ad hoc.

There were no computers in this experiment, no intelligent design; people were only needed to procure the initial parts and setup, do the dice-throwing, and so on. The design process proper was a very basic, randomized, mechanical procedure: no thinking required, no understanding necessary.
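
The procedure described is essentially random hill climbing. A minimal sketch of the same loop in Python (the "efficiency" function here is a made-up stand-in for the physical measurement):

  import random

  def efficiency(widths):
      # Hypothetical stand-in for the physical measurement:
      # pretend a smoothly tapering conduit performs best.
      return -sum(abs(widths[i + 1] - widths[i]) for i in range(len(widths) - 1))

  rings = [1.0] * 10                 # initial stack of identical rings
  best = efficiency(rings)

  for _ in range(10_000):
      i = random.randrange(len(rings))                      # "throw the dice"
      old = rings[i]
      rings[i] = max(0.1, old + random.uniform(-0.2, 0.2))  # swap in a variant
      new = efficiency(rings)
      if new >= best:
          best = new                 # keep the change if the measurement improved
      else:
          rings[i] = old             # otherwise reverse it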


I see no reason why this is a problem. We built analogs of many things before we fully understood how they work. Most of human progress has worked this way through an empirical approach. Makes sense since the real world is the ultimate sandbox and the final judge of an idea’s applicability.


Hard disagree. IMHO humans suffer under a delusion that their thinking is like a computer, whereas it is actually much more like an LLM. So if you ask humans[1] to figure out how they think, the answer will be a) a long time coming and b) wrong.

[1] Possibly Buddhist monks excepted.


The idea of a smaller NN outperforming what you might think it could do by simulating a larger one reminds me of something I read about Portia spiders (going on a deep dive into them after reading the excellent 'Children of Time' by Adrian Tchaikovsky). The idea is that they're able to do things with their handful - on the order of tens of thousands - of neurons that you'd think would require 4 or 5 orders of magnitude more, by basically time-sharing them; do some computation, store it somehow, then reuse the same neurons in a totally different way.


I just skimmed through it for now, but it has seemed kinda natural to me for a few months now that there would be a deep connection between neural networks and differential or algebraic geometry.

Each ReLU layer is just a (quasi-)linear transformation, and a pass through two layers is basically also a linear transformation. If you say you want some piece of information to stay (numerically) intact as it passes through the network, you say you want that piece of information to be processed in the same way in each layer. The groups of linear transformations that "all process information in the same way, and their compositions do, as well" are basically the Lie groups. Anyone else ever had this thought?

I imagine if nothing catastrophic happens we'll have a really beautiful theory of all this someday, which I won't create, but maybe I'll be able to understand it after a lot of hard work.


You might be interested in this workshop: https://www.neurreps.org/

And a possibly relevant paper from it:

https://openreview.net/forum?id=Ag8HcNFfDsg


ReLU is quite far from linear; adding ReLU activations to a linear layer amounts to fitting a piecewise-segmented model of the underlying data.


Well, at all but a finite number of points (specifically, all but one point), there is a neighborhood of that point on which ReLU matches a linear function...

In one sense, that seems rather close to being linear. If you take a random point (according to a continuous probability distribution), then with probability 1, if you look in a small enough neighborhood of the selected point, it will be indistinguishable from linear within that neighborhood.

And, for a network made of ReLU gates and affine maps, you still get that it looks indistinguishable from affine on any small enough region around any point outside of a set of measure zero.

So... Depends what we mean by “almost linear” I think. I think one can make a reasonable case for saying that, in a sense it is “almost linear”.

But yes, of course I agree that in another important sense, it is far from linear. (E.g. it is not well approximated by any linear function)


Yeah, and we have more than measure zero -- the subsets of the input space on which a fully-ReLU MLP is linear are Boolean combinations of half-spaces. I was coming at it from the heuristic that if you can triangulate a space into a finite number of easily computable convex sets such that the inside of each one has some trait, then it's as good as saying that the space has this trait. But of course this heuristic doesn't always have to be true, or useful.
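
To make the locally-affine point concrete, a toy numpy sketch (my own illustration, not from the article): fix the pattern of active ReLUs at a point, and the network agrees with a single affine map on the whole region where that pattern holds.

  import numpy as np

  rng = np.random.default_rng(0)
  W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
  W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

  def mlp(x):
      h = np.maximum(W1 @ x + b1, 0.0)        # ReLU layer
      return W2 @ h + b2

  x = rng.normal(size=4)
  mask = (W1 @ x + b1) > 0                    # active-ReLU pattern at x
  A = W2 @ (W1 * mask[:, None])               # the affine map on this region
  c = W2 @ (b1 * mask) + b2

  eps = 1e-4 * rng.normal(size=4)             # tiny step, same activation region
  print(np.allclose(mlp(x + eps), A @ (x + eps) + c))   # True, provided no sign flips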


Everything is something. The question is what this nomenclature gymnastics buys you. Unless you can answer that, this is no different from claiming neural networks are a projection of my soul.


Could looking at NNs through the lens of group theory unlock a lot of performance improvements?

If they have inner symmetries we are not aware of, you can avoid wasting effort searching in the wrong directions.

If you know that some concepts are necessarily independent, you can exploit that in your encoding to avoid superposition.

For example, I am using cyclic groups and dihedral groups, and prime powers to encode representations of what I know to be independent concepts in a NN for a small personal project.

I am working on a 32-bit (perhaps float) representation of mixtures of quantized Von Mises distributions (time of day patterns). I know there are enough bits to represent what I want, but I also want specific algebraic properties so that they will act as a probabilistic sketch: an accumulator or a Monad if you like.

I don't know the exact formula for this probabilistic sketch operator, but I am positive it should exist. (I am just starting to learn group theory and category theory, to solve this problem; I suspect I want a specific semi-lattice structure, but I haven't studied enough to know what properties I want)

My plan is to encode hourly buckets (location) as primes and how fuzzy they are (concentration) as their powers. I don't know if this will work completely, but it will be the starting point for my next experiment: try to learn the probabilistic sketch I want.

I suspect that I will need different activation functions than you'd normally use in a NN, because linear or ReLU or similar won't be good at representing in finite space what I am searching for (likely a modular form or L-function). Looking at Koopman operator theory, I think I need to introduce non-linearity in the form of a Theta function neuron or the Ramanujan Tau function (which is very connected to my problem).
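
For reference (this is not the commenter's prime-power scheme, which is their own construction), quantizing a von Mises time-of-day pattern into hourly buckets takes only a few lines with scipy:

  import numpy as np
  from scipy.stats import vonmises

  # Hypothetical pattern: activity centered around 18:00, moderately fuzzy.
  mu = 2 * np.pi * 18 / 24        # location on the 24-hour circle
  kappa = 4.0                     # concentration

  hours = np.arange(24)
  angles = 2 * np.pi * hours / 24
  density = vonmises.pdf(angles, kappa, loc=mu)
  buckets = density / density.sum()      # 24 quantized hourly weights

  print(hours[np.argmax(buckets)])       # 18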


I would argue that there are a few fundamental ways to make progress in mathematics:

1. Proving that a thing or set of things is part of some grouping

2. Proving that a grouping has some property or set of properties (including connections to or relationships with other groupings)

These are extremely powerful tools and they buy you a lot, because they allow you to connect new things with mathematical work that has been done in the past. So, for example, if the GP surmises that something is a Lie group, that buys them a bunch of results stretching back to the 19th century which can be applied to understand these neural nets even though they are a modern concept.


> what this nomenclature gymnastics buys you?

???

Are you writing off all abstract mathematics as nomenclature gymnastics, or is there something about this connection that you think makes it particularly useless?


I did a little spelunking some time ago reacting to the same urge. Tropical geometry appears to be where the math talk is at.

Just dropping the reference here, I don't grok the literature.


> deep connection between neural networks and differential or algebraic geometry

I disagree with how you came to this conclusion (because it ignores non-linearity of neural networks), but this is pretty true. Look up gauge invariant neural networks.

Bruna et al.'s Mathematics of Deep Learning course might also be interesting to you.


What? The very point of neural networks is representing non-linear functions.


Isn't this also (just?) a description of how high-dimensional embedding spaces work? Putting every kind of concept all in the same space is going to lead to some weird stuff. Different regions of the latent space will cover different concepts, with very uneven volumes, and local distances will generally be meaningful (red vs. green) but long distances won't be (red vs. ennui).

I guess we could also look at it the other way; embedding spaces work this way because the underlying neurons work this way.


I had the same feeling, I have always seen it like:

1. Neurons encode a concept and activate when it shows up.

2. No, it's way more complicated and mysterious than that.

And now this seems to add:

3. Actually, it's only more complicated than that in a fairly straightforward mathematical sense. It's not that mysterious at all.

I suspect this means that either I'm not picking up on subtleties in the article, or Scott is representing it in a way that slightly oversimplifies the situation!

On the other hand, the last quote in the article from the researchers does seem to be hitting the "it's not that mysterious" note. A simple matter of very hard engineering. So, I dunno. Cool!


Before finishing my read, I need to register an objection to the opening, which reads to me as implying this is the only means:

> Researchers simulate a weird type of pseudo-neural-tissue, “reward” it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want.

This isn't the only way. Back propagation is a hack around the oversimplification of neural models. By adding a sense of location into the network, you get linearly inseparable functions learned just fine.

Hopfield networks with Hebbian learning are sufficient and are implemented by the existing proofs of concept we have.
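
For readers who haven't seen them: a bare-bones Hopfield network with a Hebbian learning rule looks roughly like this (a toy sketch, not a claim about what the article's models do):

  import numpy as np

  # Two orthogonal +1/-1 patterns to store.
  patterns = np.array([
      [ 1,  1,  1,  1, -1, -1, -1, -1],
      [ 1, -1,  1, -1,  1, -1,  1, -1],
  ])
  n = patterns.shape[1]

  # Hebbian rule: strengthen weights between co-active units.
  W = sum(np.outer(p, p) for p in patterns) / n
  np.fill_diagonal(W, 0)

  def recall(state, steps=5):
      state = state.astype(float)
      for _ in range(steps):
          state = np.sign(W @ state)       # update all units
          state[state == 0] = 1
      return state

  noisy = patterns[0].copy()
  noisy[0] *= -1                           # corrupt one unit
  print(recall(noisy))                     # recovers patterns[0]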


This is true. We use backpropagation not because it’s the only way or because it’s biologically plausible (the brain doesn’t have any backward passes) but because it works. Neural networks aren’t special because of any sort of connection to the brain, we use them because we have hardware (GPUs) which can train them pretty quickly.

I feel the same way about transformers vs RNNs: even if RNNs are more “correct” in some sense of having theoretically infinite memory it takes forever to train them so transformers won. And then we developed techniques like Long LoRA which make theoretical disadvantages functionally irrelevant.


I agree that doing effective things is effective and we should do effective things when they prove valuable. These techniques are generally simplifications or even distillations; they can be faster and more efficient, which are excellent attributes of solutions. My objection isn't to developing great systems, but to forgetting that other branches in the solution space exist. Particularly here, because the richer approaches may help yield the next generation of solutions.


> developed techniques like Long LoRA which make theoretical disadvantages functionally irrelevant.

How’s that?


Context windows used to be tiny e.g. 768 or 1024. That meant sliding windows of limited context if you had any reasonably sized input. If your context window is 32k or even 100k, a lot of inputs will fit into the context window entirely.

The reason huge context windows weren't possible in the past is that memory requirements were quadratic with input length. Long LoRA lets us use less memory for our context windows, or use the same memory footprint for larger context windows.
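
Ballpark of the quadratic part, with made-up but typical shapes, just to show why naive long contexts blow up (real systems avoid materializing the full score matrix):

  def naive_attn_scores_bytes(seq_len, n_layers=32, n_heads=32, bytes_per=2):
      # Memory to materialize every seq_len x seq_len attention-score matrix
      # (fp16), across all layers and heads, if done completely naively.
      return n_layers * n_heads * seq_len * seq_len * bytes_per

  for seq_len in (1024, 32_768, 100_000):
      gib = naive_attn_scores_bytes(seq_len) / 2**30
      print(f"{seq_len:>7} tokens -> {gib:,.0f} GiB of scores")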


I think it would be really exciting if somebody could show that ANNs that more resembled biological neurons could learn function approximation as well as (or better than!) current DNNs. However, my understanding of math and engineering suggests that for the time being, the mechanisms we currently use and invest so much time and effort into will exceed more biologically inspired neurons, for utterly banal reasons.


Yet brains remain superior in many regards. Still, by all means, let's celebrate the advances we make!


Back propagation isn't a hack. It's a triumph. It's powering the revolution we're experiencing.


It has yielded some excellent results that we should celebrate and use. Yet the consensus of the field, when I was properly informed on such things, was that we developed back propagation while under a misunderstanding that linearly inseparable functions (e.g. XOR) could not be learned by Hebbian learning in Hopfield networks. This was a correct result in the context of Minsky's assumptions, but a too-limited and/or over-applied conclusion drawn from his work.


At least the first part reminded me of Hyperion and how AIs evolved there (I think the actual explanation is in The Fall of Hyperion), smaller but more interconnected "code".

Not sure about the actual implementation, but at least for us, concepts or words are neither pure nor isolated; they have multiple meanings that collapse into specific ones as you put several together.


> No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, “reward” it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want.

There is a distinction to be made in "knowing how it works" on architecture vs weights themselves.


Personally I find the original paper much better written and easier to understand: https://transformer-circuits.pub/2023/monosemantic-features/...


If you have a background in ML, then yes, a paper is almost always better. I've recommended papers over sources like Towards Data Science etc. here. However, for laypeople, I doubt it would be as effective - they'd need to look up terms like MLP, ReLU, UMAP, Logit, or even what an activation function is, and they are the target audience of this post.


There's also a difference in goals. The point of a paper is to convince an expert that the claim is correct. With an article, you can just assume that the claim is correct (or give it any level of credence you want), because the point is to inform more than convince. So the article can skip or sketch all the showing-your-work and steps and proofs and just present the conclusions.


By the same token, thinking in memes all the time may be a form of impoverished cognition.

Or, is it enhanced cognition, on the part of the interpreter having to unpack much from little?


Darmok and Jalad at Mar-a-Lago.


Shaka, paying to build the walls.


>By the same token, thinking in memes all the time may be a form of impoverished cognition.

I would recast this: any thinking is a linear superposition of weighted tropes. If you read TVTropes enough you'll start to realize that the site doesn't just describe TV plots, but basically all human interaction and thought, nicely clustered into nearly orthogonal topics. Almost anything you can say can be expressed by taking a few tropes and combining them with weights.


That's what I wonder, though: memetic thinking as a viral phenomenon. It may reshape one's perceptions, forcing a certain conformance, like prions.

If there's any room for alternatives or unconventional thought, it must now be assigned within an established hierarchy. Or, it must be shoehorned into invisible guardrails.

This is not bad in itself. Of course, useful ideas must transmit successfully to realize progression over decades.

But if one has a choice to focus attention, there also ought to exist the option to ignore it.

I don't mean to "ignore all frameworks" or "everything original must come from the self." I just mean I avoid TVTropes--yet I also embrace Dramatica.

No doubt TVTropes is a form of design patterns for writing, for which it has incredible value. It's just not something I want to "shape my mind around."

As I write this, I know it doesn't make much sense. It may just be a silly thing, and I have yet to "mature to realize the concept of" a shared, tropic mindview.


Some kind of single context abstract interpretation maybe.


As described in the post, this seems quite analogous to the operation of a bloom filter, except each "bit" is more than a single bit's worth of information, and the match detection has to do some thresholding/ranking to select a winner.

That said, the post is itself clearly summarizing much more technical work, so my analogy is resting on shaky ground.
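
To spell out the analogy (my own toy code, not anything from the underlying work): a bare-bones Bloom filter shares each bit across many stored items and answers membership by checking a combination of positions.

  import hashlib

  class BloomFilter:
      def __init__(self, n_bits=1024, n_hashes=4):
          self.bits = [0] * n_bits
          self.n_bits, self.n_hashes = n_bits, n_hashes

      def _positions(self, item):
          for i in range(self.n_hashes):
              digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
              yield int(digest, 16) % self.n_bits

      def add(self, item):
          for pos in self._positions(item):
              self.bits[pos] = 1

      def might_contain(self, item):
          # No false negatives; occasional false positives as the filter fills up.
          return all(self.bits[pos] for pos in self._positions(item))

  bf = BloomFilter()
  bf.add("feature-42")
  print(bf.might_contain("feature-42"))     # True
  print(bf.might_contain("feature-999"))    # almost certainly False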


All this anthropomorphizing of activation networks strikes me as very odd. None of these neurons "want" to do anything. They respond to specific input. Maybe humans are the same, but in the case of artificial neural networks we at least know it's a simple mathematical function. Also, an artificial neuron is nothing like a biological neuron. At the most basic -- artificial neurons don't "fire" except in direct response to inputs. Biological neurons fire because of their internal state, state which is modified by biological signaling chemicals. It's like comparing apples to gorillas.


>None of these neurons "want" to do anything. They respond to specific input.

Yes well, your neurons don't "want" to do anything either.

>Maybe humans are the same, but in the case of artificial neural networks we at least know it's a simple mathematical function

So what, magic? A soul? If the brain is computing then the substrate is entirely irrelevant. Silicon, biology, pulleys and gears: all can be arranged to make the same or similar computations. If you genuinely believe the latter, that's fine. The point is that "simple" mathematical function is kind of irrelevant. Either the brain computes and any substrate is fine, or it doesn't.

>Also, an artificial neuron is nothing like a biological neuron.

They're not the same but "nothing like" is pushing it a lot. They're inspired by biological neurons and the only reason modern NNs aren't closer to their biological counterparts is because they genuinely suffer for it, not because we can't.

>Biological neurons fire because of their internal state, state which is modified by biological signaling chemicals

Brains aren't breaking causality. They fire because of input.


Comments like this are incredibly grating. You condescend to the interlocutor for making a mistake which only exists in your own mistaken world model. Your confidence that neurons and ANN weights and «pulleys and gears» are all equivalent because there is, in theory, an intention to instantiate some computation, and to think otherwise is tantamount to belief in magic and broken causality, is just confused and born out of perusing popular-scientific materials instead of relying on scientific literature or hands-on experience.

> The fire because of input.

No they do not fire because of input, they modulate their firing probability based on input, and there are different modalities of input with different effects. Neurons are self-contained biological units (descended, let me remind you, from standalone unicellular organisms, just like the rest of our cells), which actually have an independently developing internal state and even metabolic needs; they are not merely a system of logic gates even if you can approximate their role with a system of equations or an ANN. This is very different, mechanistically and teleologically. Hell, even spiking ANNs would be substantially different from currently dominant models.

> So what, magic ? a soul ? If the brain is computing then the substrate is entirely irrelevant

Stop dumbing down complex arguments to some low-status culture war opinion you find it easy to dunk on.


>Your confidence that neurons and ANN weights and «pulleys and gears» are all equivalent because there is, in theory, an intention to instantiate some computation, and to think otherwise is tantamount to belief in magic and broken causality, is just confused and born out of perusing popular-scientific materials instead of relying on scientific literature or hands-on experience.

Computation is substrate independent. I'm not saying neurons and ANN weights and «pulleys and gears» are the same. I'm saying it does not matter because what you perform computation with does not change the results of the computation. If the brain computes, then it doesn't matter what is doing the computation.

>No they do not fire because of input, they modulate their firing probability based on input, and there are different modalities of input with different effects. Neurons are self-contained biological units (descended, let me remind you, from standalone unicellular organisms, just like the rest of our cells), which actually have an independently developing internal state and even metabolic needs; they are not merely a system of logic gates even if you can approximate their role with a system of equations or an ANN. This is very different, mechanistically and teleologically. Hell, even spiking ANNs would be substantially different from currently dominant models.

Yes, a neuron is firing because of input. To suggest otherwise is to suggest something beyond cause and effect directing the workings of the brain. If that is genuinely not the case then feel free to explain why, rather than making an ad hominem attack on someone you don't even know.

> So what, magic ? a soul ? If the brain is computing then the substrate is entirely irrelevant

>Stop dumbing down complex arguments to some low-status culture war opinion you find it easy to dunk on.

I personally don't care if that's what anyone believes. The intention is not to attack anyone.

If you believe in a soul or the non religious equivalent, that's fine. We just have different axioms.

If you don't believe in a soul (or the equivalent) but somehow think substrate matters, then you need to explain why, because it makes no sense.


I am not well versed in any of this, but from reading the counterarguments, I think two good points are being made:

* Analogies aside, neurons are quite different from NN nodes, because each neuron has an incredibly complex internal cellular state, whereas an NN node just has a single number for state.

* A brain is not a "function" in the way that a trained LLM model is. Human life is not a series of input prompts and output prompts. Rather, we experience a fluid stream of stimuli, which our brain multiplexes and reacts to in a variety of ways (speaking, moving, storing memories, moving our pupils, releasing hormones, etc). That is NOT TO SAY a brain violates causality; it's saying that the brain is mechanically doing so much more than an LLM, even if the LLM is better at raw computation.

None of this IMO precludes AGI from happening in the medium term future, but I do think we should be careful when making comparisons between AGI and the human brains.

Rather than comparing "apples to gorillas", I'd say it's like comparing a calculator to a tree. Yes, the calculator is SIGNIFICANTLY better at multiplication, but that doesn't make it "smarter" than a tree, whatever that means.


I do not even think any of this has much of an impact on AGI timelines. Human brain cells are not a superior substrate for computing "intelligence". They just are what they are; individual cells can somewhat meaningfully "want" stuff and be quasi-agents unto themselves, and they do much more than integrate and fire. Weights in an ANN are purely terms in an equation without any inner process or content.


> Computation is substrate independent. I'm not saying neurons and ANN weights and «pulleys and gears» are the same. I'm saying it does not matter because what you perform computation with does not change the results of the computation. If the brain computes, then it doesn't matter what is doing the computation.

It's a tautology. If the substrate did change the computation, then it wouldn't be the computation.

Claims where it isn't possible for you to be incorrect may be less impressive than they seem.


It's not a tautology. That you can go out tomorrow and buy a deluge of computers with different hardware and run the same software without change is exactly a demonstration of substrate independence.

https://www.edge.org/response-detail/27126


And if one of those computers failed, it wouldn't classify as a (proper) computer.

You can move your pointer anywhere you'd like, it is ultimately tautological. Infinite regress is a bitch lol

Say, have you taken into consideration the role consciousness and culture are playing here? Like this "reality" you are describing, do you know what the actual, biological/scientific source of it is? :) But now I'm kind of cheating, aren't I...I think we're not supposed to say that part out loud! ;)


There are views where there is an implicit substrate that exists in another layer of reality, ie. dualism. That layer is generally not counted among the substrate. So a computation can be substrate dependent by introducing a non-material cause. (Disclaimer: I don't personally know anyone who believes anything like this, so this may be a bad paraphrasing.)


If the computation doesn't compute completely and deterministically on all substrates, then it isn't substrate independent though is it?

Human cognition would be a good example of substrate dependent computation I'd think....it even varies per instance of substrate.


> Stop dumbing down complex arguments

It's not dumbing down. It's extracting the crux of the matter that the complexity of arguments is trying to hide, perhaps unintentionally. Either the brain implements a function that can be approximated by a neural network thanks to the universal approximation theorem, or the function cannot be approximated (you need arguments for why that is the case), or magic.


This is technically true but kind of misses the point in my opinion. A neural network can approximate any function in theory but that doesn't mean it has to do so in a reasonable amount of time and with a reasonable amount of resources. For example, take the function that gives you the prime factors of an integer. It is theoretically possible for a neural network to approximate this for an arbitrarily large fixed window but is provably infeasible to compute on current hardware. In theory, a quantum computer could compute this much faster.

This is not to say that the human brain leverages quantum effects. It's just a well known example where the hardware and a specific algorithm can be shown to matter.

I also think it's strange to describe the brain as implementing a function. Functions don't exist. We made them up to help us think about building useful circuits (among other things). In this scenario, we would be implementing functions to help us simulate what is going on in brains.


> is just confused and born out of perusing popular-scientific materials instead of relying on scientific literature or hands-on experience.

I suspect there's some fundamental metaphysical framework protection in play here, the sort of language being used is pretty common, I believe it to be a learned cultural behavior (from consuming similar arguments).


I think you're being a bit pedantic here.

Neurons don't "respond" to specific input either. They can't speak or provide an answer to your input.

These are all just abstract metaphors and analogies. Literally everything in computer science at some point or another is an abstract metaphor or analogy.

When you look up the definition and etymology of "input", it says to "put on" or "impose" or "feed data into the machine". We're not literally feeding the machine data, it doesn't eat the data and subsist on it.

You could go on and on and nitpick every single one of these, and I don't think the use of "want" (i.e. anthropomorphizing the networks to have intent) is all that bad.


Wait what are you referring to specifically? Any anthropomorphism in the article is _clearly_, clearly the author's admitted simplification due to the incredible density of the subject matter.

Given that, I honestly can't find anything too upsetting.

In any case, anthropomorphism is something I don't mind, mostly. Is it misleading? For the layman. But the domain is one of modeling intelligence itself and there are many instances where an existing definition simply makes sense. This happens in lots of fields and causes similar amounts of frustration in those fields. So it goes.


> Humans also use neural nets to reason about concepts. We have a lot of neurons, but so does GPT-4.

I feel this is an abuse of the language. Biological neurons and ANN neurons aren’t the same or even all that similar. Brains don’t do backprop for example. Only forward passes. There’s a zoo of neurotransmitters which change the behavior of individual neurons or regions in the brain. Unused neurons in the brain can be repurposed for other things (for example if your arm is amputated).


>Biological neurons and ANN neurons aren’t the same or even all that similar.

They're not the same but they're definitely similar.

>Brains don’t do backprop for example.

We've developed numerous different learning algorithms that are biologically plausible, but they all kinda work like backpropagation but worse, so we stuck with backpropagation. We've made more complicated neurons that better resemble biological neurons, but it is faster and works better if you just add extra simple neurons, so we do that instead. Spiking neural networks have connection patterns more similar to what you see in the brain, but they learn slower and are tougher to work with than regular layered neural networks, so we use layered neural networks instead.

The only reason modern NNs aren't closer to their biological counterparts is because they genuinely suffer for it.

The secret of bird flight was wings. Not feathers. Not flapping.


> The only reason modern NNs aren't closer to their biological counterparts is because they genuinely suffer for it.

And yet, there's no ANN that's as good at interacting with the real world as the simplest worms we've studied, despite having many times more neurons than those worms have cells.

We are clearly still missing some key pieces of the puzzle for intelligence, so claiming that the difference between ANNs and biological neurons is irrelevant is quite premature. We are far away from having an airfoil moment in AI research.


Simple organisms are not necessarily easier to imitate than more complex ones. Our vehicles can drive faster than the fastest animals, but no car can absorb nutrients across a cell membrane. The simpler a system is, the more an imitation has to stick to the specifics to the system's particular implementation. Larger, more complex systems can usually perform more abstract tasks, which allows more diverse approaches.


I assert that most robotics AI, including self driving cars, are at least as good at interacting with the real world as the simplest worms we've studied. Do different things, but at least as good.


My point isn’t about whether ANNs work, it’s that they’re so fundamentally different (both locally at the level of the neuron and globally at the level of the brain) that calling them both “neurons” is pretty imprecise.

Silicon simulations of brains may “suffer” from being faithful but this also discounts the advantages that brains have. As I mentioned for example, brains can repurpose neurons for other tasks. Brains can also generalize from a single example, unlike neural networks which require thousands if not millions of examples.

Brains also generally do not suffer from catastrophic forgetting in the same way that our simulations tend to. If I ask you to study a textbook on cats you won’t suddenly forget the difference between cats and dogs.


>Brains can also generalize from a single example, unlike neural networks which require thousands if not millions of examples.

There is not a single brain on earth that is the blank slate a typical ANN is. "Brains generalize from one example" is pretty dubious. Millions of years of evolution matter.

>As I mentioned for example, brains can repurpose neurons for other tasks.

Isn't this just a matter of the practical distinction between training and inference and not some fundamental structural limitation ?

>Brains also generally do not suffer from catastrophic forgetting in the same way that our simulations tend to.

This suggests CF may well be a simple matter of scale - https://palm-e.github.io/

Since individual artificial neurons are much closer to synapses, we don't have anything near the scale of the brain yet.


> There is not a single brain on earth that is the blank slate a typical ANN is.

Of course structure matters, but biological neurons have far more degrees of freedom than those in ANNs. The fact that we even need to keep differentiating between the two is an indication that classifying both as “neurons” is not accurate.

> Isn't this just a matter of the practical distinction between training and inference and not some fundamental structural limitation?

It’s a difference in capabilities of the things themselves. A biological neuron organically seeks out new connections. Sure we could program that into an ANN somehow but the fact that nodes in an ANN don’t have this capability out of the box is a fundamental difference.

> CF may well be a simple matter of scale

For a moment, a big enough network might be able to mirror an entire brain with the lottery ticket hypothesis. But if it takes two or ten or a thousand ANN neurons to simulate the degrees of freedom of a biological neuron, are they really the same?


>There is not a single brain on earth that is the blank slate a typical ANN is.

You are the one saying that biological neurons and ANN are similar...


> Brains can also generalize from a single example, unlike neural networks which require thousands if not millions of examples.

Since the other comment already went for the evolution and structure angle, I'll go for the other part. What single example? What test have you seen done on the brain capacity of a few-weeks-old fetus? Our brains start learning patterns in the world before we are even born. How much input does a baby receive every single second from its eyes and ears and every other sense?

Even when you are "analyzing" a new object for the first time, you receive a continuous stream of sensory input from it. Our brain even requires that to work: if you slip a single different frame into a fast enough display, most times you won't even notice the extra frame and your brain will just ignore it.


I feel anthropomorphizing is perfectly reasonable especially in this context. How would you like it described?


It's one thing to deliberately anthropomorphize in order to construct an analogy.

It's another thing to anthropomorphize by accident or by illusion, as per pareidolia: https://en.wikipedia.org/wiki/Pareidolia . Just as in pareidolia, where the human brain is primed to "see" a human face in a certain pattern of light and shapes, it seems that human brains are primed to "see" a human intelligence in the output of an LLM, because our brains are pattern-matching on "things that look like human speech". But that's a reason to not anthropomorphize LLMs, precisely because people are inclined to do so without thinking.


Anthropomorphizing is valid in relation to language, human language. Other than that, anthropomorphizing would be somewhat valid with other human outputs, but ideally it is understood that it's just one of the lenses in the toolkit, that it has pros and cons just like any idea or any tool.


There’s so much baggage attached with it. The only thing a neural network “wants” is to minimize its loss function. That’s all.


The neural network doesn't want to minimize its loss function any more than you want to maximize your inclusive genetic fitness.

The neural network training process wants to minimize the neural network's loss function. The neural network, if it "wants" anything, will have such wants as were embedded in its weights through the process of minimizing its loss function, which will mostly be "wants" whose satisfaction correlates with a reduced loss function. Of course this is using the term "want" in a behaviourally-descriptive sense, not a subjective-experience sense.

For example, AlphaGo's training routine wants to minimize AlphaGo's loss function. AlphaGo wants to beat you at Go.


See also: Reward is not the optimization target https://www.greaterwrong.com/posts/pdaGN6pQyQarFHXF4/reward-...


You must not believe in human free will then. Or maybe you don't believe the brain is the key aspect of human intelligence. Or you believe the brain is more than just an organic NN.


I’m saying that neural networks are so different from brains that anthropomorphizing them runs a real risk of glossing over important differences.

Terms like “free will” and “intelligence” are too fuzzy to talk about precisely unless we’re on the exact same page regarding what we mean. And applying our imprecise definitions to machines is not doing us any favors.


All humans “want” to do is obey the laws of physics.


Can we anthropomorphize the loss function then?


That's like anthropomorphizing utility functions in place of the humans that (are conjectured to) have them, isn't it?


I didn’t hear much anthropomorphism outside of the subtitle.

Anyways, this is Scott’s writing style. I recall an earlier ACX post on alignment that was real heavy on ascribing desires and goals to AI models.


Do you mean that artificial neurons are inherently passive while biological neurons are inherently active i.e. they would act in spite of external input?

Just wondering if I understood you, I don't know anything on the subject.


That was already addressed in the comment you're responding to. They wrote that in the case of artificial neural networks "at least we know [there is no god inside the machine]". With regards to humans, we don't know.


There are a great many possibilities between:

- a fully deterministic machine (even if the interface and the way OpenAI lets people access ChatGPT make it seem non-deterministic, there are fully deterministic models out there, which not only respond to inputs but also always give the exact same response to the same inputs [query, seed, ...]),

and:

- god exists

There could be chaos at work when a human thinks. There may be interference at play, say because some particle that traveled trillions of kilometers just traversed our brain.

Whereas a fully deterministic machine that always responds to the same input in the same way is just that: a deterministic machine.

P.S.: I don't know about other LLMs like Falcon 180b, but image-generation models like StableDiffusion are fully deterministic. I think a model has a broken design and will quickly hit limitations if it cannot be queried in a deterministic way (and its use cases are certainly limited if repeatability is not achievable). If you want different answers, use a different seed or a different query. But the same query+seed should always give the exact same output.


Also, not all notions of God are dualistic — where you get to (notionally) talk to the guy using "you" and "I". India's Advaita Vedanta, Ibn Arabi's version of Sufism, and even some sects of Hasidism all hold that everything is in God. I haven't found the Christianity that does this, but if it was discovered under Islam it is probably thinkable here.

Are these different ideas of God entirely? Yes: in India there are however many gods and Brahman says these are lesser precisely because they don’t include the whole universe.


For the record, LLMs are theoretically fully deterministic, but non-deterministic in practice. First, some randomness is deliberately added via 'temperature', and second, some randomness comes from things like the order of floating-point operations when you split the model across your GPU(s) without being super careful about it.


LLMs output a probability distribution over the next token, and that automatically makes them non-deterministic. You can make their output deterministic by greedy sampling, by fixing the seed of a pseudorandom generator, or by computing the exponentially growing probability distribution of all possible continuations, but it doesn't change the fact that LLMs produce probability distributions that you need to sample somehow to get a definite result.
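
Concretely, the determinism question comes down to this final sampling step (toy logits, made up for illustration):

  import numpy as np

  logits = np.array([2.0, 1.0, 0.1])              # model scores for 3 tokens
  probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> distribution

  greedy = int(np.argmax(probs))                  # deterministic: always token 0

  rng = np.random.default_rng(seed=42)            # fixed seed -> reproducible
  sampled = int(rng.choice(len(probs), p=probs))  # stochastic without the seed

  T = 0.7                                         # temperature reshapes the distribution
  probs_t = np.exp(logits / T) / np.exp(logits / T).sum()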


That wasn't clear to me because having internal state doesn't mean that any process is running. That's the difference between memory and CPU.


I'd say 'in spite'-ness is pretty spot on for life in general


Can you be more specific about what particular anthropomorphizing you object to? The only place the author uses the word want is in describing the wants of humans.


Wait till you find out how we talk about evolutionary adaptation... a couple analogies and elided concepts here and there and you'd swear Lamarck had smothered Darwin in his cradle


Yes, Lamarck is alive and well. I frequently hear colleagues in biology with years of experience making lazy statements such as, "Humans evolved to walk because..." as if evolution is a conscious act. In many cases, these people are using the term evolution as shorthand for, "Random mutations gave rise to new gene variants (alleles), and over numerous generations, natural selection leads to an increase in the prevalence of beneficial variants, specifically those that provide a reproductive advantage. The genetic variants that combined to enable bipedal locomotion provided such an advantage to the human population". Evolution is a passive, natural process driven by genetic variation, environmental changes, and the differential reproductive success of individuals with advantageous traits. This in turn results in population changes. Evolution doesn't involve a conscious choice or direction. Evolution is the result of the cumulative effects of these factors over long periods of time.


While I generally agree with you, is it really "lazy" to say "humans evolved to walk because..."? Usually, in context, it's not being used to claim there is some sort of intent or purpose, but rather as a shortcut for the (rather verbose) description you gave.

Also, I can think of some counterpoints to yours: the people who bred teosinte into corn (or any wild grain into a domesticated one) appear to have been making conscious choices or giving direction - that is, they used their intelligence and reasoning from observed examples of pairings to conclude that they could make improved specimens based on selective breeding (without knowing about random mutations or natural selection!).

And if we start to modify the human germline, that would also be an example of evolution with conscious choice or direction (assuming the modifications became fixed in the population).


Even in math, the most exact of all fields of study, people in practice are being nonrigorous and ambiguous and take shortcuts all the time. That’s just how communication works, because it would be incredibly grating and inefficient to try to be 100% unambiguous when everybody already knows what you mean.


Social, memetic evolution is at least partially Lamarckian.


All this anthropomorphizing of humans strikes me as very odd. None of these humans "want" to do anything. They respond to specific input. Maybe artificial neural networks are the same, but in the case of humans we at least know it's a simple reaction to neurotransmitter signals.


You're probably just being tongue in cheek here, but I still wanted to add that we humans do actually "want". Desires, intentions, these are real things. You can try to represent them as loss functions or rewards in some mathematical model but that isn't the original thing. Consider this excerpt from https://scottaaronson.blog/?p=7094#comment-1947377

> Most animals are goal-directed, intentional, sensory-motor agents who grow interior representations of their environments during their lifetime which enables them to successfully navigate their environments. They are responsive to reasons their environments affords for action, because they can reason from their desires and beliefs towards actions.

> In addition, animals like people, have complex representational abilities where we can reify the sensory-motor “concepts” which we develop as “abstract concepts” and give them symbolic representations which can then be communicated. We communicate because we have the capacity to form such representations, translate them symbolically, and use those symbols “on the right occasions” when we have the relevant mental states.

> (Discrete mathematicians seem to have imparted a magical property to these symbols that *in them* is everything… no, when I use words its to represent my interior states… the words are symptoms, their patterns are coincidental and useful, but not where anything important lies).

> In other words, we say “I like ice-cream” because: we are able to like things (desire, preference), we have tasted ice-cream, we have reflected on our preferences (via a capacity for self-modelling and self-directed emotional awareness), and so on. And when we say, “I like ice-cream” it’s *because* all of those things come together in radically complex ways to actually put us in a position to speak truthfully about ourselves. We really do like ice-cream.


> but I still wanted to add that we humans do actually "want".

I would also like to add that the subject of conversation is artificial and natural neurons, which humans, though they contain some, are not.

If a NN is trained to do something, it can be equally considered as "wanting" to do that thing within the autonomy it is afforded, as much as any human.


> If a NN is trained to do something, it can be equally considered as "wanting" to do that thing within the autonomy it is afforded, as much as any human.

Human wants are driven by instinct though - i.e. our preferences; if you like women, you like women, if you don’t, you don’t.

Our “output” is in service to those wants.

Current AI doesn’t have instinct / preprogrammed goals - except for goal-driven AIs, but the hyped-up LLMs aren’t such AIs. Their output isn’t motivated by any goal - a LLM can’t deliberately lie to you to get you to do something; it lies because it doesn’t differentiate between what’s true and what’s false.


> Human wants are driven by instinct though

What is instinct physically?

> a LLM can’t deliberately lie to you to get you to do something; it lies because it doesn’t differentiate between what’s true and what’s false.

A LLM also can't do multi-step reasoning, yet here we are.


> What is instinct physically?

Does it matter? As long as it conceptually exists, that’s all that matters.

LLMs don’t seek any goal. It’s advanced autocomplete.

You can’t give it a bunch of facts and a goal then expect it to figure out how to achieve said goal. It can give you an answer if it already has it (or something similar) in its training set - in the latter case, it’s like a student who didn’t study for the exam and tries to guess the right answer with “heuristics”.

> A LLM also can't do multi-step reasoning, yet here we are.

Where’s “here”?


> Does it matter? As long as it conceptually exists, that’s all that matters.

It does matter. Where is it? By evading this you have basically described "instinct" as something computers simply cannot have, axiomatically. That's boring.

> LLMs don’t seek any goal. It’s advanced autocomplete.

These two sentences are contradictory.

> You can’t give it a bunch of facts and a goal then expect it to figure out how to achieve said goal.

Have you used a language model lately? It sounds like you're saying things you think a LLM shouldn't be able to do as if they can't do them. Giving it a bunch of facts and a goal and expecting it to figure out how to achieve the goal is something you can do. It's not perfect, but they can be surprisingly good.

> Where’s “here”?

At a point in time where LLMs can do multi-step reasoning.


Just because a model converges towards some behavior doesn’t mean it “wants” like a human does. You can call it that in your framework but it is overly reductive to apply it to humans and animals imo


If my grandmother had wheels, she'd be a bike.

Fish don't like ice cream and we don't feel the need to spawn. It's because of how we are built.


I’m sorry what? i don’t understand how your first sentence applies to what I’m saying, and many humans feel a need to spawn.


I'm using the word literally, as in swim upstream and lay eggs.


This is the definition of anthropomorphism: “Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities.” By definition the point you’re making doesn’t make sense.


> None of these humans "want" to do anything. They respond to specific input

Weird, because I'm pretty sure I had the choice whether to respond to this comment.

It wasn't the light waves hitting my retina from an HN post, leading to nerves firing and neurotransmitters all coming together to post this.

I posted it because I have free will. I almost didn't.

Unless you truly feel that there is no free will, and that reality is just a bizarre movie we have to experience .... well, then ... we'll disagree.


At some point after the light waves hit your retina, did one or more of the atoms in your body not obey the exact same physical laws as the atoms outside your body?


This is a great post, and there are great comments here.

I wanted to mention that there has been quite a lot of work in machine learning and statistics over the last few years as to how to analyze large models with non-identifiable parameters, of which LLMs are one such instance. Given that multiple values of the parameters can lead to demonstrably similar outputs (the non-identifiability problem), an important high-level takeaway is that it is typically more productive to shift the unit of analysis from the parameters to examples/exemplars of the input/output.
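
As a toy illustration of that non-identifiability point (a sketch of my own, not the approach described below): here are two different parameter settings of the same one-hidden-layer network that compute exactly the same function, obtained simply by permuting the hidden units. Since inspecting individual parameters can't distinguish them, analyzing behavior on input/output examples is often more informative.

    import numpy as np

    rng = np.random.default_rng(0)

    # A one-hidden-layer network: y = W2 @ relu(W1 @ x), with made-up sizes.
    W1 = rng.normal(size=(8, 4))
    W2 = rng.normal(size=(3, 8))

    def net(W1, W2, x):
        return W2 @ np.maximum(W1 @ x, 0.0)

    # Permute the hidden units: different parameters, identical function.
    perm = rng.permutation(8)
    W1_p = W1[perm, :]
    W2_p = W2[:, perm]

    x = rng.normal(size=4)
    assert np.allclose(net(W1, W2, x), net(W1_p, W2_p, x))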

In my biased opinion, our startup Reexpress AI has already taken this older line of work to its logical conclusion: Non-parametric constraints for large language models that make a direct connection between the observed data and new predictions.

Our website has a link to our macOS software: https://re.express/

(Aside: Separately, we also do some interesting things to make it fast enough for on-device learning. While the full unlabeled training datasets for an LLM are typically too large to process on-device, the standard setup here for a well-specified task is that there is a rather smaller amount of high-quality labeled data from which we derive the uncertainty estimates, and which can effectively be modeled on-device. Thus, perhaps counter-intuitively, deriving such estimates for the largest models available today is often no problem in practice, because we can just estimate against the available labeled data for a given task. Our software makes it easy to do this by ensembling additional models with our on-device models.)


> Shouldn’t the AI be keeping the concept of God, Almighty Creator and Lord of the Universe, separate from God-

This seems wrong. God-zilla is using the concept of God as a superlative modifier. I would expect a neuron involved in the concept of godhood to activate whenever any metaphorical "god-of-X" concept is being used.


I mean, it's not actually. It's just a somewhat unusual transcription (well, originally somewhat unusual, now obviously it's the official English name) of what might be more usually transcribed as "Gojira".


Ah, I thought the Japanese word was just "jira". My mistake.


That's an entirely different monster.


Indeed, but not an entirely unrelated one though - per https://en.wikipedia.org/wiki/Jira_(software)#Naming the inspiration path was Bugzilla -> Godzilla -> Gojira -> Jira (which is why Confluence keeps correcting me when I try to spell it JIRA)


I see what you did there.


I feel like we're a few more paradigm shifts away from self-driving cars, and this is one of them - being able to actually understand neural nets and modify them in a constructive way more directly - aka engineering.

Some more:

    cheaper sensors (happening now)
    better sensor integration (happening now, kind of)
    better tools for ml grokking and intermediate engineering (this article, kind of)
    better tools for layering ml (probably the same thing as above)
    a new model for insurance/responsibility/something like this (unsure)
    better communication with people inside and outside the car (barely on the radar)


This reminds of research done on category-specific semantic deficits, where there can be neurodegeneration that impacts highly specific or related knowledge (for example, brain trauma that affects a person's ability to understand living things like zebras and carrots, but not non-living things like helicopters or pliers).

https://academic.oup.com/brain/article/130/4/1127/278057


such a wonderful article, really enjoyed reading it


When LLMs are trained on text, are the words annotated to indicate the semantic meaning, or is the LLM training process expected to disambiguate the possibly hundreds of semantic meanings of an individual common word such as "run"?


The task LLMs are trained on is "predict the next word", which is elegantly included for free in your training set of text. Typically no annotation is provided, since that would involve a ton of human labor doing the annotations.
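
A minimal sketch of why no annotation is needed (the token ids here are made up): the (context, target) training pairs fall straight out of the text itself, just shifted by one position.

    # Hypothetical token ids for a six-word sentence.
    tokens = [101, 7592, 2066, 2000, 2448, 3435]

    # Each training example is (context so far -> next token); the labels are
    # simply the text shifted by one, so no human annotation is required.
    examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

    for context, target in examples:
        print(context, "->", target)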


Could you ask a specialized AI to define each of the words in a block of text to automate the meaning extraction?


Probably. There's good work showing impressively performing small models that were trained on more "text book" like data rather than just loads of text - but where the "text books" were either wholly or largely created by another AI model.

Using models to generate/score/rank/modify data to be more useful as training data is a very interesting angle.


see https://www.nature.com/articles/nature12160 "mixed selectivity neurons"


Superposition makes sense when you understand all of ML as a convolution.


You can just note the fact that two bits can represent four values.
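
In the same spirit (though not exactly the two-bits framing), here's a quick numerical sketch of the superposition idea from the article: a d-dimensional space can hold far more than d directions that are nearly orthogonal to each other, so a layer can represent more "features" than it has neurons, at the cost of a little interference.

    import numpy as np

    rng = np.random.default_rng(0)

    # Pack 2000 random "feature" directions into a 512-dimensional space.
    d, n_features = 512, 2000
    F = rng.normal(size=(n_features, d))
    F /= np.linalg.norm(F, axis=1, keepdims=True)

    # Pairwise interference (cosine similarity) between distinct features.
    G = F @ F.T
    np.fill_diagonal(G, 0.0)
    print("max interference:", np.abs(G).max())  # around 0.2: some overlap, but
                                                 # 2000 features still fit usably
                                                 # into 512 dimensions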


Can you explain machine learning through this lens?


I’ve never understood the hidden layers argument. Ultimately these models are executing code. You can examine the code. Why can’t that be done?


Fundamentally, artificial neural networks boil down to some code that can be written, mathematically, as

σ(C*σ(B*σ(A*x)))... = y

repeated matrix-vector multiplication, separated by nonlinear functions, σ(). A, B, C etc. here are your weights, x is your input vector, y is your output vector. Any nonlinear function will work, and some behave better than others, but they must be present; otherwise you could simplify the problem by applying the associative property to the weights side of things, and you'd effectively have a single layer that can only simulate linear functions. A hidden layer is just any vector that is intermediate to that calculation; for example, the result of σ(A*x) would be the first hidden layer.
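
To make the shapes concrete, here's a tiny numpy sketch of that expression with made-up sizes (A, B, C and x are just random placeholders, and tanh stands in for σ):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = np.tanh                      # any nonlinearity will do; tanh as an example

    # Made-up weight matrices A, B, C and an input vector x.
    A = rng.normal(size=(16, 8))
    B = rng.normal(size=(16, 16))
    C = rng.normal(size=(4, 16))
    x = rng.normal(size=8)

    h1 = sigma(A @ x)                    # first hidden layer
    h2 = sigma(B @ h1)                   # second hidden layer
    y  = sigma(C @ h2)                   # output vector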

So right off the bat, you obviously have a problem: "Examining the code" means examining the weights, which are just gigabyte/terabyte sized matrices. Opaque is putting it mildly. The only sane approach is to start from one of our known-meaningful vectors (the input or output), and work our way inwards from there, either seeing what elements of hidden layer vectors have the most significant value when a certain input is applied, or determining what hidden layer vector values produce values closest resembling a desired output.

Then you start running into the problems described in the article.


This is a really good explanation for a fully connected network. Most models will have more complex architectures so are even more complicated than this.


I haven't been great about keeping up with advances in the field, but to my understanding most if not all architectures in effect merely enforce symmetries upon the network. That is, they can be represented by a fully connected network but in that representation not all weights are free, in that some are fixed (0 or 1) or some are dependent (A(1,1) will also be equal to C(1,1)).


I don’t blame you, things are pretty diversified so I’m white knuckling it through my own subfield.

But as a simple example, convolution would be a little tedious to describe in the general notation you wrote above without getting into the image dimensions, stride, padding etc., not to mention residual layers or norm layers that are commonly used. Then there are things like stop gradient, dropout or even other training targets which are only used during training but not inference.


Convolution actually can be pretty elegantly translated into multiplication by a matrix with symmetries. It's been a few years, but that was an example we had to work out by hand in my Deep Learning course at university.

It's pretty much the quintessential example of an enforced symmetry, in that it introduces a symmetry against translation.
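
A small illustration of that claim (my own sketch, not the course exercise): a 1D "valid" convolution with kernel k is just multiplication by a banded matrix whose rows are shifted copies of k, which is exactly the weight-tying symmetry described above.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    k = np.array([0.5, 1.0, -0.5])            # a made-up 1D kernel

    # Build the convolution as a matrix with tied (shifted) weights.
    out_len = len(x) - len(k) + 1
    M = np.zeros((out_len, len(x)))
    for i in range(out_len):
        M[i, i:i + len(k)] = k[::-1]          # reversed kernel = true convolution
                                              # (deep-learning "conv" layers skip the flip)

    assert np.allclose(M @ x, np.convolve(x, k, mode="valid"))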


Think of it this way: looking at the neurons is like looking at the assembly and trying to understand how a video is played on YouTube. Yes, in theory it’s possible, but realistically no one would be able to do that. At least not in a reasonable amount of time.


Video decoding works by reading and writing pixels to the same buffers you use for displaying them, so you could see a lot of how it works by watching the accesses there.


Yeah, but imagine you don’t know what “decoding”, “buffers”, and “display” mean in a software context.


If by “code” you mean “the source code people wrote”, then that source code by itself does not determine the behavior. The behavior is determined by the numbers.

Saying that understanding that code implies understanding how the network works is like saying “John wrote this emulator for GBA games, therefore he must understand how the game [some GBA game] works”.

Except, instead of the game being written in assembly, it was written in malbolge.


With these models, the PyTorch code is an interpreter, and the 100GB of weights is the code. So you have, say, a 10-billion-LOC codebase (at 10 weights = 1 LOC), all in assembly, no comments. G'luck.

Oh and it doesn't always use logic. So there are no ORs and ANDs and IFs, just +, *, exp, max, etc.


“This is great for AIs but bad for interpreters. We hoped we could figure out what our AIs were doing just by looking at them. But it turns out they’re simulating much bigger and more complicated AIs, and if we want to know what’s going on, we have to look at those. But those AIs only exist in simulated abstract hyperdimensional spaces. Sounds hard to dissect!”

“Still, last month Anthropic’s interpretability team announced that they successfully dissected one of the simulated AIs in its abstract hyperdimensional space.

(finally, we’re back to the monosemanticity paper!)

First the researchers trained a very simple 512-neuron AI to predict text, like a tiny version of GPT or Anthropic’s competing model Claude.

Then, they trained a second AI called an autoencoder to predict the activations of the first AI. They told it to posit a certain number of features (the experiments varied between ~2,000 and ~100,000), corresponding to the neurons of the higher-dimensional AI it was simulating. Then they made it predict how those features mapped onto the real neurons of the real AI.

They found that even though the original AI’s neurons weren’t comprehensible, the new AI’s simulated neurons (aka “features”) were! They were monosemantic, ie they meant one specific thing.

Here’s feature #2663 (remember, the original AI only had 512 neurons, but they’re treating it as simulating a larger AI with up to ~100,000 neuron-features).”
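
For the curious, here's a minimal PyTorch-style sketch of the general recipe described above. It's not Anthropic's actual code, and the sizes and hyperparameters are made up: the autoencoder gets many more "feature" dimensions than the model has neurons, plus a sparsity penalty, so that each recorded activation vector is explained by only a few features.

    import torch
    import torch.nn as nn

    d_neurons, d_features = 512, 4096       # made-up sizes in the spirit of the paper

    class SparseAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Linear(d_neurons, d_features)
            self.decoder = nn.Linear(d_features, d_neurons)

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))   # feature activations, mostly zero
            return self.decoder(features), features

    sae = SparseAutoencoder()
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

    # `activations` would be recorded from the small model's neurons on real text;
    # random data here just to keep the sketch self-contained.
    activations = torch.randn(1024, d_neurons)

    for step in range(100):
        recon, features = sae(activations)
        loss = ((recon - activations) ** 2).mean() + 1e-3 * features.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # After training, each column of sae.decoder.weight is a candidate
    # "monosemantic feature" direction in the original neuron space.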


COOL~ interpreting a big AI with a bigger one is like interpreting 42 with Earth~


The original is quite long, but quite interesting.[1] Reading it makes me feel like I did reading A Brief History of Time as a middle schooler - concepts that are mainly just out of reach, with a few flashes that I actually understand.

One particularly interesting topic is the "theories of superposition" section, which gets into how LLMs categorize concepts. Are concepts all distinct or indistinct? Are they independent or do they cluster? It seems that the answer is all of the above.

This ties into linguistic theories of categorization[2] that I saw referenced in (of all places) a book about the partition of Judaeo-Christianity in the first centuries CE.

Some categories have hard lines - something is a "bird" or it is not. Some categories have soft lines - like someone being "tall." Some categories work on prototypes, making them have different intensities within the space - A sparrow, swallow, or robin is more "birdy" than a chicken, emu, or turkey. Apparently Wittgenstein was the first to really explore with Family Resemblances that a category might not have hard boundaries, according to people who study these things.[3] These sorts of "manifolds" seem to appear, where some concepts are not just distinct points that are or aren't.

It's exciting to see that LLMs may give us insights into how our brains store concepts. I've heard people criticize them as "just predicting the next most likely token," but I've found myself lost when speaking in the middle of a garden path sentence many times. I don't know how a sentence will end before I start saying it, and it's certainly plausible that LLMs actually do match the way we speak.

Probably the most exciting piece is seeing how close they seem to get to mimicking how we communicate and think, while being fully limited to language with no other modeling behind it - no concept of the physical world, no understanding of counting or math, just words. It's clear when you scratch the surface that LLM outputs are bullshit with no thought underneath them, but it's amazing how much is covered by linking concepts with no logic other than how you've heard them linked before.

[1] https://transformer-circuits.pub/2023/monosemantic-features/...

[2] https://www.sciencedirect.com/science/article/abs/pii/001002...

[3] https://en.wikipedia.org/wiki/Family_resemblance


This is egregiously false. ML models are math and code. They are not tissue; there are no neurons in an ML model. None. The analogy is false. And it is used here to make you think that ML models are more human than they are, or more superhuman. There is no "bigger AI waiting to get out". Shoggoth anyone? These are scare tactics from hyperventilating EAs (both Scott and the Anthropic team).



The repetition of a false analogy does not make it true. They are no more neurons than transistors are "little brain cells". This is just shorthand for outsiders who can't hope to understand. But on HN, we have technologists. The nodes of an ML model are made of math and code. There is no neuron there, and those nodes work very differently from neurons in their action and structure. The analogy breaks down.


[disclaimer: after talking to many people much smarter than me, I might, just barely, sort of understand this. Any mistakes below are my own]

Thanks for the heads up! AI is better at making up stories than humans already. Hodl my beer while I go buy some BSitCoins.


Every training set will produce a different set of weights; even the same training set will produce different weights with different initialization, let alone slightly different architectures.

So what exactly is the point, except "look at us, we are so clever"?


As long as you read it with the skeptic's "yes, but it's not intelligence", it's a good read.

It's when you read it with the "at last, I can understand how reasoning and inference with meaning is going to emerge from this" that you have a problem.

It's a great read, but what you bring to it informs what you take from it.



