
It would be much more interesting to see PCA (or t-SNE or whatever) on the internal representation within the model itself. As in the activations of a certain number of layers or neurons, as they change from token to token.

I don't think the OpenAI embeddings are necessarily an appropriate "map" of the model's internal thoughts. I suppose that raises another question: Do LLMs "think" in language? Or do they think in a more abstract space, then translate it to language later? My money is on the latter.
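
With an open-weights model this is easy enough to prototype. A rough sketch of what I mean, assuming a HuggingFace-style model (GPT-2 here purely as a stand-in, and the layer choice is arbitrary):

```python
# Rough sketch: project per-layer hidden states for each token down to 2D with PCA.
# Assumes the HuggingFace transformers library; GPT-2 is just a stand-in model.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

inputs = tok("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states is a tuple of (n_layers + 1) tensors, each [1, seq_len, d_model]
layer = 6                                   # arbitrary middle layer
acts = out.hidden_states[layer][0].numpy()  # [seq_len, d_model]

coords = PCA(n_components=2).fit_transform(acts)
for token, (x, y) in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist()), coords):
    print(f"{token:>12s}  ({x:+.2f}, {y:+.2f})")
```

Swap in t-SNE/UMAP and sweep `layer` to watch how the token representations drift layer by layer.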




> I suppose that raises another question: Do LLMs "think" in language? Or do they think in a more abstract space, then translate it to language later? My money is on the latter.

The processing happens in latent space and then is converted to tokens/token space. There is research into reasoning models which can spend extra compute in latent space instead of in token space: https://arxiv.org/abs/2412.06769
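
The linked paper ("Coconut") basically feeds the last hidden state back in as the next input embedding instead of decoding a token. A very rough sketch of that idea (the names and loop structure are mine, not the paper's code):

```python
# Illustrative only: "reason" in latent space by appending the final hidden state
# as the next input embedding instead of sampling a token. Names are made up;
# `model` is assumed to be a HuggingFace-style causal transformer body.
import torch

def latent_steps(model, input_embeds, n_latent_steps=4):
    for _ in range(n_latent_steps):
        out = model(inputs_embeds=input_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]       # [batch, 1, d_model]
        # Continuous "thought": no argmax/sampling, no projection back to the vocab.
        input_embeds = torch.cat([input_embeds, last_hidden], dim=1)
    return input_embeds  # decode to tokens from here as usual
```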


Another take on a similar idea from FAIR is a Large Concept Model: https://arxiv.org/pdf/2412.08821


Sort of, yes. I think one of our next unlocks will be some kind of system which predicts at multiple levels in latent space. Something like predicting the paragraph, then the sentences in the paragraph, then the words in the sentences, where the higher level "decisions" are a constraint that guides the lower level generation.

In Meta's byte-level model they made the tokens variable-length based on how predictable the bytes were to a smaller model, allocating compute resources based on entropy.
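
If I remember right, the patch boundaries fall where the small model's next-byte entropy spikes. Something roughly like this (hand-wavy; `small_model_probs` and the threshold are made up for illustration):

```python
# Hand-wavy sketch: patch boundaries fall where a small byte-level model is
# "surprised". `small_model_probs` is a hypothetical helper returning next-byte
# distributions of shape [seq_len, 256]; the threshold is arbitrary.
import numpy as np

def entropy_patches(byte_seq, small_model_probs, threshold=2.0):
    probs = small_model_probs(byte_seq)                      # [seq_len, 256]
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=-1)   # nats per position
    patches, start = [], 0
    for i, h in enumerate(entropy):
        if h > threshold and i > start:   # high entropy => cut a new patch here
            patches.append(byte_seq[start:i])
            start = i
    patches.append(byte_seq[start:])
    return patches  # predictable runs collapse into one big, cheap patch
```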


I'd have to guess that the "transformations" being made to the embeddings at each layer are basically/mostly just adding (tagging with) incremental levels of additional grammatical/semantic information that has been gleaned by the hierarchical pattern matching that is taking place.
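
For what it's worth, the residual stream in a transformer is literally additive, which is at least consistent with that picture. Schematically, one pre-norm block (not any particular model) looks like:

```python
# Schematic pre-norm transformer block: each sublayer only *adds* a delta onto
# the running per-token representation (the residual stream).
def block(h, attn, mlp, ln1, ln2):
    h = h + attn(ln1(h))  # attention adds info gathered from other tokens
    h = h + mlp(ln2(h))   # the MLP adds per-token feature/"tag" contributions
    return h              # the embedding accumulates these increments layer by layer
```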

At the end of the day our own "thinking" has to be a purely mechanical process, and one also based around pattern recognition and prediction, but "thinking" seems a bit of a loaded term to apply to LLMs given the differences in "cognitive architecture", and smacks a bit of anthropomorphism.

Reasoning (search-like chained predictions) is more of an algorithmic process, but it seems that the "reactive" pass-thru predictions of the base LLM are more clearly viewed just as pattern recognition and extrapolation/prediction.

Prove me wrong!


> Prove me wrong!

For future reference, it is hard to parse tone over the internet but this "command" read pretty poorly to me. I would have preferred if you asked a question or something else.

However, assuming best intentions...

> I'd have to guess that the "transformations" being made to the embeddings at each layer are basically/mostly just adding (tagging with) incremental levels of additional grammatical/semantic information that has been gleaned by the hierarchical pattern matching that is taking place.

> Reasoning (search-like chained predictions) is more of an algorithmic process, but it seems that the "reactive" pass-thru predictions of the base LLM are more clearly viewed just as pattern recognition and extrapolation/prediction.

I'm having trouble following. Are you saying that:

* The "reactive pass-thru predictions" are just pattern matched responses from the training text that come from "incremental levels of additional semantic information"

* There is some other algorithmic process which results in "search-like chained predictions" from the pattern matched responses

* These two capabilities, combined in a single "thing," are not analogous to thinking

?

> At the end of the day our own "thinking" has to be a purely mechanical process, and one also based around pattern recognition and prediction, but "thinking" seems a bit of a loaded term to apply to LLMs given the differences in "cognitive architecture", and smacks a bit of anthropomorphism.

You can pick whatever term you like. What we seem to have is a system which can, through the embedded patterns of language, create a recursive search through a problem space and try and solve it by exploring plausible answers. If my dog came up with a hypothesis based on patterns it had previously observed, considered that hypothesis, discarded it, and then came up with a new hypothesis, I'd say it was thinking.

There are clear gaps between where we are and human capabilities especially as it relates to memory, in-context learning, and maintaining coherence over many iterations (well, some humans), but (to me) one of two things is probably true:

1. Models are doing something analogous to thinking that we don't understand.

2. Thinking is just a predict-act-evaluate loop with pattern matching to generate plausible predictions.

I lean towards the second. That's not to ignore the complexity of the human brain, it is just that the core process seems quite clear in the abstract to me via both observation and introspection. What can "thinking" (as you define it) do that is beyond these capabilities?
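
To make (2) concrete, the loop I have in mind is roughly this (pure pseudocode; `propose`, `act`, and `evaluate` are placeholders for whatever generates a candidate, tries it, and scores the outcome):

```python
# Pseudocode only: a predict-act-evaluate loop with pattern matching supplying
# the candidate predictions.
def solve(problem, propose, act, evaluate, max_iters=10):
    state = problem
    for _ in range(max_iters):
        candidate = propose(state)       # pattern matching: a plausible next step
        outcome = act(state, candidate)  # try it, mentally or in the world
        ok, state = evaluate(outcome)    # keep it, or discard and predict again
        if ok:
            return state
    return None  # out of budget, no coherent solution
```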


My mistake, actually. I was trying to recall "Change my mind!", and ended up with that instead. It was meant as a tongue-in-cheek challenge - I'd be more than happy to hear evidence of why I'm wrong and that there's a more abstract latent space being used in these models, not just something more akin to an elaborated parse tree.

To clarify:

1) I'm guessing that there really isn't a highly abstract latent space being represented by transformer embeddings, and it's really more along the lines of the input token embeddings just getting iteratively augmented/tagged ("transformed") with additional grammatical and semantic information as they pass through each layer. I'm aware that there are some superposed representations, per Anthropic's interpretability research, but it seems this doesn't need to be anything more than being tagged with multiple alternate semantic/predictive pattern identifiers.

2) I'd reserve the label "thinking" for what's being called reasoning/planning in these models, which I'd characterize as multi-step what-if prediction, with verification and backtracking where needed. Effectively a tree search of sorts (different branches of reasoning being explored), even if implemented in O1/R1 "linear" fashion. I agree that this is effectively close to what we're doing too, except of course we're a lot more capable and can explore and learn things during the reasoning process if we reach an impasse.
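
A cartoon of what I mean by that kind of search (purely illustrative; `expand`, `verify`, and `is_goal` are hypothetical stand-ins for the model's propose/check behaviour):

```python
# Purely illustrative: reasoning as depth-first search over candidate steps,
# with verification and backtracking when a branch dead-ends.
def reason(state, expand, verify, is_goal, depth=0, max_depth=5):
    if is_goal(state):
        return [state]
    if depth == max_depth:
        return None                      # dead end: backtrack
    for step in expand(state):           # a few plausible what-if continuations
        if not verify(state, step):
            continue                     # prune branches that don't check out
        path = reason(step, expand, verify, is_goal, depth + 1, max_depth)
        if path is not None:
            return [state] + path
    return None                          # every branch failed at this node
```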


I am not sure how someone would change your mind beyond Anthropic's excellent interpretability research. It shows clearly that there are features in the model which reflect entities and concepts, across different modalities and languages, which are geometrically near each other. That's about as latent space-y as it gets.

So I'll ask you, what evidence could convince you otherwise?


Good question - I guess if the interpretability folk went looking for this sort of additive/accumulative representation and couldn't find it, that'd be fairly conclusive.

These models are obviously forming their own embedding-space representations for the things they are learning about grammar and semantics, and it seems that latent space-y representations are going to work best for that since closely related things are not going to change the meaning of a sentence as much as things less closely related.

But ... that's not to say that each embedding as a whole is not accumulative - it's just suggesting they could be accumulations of latent space-y things (latent sub-spaces). It's a bit odd if Anthropic haven't directly addressed this, but if they have I'm not aware of it.


Text embeddings are underused WRT model understanding IMO. "Interpretability" focuses on more complex tools but perhaps misses some of the basics - shouldn't we have some sort of visual understanding of model thinking?



