Something that has been bugging me is that, applications-wise, the exploitative end of the "exploitation-exploration" trade-off (for lack of a better summary) has gotten way more attention than the exploratory side.
So, beyond the complaints about accuracy, hallucinations (you said "acid trip") get dissed far more than they deserve.
Yeah, I think if I had to put money on where the "native lands" of the LLMs are, it's in a much deeper embrace of the large model itself - the emergent relationships, architectures, and semantics that the models generate. The chatbots have stolen the spotlight, but if you look at the use of LLMs for biology, and specifically the protein models, that's one area where they've been truly revolutionary, because they "get" the "language" of proteins in a way we simply did not before. That points at a more general property of the models - "language" is just relationships and meanings, so anywhere you have a large collection of existing "texts" that you can't read or don't really understand, the "analogy machines" are a potential game changer.
Language has attractor dynamics. It guides high-dimensional concepts in one system's thought space/hidden state into a compact, error-correcting exchange format that another system can then expand back into a full-dimensional understanding.
This necessitates some kind of shared context, as language cannot be effectively compressed unless someone can decompress it at the other end by reintroducing the context that was removed from the communication.
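To make the "shared context" point concrete, here's a rough sketch using zlib's preset-dictionary feature (the dictionary contents here are made up purely for illustration): the compressed bytes only decompress back into the message if the receiver holds the same dictionary the sender used.

```python
import zlib

# A crude analogy, not a claim about how language actually works:
# both ends share a "context" (a preset dictionary of likely phrases).
shared_context = b"salad is good this that the a very really tasty fresh"

message = b"this salad is good, really fresh and very tasty"

# Sender: compress against the shared dictionary.
compressor = zlib.compressobj(zdict=shared_context)
payload = compressor.compress(message) + compressor.flush()

# Receiver: decompression works only with the same dictionary.
decompressor = zlib.decompressobj(zdict=shared_context)
restored = decompressor.decompress(payload)
print(restored == message)  # True

# A receiver with a different "context" can't recover the message;
# the compressed form is meaningless on its own.
try:
    zlib.decompressobj(zdict=b"completely unrelated context").decompress(payload)
except zlib.error as e:
    print("mismatched context:", e)
```

The payload by itself carries almost nothing; the dictionary each side already holds is doing most of the work, which is the point about decompression requiring the removed context.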
For example, if I say "this salad is good", you already know what "this", "salad", "is" and "good" mean. You also know how the order of the words affects their meaning, since they aren't strictly commutative. Each of these utterances transforms what came before it, until you're left with a highly compressed four-word "key" which, when processed by you, expands back into the high-dimensional concept of a good salad and all the associations that come with it. If I had to explain any particular word to you, it might involve quite a long conversation.
As a side note, those words themselves, and their constituent phonemes/graphemes, also represent attractors within language/utterance space, where each one guides some high-entropy output into a lower-entropy, more predictable representation of something. This hints at why next-token prediction and other basic techniques work so well for creating highly generalized models. Each token is a transformation of the computed transformations of all prior tokens, and some concepts are fixed points under that transformation: assuming a basic shared context, "I want to go fishing" and "Fishing is what I'd like to go do" both resolve to the same effective meaning, despite being different series of graphemes. These fixed points basically represent said shared context.
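One rough way to poke at that "fixed point" claim empirically: paraphrases that differ completely as token sequences should still land close together in a learned representation space. The sketch below assumes the sentence-transformers package and one convenient public checkpoint; the specific model is just an arbitrary choice.

```python
# Rough empirical check of the "fixed point" idea: different token
# sequences with the same meaning should land near each other in a
# learned representation space. Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "I want to go fishing",
    "Fishing is what I'd like to go do",
    "The quarterly report is due on Friday",
]
emb = model.encode(sentences)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("paraphrases:", cosine(emb[0], emb[1]))  # expect high similarity
print("unrelated:  ", cosine(emb[0], emb[2]))  # expect much lower
```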
So the way I see it: Interfaces often (always?) exhibit attractor dynamics, taking an input or hidden state with a relatively large amount of entropy, and reducing the entropy such that its output is predictable and can be used to synchronize other systems.
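A toy picture of what I mean by entropy reduction: start a bunch of noisy states anywhere, apply one contraction map repeatedly, and the whole ensemble collapses onto the same fixed point, so the spread (standing in for entropy) shrinks every step. The particular map here is arbitrary, just something with a single attractor.

```python
# Toy picture of "attractor dynamics as entropy reduction": many noisy
# initial states, one contraction map, and the spread of the ensemble
# collapses as everything falls toward the same fixed point.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(loc=0.0, scale=5.0, size=1000)  # high-entropy inputs

def step(x):
    # A contraction toward the fixed point x* = 2.0 (|slope| < 1).
    return 0.5 * x + 1.0

for i in range(12):
    print(f"step {i:2d}  spread = {states.std():.4f}")
    states = step(states)

print("converged near fixed point:", states.mean())  # ~2.0
```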
Understanding itself is an interface: you can think of understanding as the effective decontextualization and recontextualization of a concept; in other words, understanding is generalization. Generalization involves attractor dynamics, because that's all decontextualization and recontextualization are: structures which can be used to compress or decompress information by learning its associations, and the structure which guides these associations can be seen as an attractor.
Thus, understanding is an attractor which takes complex, noisy input and produces a stable, predictable hidden state or output. And successful, stable interfaces represent fixed points between two or more complex systems, where language is the transformation step that "aligns" some subset of attractors within each system such that states from system A can be received by system B with minimal loss of information. For two systems to communicate effectively, they must have some structural similarities in their attractor landscapes, which represent the same relational context.
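Here's a toy sketch of that last claim: system A collapses a concept onto its nearest "attractor" and transmits only an index (a compact, low-entropy utterance), and system B re-expands the index using its own codebook. If the two codebooks are roughly aligned the concept survives the trip; if B's codebook is foreign, it doesn't. All the sizes and noise levels here are made up purely for illustration.

```python
# Toy sketch: A sends only a codeword index (a low-entropy "utterance");
# B reconstructs it from its own codebook. Communication only works if
# the two codebooks (attractor landscapes) are structurally similar.
import numpy as np

rng = np.random.default_rng(1)
dim, n_codes = 8, 32

codebook_A = rng.normal(size=(n_codes, dim))
codebook_B_aligned = codebook_A + 0.05 * rng.normal(size=(n_codes, dim))  # shared context
codebook_B_foreign = rng.normal(size=(n_codes, dim))                      # no shared context

def send(concept, codebook):
    """Collapse a high-dimensional concept onto the nearest attractor; transmit its index."""
    return int(np.argmin(np.linalg.norm(codebook - concept, axis=1)))

def receive(index, codebook):
    """Re-expand the index using the receiver's own codebook."""
    return codebook[index]

concept = rng.normal(size=dim)      # what A "means"
idx = send(concept, codebook_A)     # the compressed utterance

for name, book in [("aligned B", codebook_B_aligned), ("foreign B", codebook_B_foreign)]:
    error = np.linalg.norm(receive(idx, book) - codebook_A[idx])
    print(f"{name}: reconstruction error vs A's attractor = {error:.3f}")
```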
Transformers effectively learn how to be a "superattractor" of sorts, highly adept at creating interfaces on the fly by guiding incoming information through a series of transformations that quite literally are designed to maximize predictability/probability of the output. It's like a skeleton key, a missing map, that allows two systems to exchange information without necessarily having developed compatible interfaces, that is, some similar subset of their attractor space. For example, if I know English and my speaking partner only knows Chinese, the transformer represents a shared context that can be used to link our two interfaces.
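To ground the "maximize predictability/probability of the output" part: the attention step inside a transformer literally converts raw similarity scores into a probability distribution (a softmax) and uses it to mix information across positions. Here's a minimal single-head sketch in numpy with random weights, so it only shows the shape of the computation, not a trained model.

```python
# Minimal single-head scaled dot-product attention in numpy, with random
# weights -- it shows the shape of the transformation, not a trained model.
import numpy as np

rng = np.random.default_rng(42)
seq_len, d_model = 5, 16          # 5 tokens, 16-dim hidden state
x = rng.normal(size=(seq_len, d_model))

W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)                 # similarity between positions
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # softmax: scores -> probabilities

output = weights @ V                                # each token mixes in what it "attends" to
print(weights.round(2))   # rows sum to 1: each token's distribution over the context
print(output.shape)       # (5, 16): same shape, recontextualized
```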
It goes further too, because transformers are stellar for translating between concepts even if there is only one other party. They can take any two concepts and, depending on how well they were trained, create an ad hoc interface between them which allows information to become "unstuck" and be freely guided from one concept to another, unlocking a deeper understanding/generalization which can then be used elsewhere. We see this happening now with proteins and, as you nicely put it, "protein language". You're absolutely right that language is just relationships and meanings, but hopefully it's clear now that all understanding boils down to the identification, extraction and transformation of relational structures, which can be thought of as "attractor spaces", guiding unpredictable input into predictable hidden state / output.
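On the protein side, this is already easy to try: the ESM-2 family of protein language models is published on Hugging Face, and you can embed an amino-acid sequence much the way you'd embed a sentence. The checkpoint and sequence below are just illustrative choices; treat this as a rough sketch rather than a recipe.

```python
# Rough sketch of treating a protein sequence as "text": embed it with a
# protein language model. Assumes `pip install transformers torch`; the
# checkpoint is one small public ESM-2 model, chosen only for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # amino acids as "tokens"

with torch.no_grad():
    inputs = tokenizer(sequence, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state    # per-residue embeddings

print(hidden.shape)   # (1, sequence length + special tokens, hidden dim)
```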
Roughly would have loved to translate some of this "understanding" to "action", without further intervening steps of "unlocking [deeper whatever]" (contingent on his intentions), perhaps aka "waiting for the right moment". Lol