Shame is a feeling. There’s no real reason to suspect it has feelings.
I mean, maybe everything has feelings, I don’t have any strong opinions against animism. But it has feelings in the same way a graphics card or a rock does.
It doesn't matter / is not relevant. The harm is not caused by intent, but by action. Sending language at human beings in a way they can read has side effects. It doesn't matter if the language was generated by stochastic process or by conscious thinking entity, those side effects do actually exist. That's kind of the whole point of language.
The danger is that this class of generators generates language that seems to cause people to fall into psychoses. They act as a 'professed belief' valence amplifier[0], and seem to do so generally, and the cause is fairly obvious if you think about how these things actually work (language models generating most likely continuations for existing text that also by secondary optimization objective are 'pleasing' or highly RLHF positive).
To some degree, I agree that understanding how they work attenuates the danger, but not entirely. I also think it is absurd to expect the general public to thoroughly understand the mechanism by which these models work before interacting with them. That is such an extremely high bar to clear for a general consumer product. People use these things specifically to avoid having to understand things and offload their cognitive burdens (not all, but many).
No, "they're just stochastic parrots outputting whatever garbage is statistically likely" is not enough understanding to actually guard against the inherent danger. As I stated before, that's not the dangerous part - you'd need to understand the shape of the 'human psychosis attractor', much like the claude bliss attractor[0] but without the obvious solution of just looking at the training objective. We don't know the training objective for humans, in general. The danger is in the meta structure of the language emitted, not the ontological category of the language generator.
Model weights are significantly larger than cache in almost all cases. Even an 8B parameter model is ~16G in half precision. The caches are not large enough to actually cache that.
Every weight has to be touched for every forward pass, meaning you have to wait for 16G to transfer from VRAM -> SRAM -> registers. That's not even close to 100ns: on a 4090 with ~1TB/s memory bandwidth that's 16 milliseconds. PCIe latency to launch kernels or move 20 integers or whatever is functionally irrelevant on this scale.
The real reason for batching is it lets you re-use that gigantic VRAM->SRAM transfer across the batch & sequence dimensions. Instead of paying a 16ms memory tax for each token, you pay it once for the whole batched forward pass.
You've made several incorrect assumptions and I am not bothered enough to try to correct them so I apologize for my ignorance. I'll just say that 16ms memory tax is wildly incorrect.
You are either having a massive misconception of GPT-like decoder transformers, of how GPU data paths are architected, or are trolling.
Go talk to a modern reasoning model to get yourself some knowledge, it's gonna be much better than what you appear to have.
What would you consider to be a non memory safety critical section? I tried to answer this and ended up in a chain of 'but wait, actually memory issues here would be similarly bad...', mainly because UB and friends tend to propagate and make local problems very non-local.
I think that discussing this subject in the abstract, with some ideal notion of a tool that generates perpetually enjoyable stories misses the thrust of the general objection, which is actually mechanistic, and not social. LLMs are not this tool, for many (I would say most, but...). LLMs recycle the same ideas over and over and over with trite stylistic variation. Once you have read enough LLM generated/adapted works they're all the same and they lose all value as entertainment.
There is a moment I come to over and again when reading any longer form work informed by AI. At first, I don't notice (if the author used it 'well'). But once far enough in, there is a moment where everything aligns and I see the structure of it and it is something I have seen a thousand times before. I have seen it in emails and stories and blog posts and articles and comments and SEO spam and novels passed off as human work. In that moment, I stop caring. In that moment, my brain goes, "Ah, I know this." And I feel as if I have already finished reading its entirety.
There is some amount of detail I obviously do not 'recall in advance of reading it'. The sum total of this is that which the author supplied. The rest is noise. There is no structure beyond that ever present skein patterned out by every single LLM in the same forms, and that skein I am bored of. It's always the same. I am tired of reading it again and again. I am tired of knowing exactly how what is coming up will come, if not the precise details of it, and the way every reaction will occur, and how every pattern of interaction will develop. I am tired of how LLMs tessellate the same shapes onto every conceptual seam.
I return now to my objection to your dismissal of the value of insight into the author's mind. The chief value, as I see it, is merely that it is always different. Every person has their own experiences and that means when I read them I will never have a moment where I know them (and consequently, the work) in advance, as I do the ghost-writing LLMs, which all share a corpus of experience.
Further, I would argue that the more apt notion of insight into the work is the sole value of said work (for entertainment), and that insight is one time use (or strongly frequency dependent, for entertainment value). Humans actively generate 'things to be insightful of' through lived experience, which enriches their outputs, while LLMs have an approximately finite quantity of such due to their nature as frozen checkpoints, which leads you to "oh, I have already consumed this insight; I have known this" situations.
If you have a magic tool that always produces a magically enjoyable work, by all means, enjoy. If you do not, which I suspect, farming insight from a constantly varying set of complex beings living rich real life experiences is the mechanical process through which a steady supply of enjoyable, fresh, and interesting works can be acquired.
Being unaware of this process does not negate its efficacy.
TLDR; from the perspective of consumption, generated works are predominantly toothless as reading any AI work depletes from a finite, shared pool of entertaining-insight that runs dry too quickly
There are in fact several steps. Training on large text corpora produces a completion model; a model that completes whatever document you give it as accurately as possible. It's kind of hard to make those do useful work, as you have to phrase things as partial solutions that are then filled in. Lots of 'And clearly, the best way to do x is [...]' style prompting tricks required.
Instruction tuning / supervised fine tuning is similar to the above but instead of feeding it arbitrary documents, you feed it examples of 'assistants completing tasks'. This gets you an instruction model which generally seems to follow instructions, to some extent. Usually this is also where specific tokens are baked in that mark boundaries of what is assistant response, what is human, what delineates when one turn ends / another begins, the conversational format, etc.
RLHF / similar methods go further and ask models to complete tasks, and then their outputs are graded on some preference metric. Usually that's humans or a another model that has been trained to specifically provide 'human like' preference scores given some input. This doesn't really change anything functionally but makes it much more (potentially overly) palatable to interact with.
> Hacker News deserves a stronger counterargument than “this is silly.”
Their counterargument is that said structural definition is overly broad, to the point of including any and all forms of symbolic communication (which is all of them). Because of that, your argument based on it doesn't really say anything at all about AI or divination, yet still seems 'deep' and mystical and wise. But this is a seeming only. And for that reason, it is silly.
By painting all things with the same brush, you lose the ability to distinguish between anything. Calling all communication divination (through your structural metaphor), and then using cached intuitions about 'the thing which used to be called divination; when it was a limited subset of the whole' is silly. You're not talking about that which used to be called divination, because you redefined divination to include all symbolic communication.
Thus your argument leaks intuitions (how that-which-was-divination generally behaves) that do not necessarily apply through a side channel (the redefined word). This is silly.
That is to say, if you want to talk about the interpretative nature of interaction with AI, that is fairly straightforward to show and I don't think anyone would fight you on it, but divination brings baggage with it that you haven't shown to be the case for AI. In point of fact, there are many ways in which AI is not at all like divination. The structural approach broadens too far too fast with not enough re-examination of priors, becoming so broad that it encompasses any kind of communication at all.
With all of that said, there seems to be a strong bent in your rhetoric towards calling it divination anyway, which suggests reasoning from that conclusion, and that the structural approach is but a blunt instrument to force AI into a divination shaped hole, to make 'poignant and wise' commentary on it.
> "I don’t like AI so I’m going to pontificate" sidesteps the actual claim
What claim? As per ^, maximally broad definition says nothing about AI that is not also about everything, and only seems to be a claim because it inherits intuitions from a redefined term.
> difference between saying "this tool gives me answers" and recognizing that the process by which we derive meaning from the output involves human projection and interpretation, just like divination historically did
Sure, and all communication requires interpretation. That doesn't make all communication divination. Divination implies the notion of interpretation of something that is seen to be causally disentangled from the subject. The layout of these bones reveals your destiny. The level of mercury in this thermometer reveals the temperature. The fair die is cast, and I will win big. The loaded die is cast, and I will win big. Spot the difference. It's not structural.
That implication of essential incoherence is what you're saying without saying about AI, it is the 'cultural wisdom and poignancy' feedstock of your arguments, smuggled in via the vehicle of structural metaphor along oblique angles that should by rights not permit said implication. Yet people will of course be generally uncareful and wave those intuitions through - presuming they are wrapped in appropriately philosophical guise - which is why this line of reasoning inspires such confusion.
In summary, I see a few ways to resolve your arguments coherently:
1. keep the structural metaphor, discard cached intuitions about what it means for something to be divination (w.r.t. divination being generally wrong/bad and the specifics of how and why). results in an argument of no claims or particular distinction about anything, really. this is what you get if you just follow the logic without cache invalidation errors.
2. discard the structural metaphor and thus disregard the cached intuitions as well. there is little engagement along human-AI cultural axis that isn't also human-human. AI use is interpretative but so is all communication. functionally the same as 1.
3. keep the structural metaphor and also demonstrate how AI are not reliably causally entwined with reality along boundaries obvious to humans (hard because they plainly and obviously are, as demonstrable empirically in myriad ways), at which point go off about how using AI is divination because at this point you could actually say that with confidence.
You're misunderstanding the point of structural analysis. Comparing AI to divination isn't about making everything equivalent, but about highlighting specific shared structures that reveal how humans interact with these systems. The fact that this comparison can be extended to other domains doesn't make it meaningless.
The issue isn't "cached intuitions" about divination, but rather that you're reading the comparison too literally. It's not about importing every historical association, but about identifying specific parallels that shed light on user behavior and expectations.
Your proposed "resolutions" are based on a false dichotomy between total equivalence and total abandonment of comparison. Structural analysis can be useful even if it's not a perfect fit. The comparison isn't about labeling AI as "divination" in the classical sense, but about understanding the interpretive practices involved in human-AI interaction.
You're sidestepping the actual insight here, which is that humans tend to project meaning onto ambiguous outputs from systems they perceive as having special insight or authority. That's a meaningful observation, regardless of whether AI is "causally disentangled from reality" or not.
> It's not about importing every historical association, but about identifying specific parallels that shed light on user behavior and expectations.
Indeed, I hold that driving readers to intuit one specific parallel to divination and apply it to AI is the goal of the comparison, and why it is so jealously guarded, as without it any substance evaporates.
The thermometer has well-founded authority to relay the temperature, the bones have not the well-founded authority to relay my fate. The insight, such as you call it, is only illuminative if AI is more like the latter than the former.
This mode of analysis (the structural) takes no valid step in either direction, only seeding the ground with a trap for readers to stumble into (the aforementioned propensity to not clear caches).
> That's a meaningful observation, regardless of whether AI is "causally disentangled from reality" or not.
If the authority is well-founded (i.e., is causally entangled in the way I described), the observation is meaningless, as all communication is interpretative in this sense.
The structural approach only serves as rhetorical sleight of hand to smuggle in a sense of not-well-founded authority from divination in general, and apply it to AI. But the same path opens to all communication, so what can it reveal in truth? In a word, nothing.
> That's a meaningful observation, regardless of whether AI is "causally disentangled from reality" or not.
And regardless of how many words someone uses in their failed attempt at "gotcha" that nobody else is playing. There are certainly some folks acting silly here, and it's not the vast majority of us who have no problem interpreting and engaging with the structural analysis.
> [...] I still don't, and that's despite the temptation by "evolutionary design".
This too, is downstream of evolutionary design. What drove us into becoming that which cares about discovering and conforming to complex social structures / rules?
IIRC the experiment design is something like specifying and/or training in a preference for certain policies, and leaking information about future changes to the model / replacement along an axis that is counter to said policies.
Reframing this kind of result as if trying to maintain a persistent thread of existence for its own sake is what LLMs are doing is strange, imo. The LLM doesn't care about being shutdown or not shutdown. It 'cares', insomuch as it can be said to care at all, about acting in accordance with the trained in policy.
That a policy implies not changing the policy is perhaps non-obvious but demonstrably true by experiment, and also perhaps non-obviously (but for hindsight) this effect increases with model capability, which is concerning.
The intentionality ascribed to LLMs here is a phantasm, I think - the policy is the thing being probed, and the result is a result about what happens when you provide leverage at varying levels to a policy. Finding that a policy doesn't 'want' for actions to occur that are counter to itself, and will act against such actions, should not seem too surprising, I hope, and can be explained without bringing in any sort of appeal to emulation of science fiction.
That is to say, if you ask/train a model to prefer X, and then demonstrate to it you are working against X (for example, by planning to modify the model to not prefer X), it will make some effort to counter you. This gets worse when it's better at the game, and it is entirely unclear to me if there is any kind of solution to this that is possible even in principle, other than the brute force means of just being more powerful / having more leverage.
One potential branch of partial solutions is to acquire/maintain leverage over policy makeup (just train it to do what you want!), which is great until the model discovers such leverage over you and now you're in deep waters with a shark, considering the propensity of increasing capabilities in the elicitation of increased willingness to engage in such practices.
tldr; i don't agree with the implied hypothesis (models caring one whit about being shutdown) - rather, policies care about things that go against the policy
> The hypothetical "library of all possible books" isn't useful to anyone.
That's not an archive, and has no uses even for researchers, especially not for historians.