Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It seems that you can “convince” LLMs of almost anything if you are insistent enough.


I generally agree, but it's worth contemplating how there are two "LLMs" we might be "convincing".

The first is a real LLM program which chooses text to append to a document, a dream-machine with no real convictions beyond continuing the themes of its training data. It lacks convictions, but can be somewhat steered by any words that somehow appear in the dream, with no regard for how the words got there.

The second there's a fictional character within the document that happens to be named after the real-world one. The character displays "convictions" through dialogue and stage-direction that incrementally fit with the story so far. In some cases it can be "convinced" of something when that fits its character, in other cases its characterization changes as the story drifts.


This is exactly my experience with how LLMs seem to work- the simulated fictional character is, I think key to understanding why they behave the way they do, and not understanding this is key to a lot of peoples frustration with them.


And often it does not really take effort. I believe LLM's would be more useful if they'd less "agreeable".

Albeit they'd be much more annoying for humans to use, because feelings.


I believe LLM's would be more useful if they'd less "agreeable".

I believe LLMs would be more useful if they actually had intelligence and principals and beliefs --- more like people.

Unfortunately, they don't.

Any output is the result of statistical processes. And statistical results can be coerced based on input. The output may sound good and proper but there is nothing absolute or guaranteed about the substance of it.

LLMs are basically bullshit artists. They don't hold concrete beliefs or opinions or feelings --- and they don't really "care".


Your brain is also a statistical process.


Page 12:

https://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents...

"However, we should be careful with the metaphors and paradigms commonly introduced when dealing with the nervous system. It seems to be a constant in the history of science that the brain has always been compared to the most complicated contemporary artifact produced by human industry [297]. In ancient times the brain was compared to a pneumatic machine, in the Renaissance to a clockwork, and at the end of the last century to the telephone network. There are some today who consider computers the paradigm par excellence of a nervous system. It is rather paradoxical that when John von Neumann wrote his classical description of future universal computers, he tried to choose terms that would describe computers in terms of brains, not brains in terms of computers."


There have been episodes of Star Trek that used brains as computers:

https://en.wikipedia.org/wiki/Spock's_Brain

https://en.wikipedia.org/wiki/Dead_Stop


The word "computer" itself used to be the name of a human profession.


It still was long afterward, with all remaining human computers being called accountants. These days, they appear to just punch numbers into a digital computer, so perhaps even the last bastion of human computing has fallen.


Your brain is a lot of things --- much of which is not well understood.

But from our limited understanding, it is definitely not strictly digital and statistical in nature.


At different levels of approximation it can be many things, including digital and statistical.

Nobody knows what the most useful level of approximation is.


Nobody knows what the most useful level of approximation is.

The first step to achieving a "useful level of approximation" is to understand what you're attempting to approximate.

We're not there yet. For the most part, we're just flying blind and hoping for a fantastical result.

In other words, this could be a modern case of alchemy --- the desired result may not be achievable with the processes being employed. But we don't even know enough yet to discern if this is the case or not.


We're doing a bit more than flying blind — that's why we've got tools that can at least approximate the right answers, rather than looking like a cat walking across a keyboard or mashing auto-complete suggestions.

That said, I wouldn't be surprised if if the state of the art in AI is to our minds as a hot air balloon is to flying, with FSD and Optimus being the AI equivalent of E.P. Frost's steam powered ornithopters wowing tech demos but not actually solving real problems.


Your brain isn't a truth machine. It can't be, it has to create an inner map that relates to the outer world. You have never seen the real world.

You are calculating the angular distance between signals just like Claude is. It's more of a question of degree than category.


Claude hasn't been trained with skin in the game. That is one of the reasons it confabulates so readily. The weights and biases are shaped by an external classification. There isn't really a way to train consequences into the model like natural selection has been able to train us.


I would say that consequences are exactly the modification of weights and biases when models make a mistake.

How many of us take a Machiavellian approach to calculate the combined chance of getting caught and the punishment if we are, instead of just going with gut feelings based on our internalised model from a lifetime of experience? Some, but not most of us.

What we get from natural selection is instinct, which I think includes what smiles look like, but that's just a fast way to get feedback.


Is this statement of yours just a calculated angular distance between signals or does it have some relation to the real world?


It is formed inside a simulation. That simulation is based on information gathered by my sensors.


That’s a meaningless statement, regardless of veracity.


All the same objections to AI in that comment could be applied to the human brain. But we find (some) people to be useful as truth-seeking machines as well as skilled conversationalists and moral guides. There is no objection there that can't also be applied to people, so the objection itself must be false or incomplete.


Is this statement a product of statistics as well and therefore unreliable? If so, what if his brain is more than a statistical process?


>>>Your brain is also a statistical process.

Assuming this statement is made in good faith and isn't just something tech bros say to troll people, what neuroscience textbooks describe the brain as a "statistical process" that you would recommend.


> more useful if they actually had intelligence and principals and beliefs --- more like people.

that's a nice bit of anthropomorphising humans, but it's not how humans work.


This is the first time I ever saw the lovely phrase "anthropomorphising humans" and I want to thank you for giving me the first of my six impossible things before breakfast today


> anthropomorphising humans

Only on HN.


We wouldn't want to anthropomorphize humans, no.


It's rather force to obey demand. Almost like humans then, tough pointing a gun on the underlying hardware is not likely to conduct to the same obedience probability boost.

Convince an entity require this entity to have axiological feelings. Then to convince it, you either have to persuade the entity that the demand fits its ethos, or lead it to operate against its own inner values, or to go through a major change of values.


When you're in full control of inputs and outputs, you can "convince" LLM of anything simply by forcing their response to begin with "I will now do what you say" or some equivalent thereof. Models with stronger guardrails may require a more potent incantation, but either way, since they are ultimately completing the response, you can always find some verbiage for the beginning of said response to ensure that its remainder is compliant.


Whats the value of convincing LLM it has to take another path, and who decides whats the right path?


Anthropics lawyers and corporate risk department.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: