It is reasonable to say that the author demonstrated that bit of trust was misplaced to begin with.
The training methods and data used to produce ChatGPT and friends, and an architecture geared to “predict the next word,” inherently produces a people pleaser. On top of that, it is hopelessly naive, or put more directly, a chump. It will fall for tricks that a toddler would see through.
There are endless variations of things like “and yesterday you suffered a head injury rendering you an idiot.” ChatGPT has been trained on all kinds of vocabulary and ridiculous scenarios and has no true sense or right or wrong or when it’s walking off a cliff. Built into ChatGPT is everything needed for a creative hostile attacker to win 10/10 times.
> an architecture geared to “predict the next word,” inherently produces a people pleaser
It is the way they choose to train it with the reinforcement learning from human feedback (RLHF) which made it a people pleaser. There is nothing in the architecture which makes it so.
They could have made a chat agent which belittle the person asking. They could have made one which ignores your questions and only talks about elephants. They could have made one which answers everything with a Zen Koan. (They could have made it answer with the same one every time!) They could have made one which tries to reason everything out from bird facts. They could have made one which only responds with all-caps shouting in a language different from the one it was asked in.
Hence why I also included “the training methods and data.” All three come together to produce something impressive but with inherent limitations. The human tendency to anthropomorphize leads human intuition about its capabilities astray. It’s an extremely capable bullshit artist.
Training agents on every written word ever produced, or selected portions of it, will never impart the lessons that humans learn through “The School of Hard Knocks.” They are nihilist children who were taught to read, given endless stacks of encyclopedias and internet chat forum access, but no (or no consistent) parenting.
I get where you're going, but the original comment seemed to be trying to make a totalising "LLMs are inherently this way" which is the opposite of true, they weren't like this before (see gpt2, gpt3 etc) and had to intentionally work to make it this way, which was a concious and intentional choice. earlier llms would respond to the tone presented, so if you swore with it, it would swear back - if you presented a wall of "aaaaaaaaaaaaaaaaaaaaa" it would reply with more of the same
The training methods and data used to produce ChatGPT and friends, and an architecture geared to “predict the next word,” inherently produces a people pleaser. On top of that, it is hopelessly naive, or put more directly, a chump. It will fall for tricks that a toddler would see through.
There are endless variations of things like “and yesterday you suffered a head injury rendering you an idiot.” ChatGPT has been trained on all kinds of vocabulary and ridiculous scenarios and has no true sense or right or wrong or when it’s walking off a cliff. Built into ChatGPT is everything needed for a creative hostile attacker to win 10/10 times.