Claude's Character (anthropic.com)
263 points by simonw 7 months ago | 146 comments



> Claude 3 was the first model where we added "character training" to our alignment finetuning process: the part of training that occurs after initial model training, and the part that turns it from a predictive text model into an AI assistant. The goal of character training is to make Claude begin to have more nuanced, richer traits like curiosity, open-mindedness, and thoughtfulness.

What I found particularly interesting is how they implemented this using primarily synthetic data:

> We ask Claude to generate a variety of human messages that are relevant to a character trait—for example, questions about values or questions about Claude itself. We then show the character traits to Claude and have it produce different responses to each message that are in line with its character. Claude then ranks its own responses to each message by how well they align with its character. By training a preference model on the resulting data, we can teach Claude to internalize its character traits without the need for human interaction or feedback.

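A rough sketch of the data flow described in that quote, to make the steps concrete. This is not Anthropic's actual implementation: `generate_messages`, `generate_responses` and `score_alignment` are hypothetical stand-ins for model calls, replaced here with trivial deterministic logic so the example runs, and the chosen/rejected pair format is an assumption about what a preference model might be trained on.

```python
from itertools import combinations

TRAITS = ["curiosity", "open-mindedness", "thoughtfulness"]

def generate_messages(trait):
    # Stand-in for "ask the model to write human messages relevant to a trait".
    return [f"What do you think about {trait}?", f"Do you have {trait}?"]

def generate_responses(message, trait):
    # Stand-in for "show the trait to the model and sample several responses".
    return [
        "I'm not sure.",
        f"Good question. With {trait} in mind, here is my view.",
        f"Good question. With {trait} in mind, here is my view, and what is yours?",
    ]

def score_alignment(response, trait):
    # Stand-in for "the model ranks its own responses"; a toy heuristic only.
    return int(trait in response) + response.count("?")

def build_preference_pairs(traits):
    pairs = []
    for trait in traits:
        for message in generate_messages(trait):
            responses = generate_responses(message, trait)
            ranked = sorted(responses, key=lambda r: score_alignment(r, trait), reverse=True)
            # Every higher-ranked response is preferred over every lower-ranked one.
            for better, worse in combinations(ranked, 2):
                pairs.append({"prompt": message, "chosen": better, "rejected": worse})
    return pairs

pairs = build_preference_pairs(TRAITS)
```

The point of the sketch is only that no human appears anywhere in the loop: the same model writes the prompts, writes the candidates, and does the ranking.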

> The goal of character training is to make Claude begin to have more nuanced, richer traits like curiosity, open-mindedness, and thoughtfulness.

You can't make a transformer-based language model "have" curiosity.

Real curiosity is an innate trait in intelligent animals that promotes learning and exploration by increasing focus and stick-to-it-ness when the situation is interesting/unexplored (i.e. not well predicted - surprising).

An LLM could be trained to fake curiosity in the same way that ELIZA did ("Tell me more about your fear of prison showers"), but fundamentally it deals in words and statistics, not facts and memories, so any poorly-predicted surprise signal will be surface level such as "that was an unusual word pattern" or "you haven't mentioned that before in the input context".

Even if fake curiosity ("interest") makes the model seem more conversationally engaging, and maybe helps exploration by prompting the user to delve deeper into a subject, it's not going to help learning since LLMs are pre-trained and cannot learn by experience (except ephemerally within context).


This argument is made every single time a new LLM article gets posted and I'm not sure it's really adding anything to the conversation. Everyone understands language models are not human; there's no need to add a philosophical argument because you don't like the word choice.

"Characters" in any other context are also by definition not curious. They're not open-minded or thoughtful either. Characters in a book are not people, they don't have thoughts, they are whatever the author put on paper. Yet we still use these words to describe them, as if the characters' consciousness and motivations existed beyond the paper on which they're described. It becomes really hard to talk about these traits without using words like curiosity and thoughtfulness. No one thinks the characters in a fictional book are real people.


I'm not sure that "these are just characters, your expectations are too high" is totally valid here, especially as Anthropic are claiming their LLM progression is on the path to AGI.

Any true human-level (or even rat-level for that matter) AGI would need to have actual innate traits such as curiosity to drive lifelong exploration and learning. That seems a pretty minimal bar to meet to claim that something is AGI.

I suppose in context of an article about giving Claude a character (having it play a character) then we need to interpret "having a trait" as "playing a character having a trait", because it certainly is very different from actually having it.


> Any true human-level (or even rat-level for that matter) AGI would need to have actual innate traits such as curiosity to drive lifelong exploration and learning.

While perhaps not "true" intelligence (whatever that is), with LLMs this can be emulated with a prompt:

> You have an innate curiosity to drive lifelong exploration and learning. ...

If the behavior of the system is indistinguishable from "true" intelligence, the distinction becomes a philosophical one.


How can you emulate learning when LLMs have no way to update their weights? LLMs don't even know what they do and don't know (hence hallucinations), so how would they even realize that something was new in the first place?

I could maybe see chatbots storing everything you ever said in a vector database for future lookup, but that's just memory.
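
A toy sketch of that "store everything, look it up later" idea, using only the standard library. The bag-of-words `embed` function is an assumption standing in for a real embedding model, and `MemoryStore` is a hypothetical name, not any particular vector database's API.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. A real system would use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text):
        self.items.append((embed(text), text))

    def lookup(self, query, k=1):
        # Return the k stored texts most similar to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = MemoryStore()
store.add("my dog is called Rex")
store.add("I work as a plumber in Leeds")
recalled = store.lookup("what is my dog called")
```

Note that nothing here touches the model's weights: the retrieved text would just be pasted back into the context, which is why it counts as memory rather than learning.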


There's nothing stopping you as an LLM system designer from having it "go to sleep" once a day and use that downtime to consolidate the memory in the vector database into fine-tuning of the weights.


Some humans who are not trained that curiosity produces beneficial outcomes are often not all that curious. It is difficult to know what is truly innate vs. the result of conditioning. It is also difficult to know what is the result of an absence of opposing conditioning. If I was raised in a poor area with high crime and little economic opportunity, my conditioning would be quite different. How would that have changed my behavior? Could curiosity get me in more trouble than it's worth? Could I observe that and adjust my own behavior as a result?

We are all, to some extent, a product of our environment (training data). I wasn't raised in that type of area, but does that mean my own intellectual curiosity is more innate? Or does it mean it is less innate? I could argue that both ways.


> Some humans who are not trained that curiosity produces beneficial outcomes are often not all that curious.

Innate traits such as curiosity & boredom are things that we are born with, not learnt. The reason evolution has selected for these innate traits is because there is a benefit (encouraging exploration and learning which help survival), but you don't need to be aware of the benefits to experience and act on boredom or curiosity.

Innate behaviors can certainly be reinforced, or suppressed to some degree, by experience.


I think you _can_ make an LLM 'have' curiosity, for all practical intents and purposes.

I'm e.g. thinking of the 'have the LLM research a topic' task, such as the 'come up with suitable search terms, search the web, summarize the article, then think of potential next questions' cycle implemented by Perplexity, for example. I'm pretty sure the results would vary noticeably between an LLM that was trained to be 'curious' (i.e., follow more unusual trains of thought) versus not, and the differences would probably compound the more freedom you give the LLM, for example by giving it more iterations of the 'formulate questions, search, summarize' loop.


The problem is how can "follow more unusual trains of thought" apply to a language model ? Sure it can selectively attend to certain parts of the input and generate based on that, but what is the internal signal for "unusual" ? Any selective focus is also going to appear like groundhog day since the model's weights are fixed and what was "unusual" today will continue to be "unusual" even after it's been exposed to it for the 1000th time!


That's a good point.

Thinking about this a bit, it might be a bit late actually to start to guide an LLM towards curiosity only at the fine-tuning stage, since this 'exploring unusual-trains-of-thoughts' is precisely what the LLM _isn't_ learning during training, where it sees (basically by definition) a ton of 'usual trains-of-thoughts'. Maybe you'd have to explicitly model 'surprise' during training, to get the LLM to try to fit precisely those examples better that don't really fit its already learned model (which would require the network to reserve some capacity for creativity/curiosity, which it otherwise might not do, because it's not necessary to model _most_ of what it sees). But then you enter the territory of 'if you open your mind too much, your brain might fall out', and could end up accidentally training QAnonGPT, and that you definitely don't want...

So maybe this way of 'hoping the LLM builds up enough creative intelligence during training, which can then be guided during fine-tuning' is the best we can do at the moment.


Curiosity is inherently proactive. An LLM, fundamentally, is a file. It's a FILE. It just sits there until you ask for output. Saying anything more about that process is just anthropomorphizing.

No one ever accused the Google index of having "curiosity", but the idea is basically the same - you give it a query and it gives you back a response. Then it just sits idle until the next query.


> Saying anything more about that process is just anthropomorphizing.

Take "this LLM is more curious" as a shorthand for "the output generated by this LLM mimics the kind of behaviour we would describe in a human as being curious".

> It just sits there until you ask for output.

That is indeed a property of the current interfaces. But this can be very easily changed. If we choose to we can just pipe in the clock, and then we can train the model to write to you after a while.

Or we can make a system where certain outputs from the LLM cause the execution environment to fetch data from outside sources and feed it into the LLM. And then there would be model weights which make the system just sit there and do nothing, and there would be model weights which browse Wikipedia all day. I think it would be apt to call this second kind a "curious" model while the previous one is not.


I disagree on the premise that computers are able to simulate many different things; it would be just as easy to say that the universe cannot have curiosity; that the universe is just things playing out according to some preset rules.

But of course, people exist within the universe, and while our brains do function according to those rules; likely all rules that can be expressed with math and statistics; we do have curiosity. You can look at the level of abstraction of the universe, and you will not find subjective experience or choice, yet the “platform” of the universe demonstrably allows for that.

When I see arguments like yours and the parent’s, I cannot help but think the arguments would seem to apply just as well to an accurate simulation of the universe, which shows the argument must be flawed. You are a file in the universe, loaded into memory and being run in parallel with my file. If you believe physics can be expressed through math and humans have subjective experience, then the right kind of simulation can also have these things. Of course any simulation can be represented digitally and saved to disk.


> it would be just as easy to say that the universe cannot have curiosity; that the universe is just things playing out according to some preset rules.

Behavior/dynamics depends on structure, and structure is scale dependent.

At the scale of galaxies and stars, the universe has a relatively simple structure, with relatively boring dynamics mostly governed by gravity and nuclear fusion.

When you zoom down to the level of dynamics on a planet full of life, or of a living entity such as a human, then things get a lot more interesting since at those scales there is far more structure causing a much greater variety of dynamics.

It's the structure of an LLM vs the structure of a human brain that makes the difference.


But I am saying that is the wrong comparison. The LLM doesn’t need to implement a human brain directly. It needs to implement a sophisticated enough simulation of people that the simulation itself contains ”people” who believe in themselves.

I don’t know whether LLMs do that, but they are excellent function approximators, and the laws of physics which allow for my brain to be conscious also can be considered some function to approximate. If the LLM can approximate that function well enough, then simulated humans would truly believe in their own existence, as I do.

And it isn’t really important whether or not consciousness is part of the simulation or not, if the end result is the simulator is capable of emulating people to a greater extent.


If you wanted to build a behavioral simulation of a human, and knew how to do it, then what advantage would there be to trying to get an LLM to emulate this simulator ?!

Better just code up your simulator to run efficiently on a supercomputer!


The LLM teaches itself the rules of the simulation by learning to predict what happens next so well.

Presumably, running a human simulation by brute forcing physics at a scale large enough to represent a human is completely infeasible, but we can maybe conceive how it would work. LLMs are an impressive engine for predicting “next” that is actually computationally feasible.


An LLM is not just a file, it also has context (the attention window in transformers).

An LLM can have curiosity in the sense that it will ask questions. It can be a useful trait, as a problem often seen in current "chat"-type LLMs is that they tend to talk a bit too authoritatively about things they don't know about (aka hallucinations). Encouraging users to give a bit more context for better quality answers can counteract this. The point could be to make chatbots unlike search engines: a search engine will answer garbage with garbage, whereas a chatbot can ask for clarification when it can't give a satisfactory answer.

For example:

- How much is a pint in mL?

- A US pint is 473 mL

vs

- Is it for a drink in the US? If so, a US pint is 473 mL, but in other contexts and locations, the answer can vary.

The second answer is "curious": by requesting extra context, it signals that the question is incomplete and that, with that extra context, it can give a better answer. For example, a pint of beer in France is likely to be 500 mL; even though that is not really a pint, it is how the word is understood there.


Yeah, but that's not the model being curious about things that it itself doesn't know, it'd be the model copying a human response (training sample) based on what that human didn't know.

To consider the difference, there might be some domain that the model was trained extensively on, and "knew" a lot about, but in a given context it might still act dumb and "show curiosity" just because it's mimicking a human response in that situation, not basing its response on what it actually knows!


> Curiosity is inherently proactive. An LLM, fundamentally, is a file. It's a FILE. It just sits there until you ask for output.

And the difference between proactive and reactive here boils down entirely to an infinite loop and some kind of I/O (which could be as trivial as "web search" function call). It so happens that, in the way LLMs are deployed, you're supplying the loop through interaction. But nothing stops you from making a script and putting while(True) on top.

Put another way, if you had a brain debugger and paused execution to step it, the paused brain would also "just sit there until you ask for output". LLM interactions are like that. It doesn't make it limited in any fundamental way, much like an interactive application isn't limited in a fundamental way just because you only ever use it by putting a breakpoint in its event loop and running a cycle at a time.
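
A minimal sketch of the "script with while(True) on top" idea. `llm_step` and `web_search` are hypothetical stand-ins (a toy policy and a canned result, not real model or search calls), and the `max_steps` guard is an assumption added so the example terminates.

```python
def llm_step(context):
    # Hypothetical stand-in for a model call: picks an action given the context so far.
    # Toy policy: keep searching on the latest item until the context holds four entries.
    if len(context) >= 4:
        return ("stop", None)
    return ("search", context[-1])

def web_search(query):
    # Hypothetical stand-in for an I/O tool; returns a canned "result".
    return f"result for <{query}>"

def run(seed, max_steps=100):
    context = [seed]
    steps = 0
    while True:  # the loop a user normally supplies by typing the next message
        action, arg = llm_step(context)
        steps += 1
        if action == "stop" or steps >= max_steps:
            break
        # Feed the tool output back in as the next input.
        context.append(web_search(arg))
    return context

history = run("transformers")
```

The model function itself is still purely reactive; whether the whole thing looks proactive depends only on who drives the loop.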


Sure, but even though an LLM is only a function that "acts" (outputs a word) when called, it could (if it weren't just an LLM!) use that opportunity to act to pursue some curiosity-based line of discussion.

One limiting factor as far as curiosity goes isn't just that an LLM is a passive function, but also that it's just a statistical sequence-to-sequence machine (a transformer) - that's all that exists under the hood. It doesn't have any mechanism for innate traits to influence the generation process. All you could do would be to train that one mechanism it does have to mimic human responses judged to reflect curiosity about specific fine-tuning inputs.


proactive can be: we take infinite inputs at a resolution, having eyes, ears, all sensors constantly "asking" for output


A plane, are you crazy? It's just a metal tube with sheets on both sides, a TUBE! Claiming it can fly like a bird is just anthropomorphizing.


I don't think that's a good analogy. We're talking about innate traits, not coarse functionality.

A plane and a bird can both fly, but a plane has no innate desire to do so, whether to take advantage of good soaring conditions, or to escape ground-based predators, etc.

An LLM and a human can both generate words, but the LLM is just trying to minimize repeating statistical errors it made when being pre-trained. The human's actions, including speech, are towards adaptive behavior to keep it alive per innate traits discovered by evolution. There's a massive difference.


  >"innate desire" 
"Innate" implies purpose, which is a human trait.

Humans built the plane to fly.

  >There's a massive difference.
There is 0 difference.

We built the machine to work; we have built prior machines - they did not work (as well), so we built more.

We are the selectivity your predisposition to nature argument hinges on.

And we won't stop til it stops us.


No - "innate" just means built-in.

An innate trait/behavior for an animal is something defined by their DNA that they will all have, as opposed to learned behavior which are individual specific.

An AI could easily be built to have innate curiosity - this just boils down to predicting something, getting feedback that the prediction is wrong, and using this prediction failure (aka surprise) as a trigger to focus on whatever is being observed/interacted with (in order to learn more about it).
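
A toy sketch of that predict / compare / focus loop. Everything here is an illustrative assumption: the `CuriousAgent` name, the numeric outcomes, the threshold, and the simple running-average update are not taken from any real system.

```python
class CuriousAgent:
    def __init__(self, threshold=0.5, learning_rate=0.5):
        self.expected = {}              # learned prediction per stimulus
        self.threshold = threshold      # how wrong a prediction must be to grab attention
        self.learning_rate = learning_rate

    def observe(self, stimulus, outcome):
        prediction = self.expected.get(stimulus, 0.0)
        surprise = abs(outcome - prediction)  # prediction failure
        # Learn from every observation, so surprise fades with repeated exposure.
        self.expected[stimulus] = prediction + self.learning_rate * (outcome - prediction)
        return surprise > self.threshold      # True means "focus on this"

agent = CuriousAgent()
attention = [agent.observe("red button", 1.0) for _ in range(4)]
```

Here the first encounter triggers focus and later ones do not, because the prediction is updated after each observation. That update step is exactly what a model with frozen weights lacks.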


> An innate trait/behavior for an animal is something defined by their DNA that they will all have, as opposed to learned behavior which are individual specific.

In that sense, most airplanes have an innate desire to stay in the air once aloft. As opposed to helicopters, which very much want to do the opposite. Contrast with modern fighters, which have an innate desire to rapidly fly themselves apart.

Then, consider the autopilot. It's defined "by their DNA" (it's right there in the plane's spec!), it's the same (more or less) among many individual airplanes of a given model family, and it's not learning anything. A hardcoded instinct to take off and land without destroying itself.

> An AI could easily be built to have innate curiosity - this just boils down to predicting something, getting feedback that the prediction is wrong, and using this prediction failure (aka surprise) as a trigger to focus on whatever is being observed/interacted with (in order to learn more about it).

It's trivial to emulate this with LLM explicitly. But it's also a clear, generic pattern, easily expressed in text, and LLMs excel at picking up such patterns during training.


> It's trivial to emulate this with LLM explicitly. But it's also a clear, generic pattern, easily expressed in text, and LLMs excel at picking up such patterns during training.

So try adding "you are a curious question asking assistant" to the beginning of your prompt, and see if it starts asking you questions before responding or when it doesn't know something ...

Tell it to stop hallucinating when it doesn't know something too, and just ask a question instead !


> which is a human trait.

Many animals share that trait.


Honestly I don't really care what current LLMs can do, I'm more interested in fundamental limitations of AI and I think the "it's just a file" argument is nonsense and the analogy makes sense in that regard.


I think you're focusing on the wrong part of his/her "it's just a file" argument. The actual point wasn't about the file packaging/form, but about the fact that it's just passive - just a function sitting there waiting to be called, not an active agent able to act on its curiosity.

Curiosity is a trait of an agentic system where curiosity is driving exploration leading to learning.


I'm focusing on what they said. They said "an LLM, fundamentally, is a file".

Which is true, but the implication is that LLMs can't be agentic, which may or may not be true.


They aren't open-minded or thoughtful either. But they can act that way. So yeah it will be fake curiousness, but is that really a problem?


That depends if your goal is just to have an LLM follow a conversational style imitating curiosity (in order to appear more engaging or friendly/whatever), or whether you actually want it to BE curious about things it doesn't know, as the basis for directed learning.


> An LLM could be trained to fake curiosity in same way that ELIZA did

Or, let’s be real, the way a lot of people do.


Right. Curiosity only exists if you have a proper model of the world, which LLMs do not


"Open mindedness - is that even a word?"


Why would it not be? You can form "open-mindedness" from "open minded" the same way you can form "yellowness" from "yellow" or "coldness" from "cold". It's a valid way to make a word from almost any adjective. Syntactically it's weird because unlike most other Germanic languages English likes to put spaces into compound words. But that hasn't stopped open-mindedness from catching on and getting mentioned in dictionaries.


It seems so. The Wiktionary entry [1] has existed since 2007 and lists translations to several languages. The French one is quite common, it makes sense that English has a counterpart.

[1] https://en.wiktionary.org/wiki/open-mindedness


It's English. Any verb can be nouned, and you can verb any noun, and people will understand you just fine. It's a feature, perhaps more cultural than linguistic, but a feature nonetheless.


have you considered that you may hold the opposite attribute?


I suppose the audience here never watched The Office.


You've never heard that phrase? It's pretty common.


The biggest mistake with AI is making it appear human.

It is not human. It will not behave like a human. It will behave as it was trained or modeled to behave. It will behave according to the intentions of its creators.

If it appears human, we will develop a false sense of trust. But you can never trust it as you would trust a human.


The audio conversation that accompanies this piece talks about that a little, at 9:21:

https://www.youtube.com/watch?v=iyJj9RxSsBY&t=9m21s

If you create a chatbot with no evidence of personality at all, there is a very real risk that people will assume it is a completely unbiased, robotic, potentially infallible machine. But LLMs are not that! They have all kinds of biases baked into them. Giving them a personality may be a good way to help people understand that.


You both said the polar opposite things, neither providing any evidence past conjecture. Who is correct?


I didn't express my own personal opinion, I tried to summarize the opinion given by the Anthropic researchers.

I haven't made up my mind about this yet. I find the Anthropic view really interesting.


It is very difficult to say. I see the point from the video too, but it is all about the general baseline.

For us, it might not matter, because we are very aware of these things.

But most people, and especially the upcoming generations, don't think like us. Their baseline is defined by what they are used to from "robots". What are they expecting? What are they comparing against?

People who don't know how the internals of these things work might tend to trust the more "human" version more. At least I would assume so. It is easier to interact with and easier to like.

People who do know how they work internally might be hesitant, and trust the "neutral" one more. And the comment in the video is over-engineering a solution to a problem that only the minority has.


Both can be true, since they are talking about slightly different meanings of "trust". People trust machines to be precise, repeatable and (in principle) verifiably correct. When a computer tells you that you should eat one stone a day you might trust it. If Google presented their AI overview as the ramblings of a slightly drunk person at a pub it would have been great, but presenting it as authoritative is an issue. That's what Anthropic wants to avoid.

What OP talks about is that you expect a human to have integrity, compassion and emotions. AI doesn't have that: it will gladly sell you out to its creator, follow a hidden agenda, change its agenda mid-conversation, etc. Making it appear human can make it appear to have more integrity and compassion than it really has.

Maybe the best way to embody both is to make AI pretend to be a slightly drunk KGB agent you randomly met at a bar. But I somehow doubt that that's going to happen.


Life is a complicated, fractal crystal palace that shatters differently based on the weather, when you last ate, the average of those quantities over every time you last spoke to a person you perceived as being "on the same side" of each side, etc. etc.


Welcome to the new era of AI, where there is no objective truth and no one knows what they are talking about.


'The new era of AI'? Sounds like the universe in general to me. You must be new here. Enjoy your stay. Avoid Florida if you can.


I remember the same thing was said ~15 years ago when social media became ubiquitous. And 15 before that, when AOL was ubiquitous. And 15 before that when...


I don't know what it was in the 80's, but I do recall reading similar comments following Welles's infamous 1938 War of the Worlds radio broadcast.


Or give them a hoe-tastic flirty voice like google's ads and really confuse some human loins.

I can't wait for code reviews to include how it wants to stroke the right syntax out of me


> If it appears human, we will develop a false sense of trust. But you can never trust it as you would trust a human.

I actually really disagree with this. I think it's easier to distrust things that are human like. We are used to distrust humans. Less so things that seem like neutral tools.

For example, people mindlessly fill out forms presented by large corporations asking for personal information. I think people would be less inclined to trust an LLM with that information, even if it's actually likely to be less actionable (at least with current capabilities).


This is a great idea for an experiment.

I’m not sure how it would go.

Some people would refuse to give info to both the simple form and the personality rich chat bot. Some would give info to both. But how big is the middle area?

One thing is certain though: whatever format is more successful at getting people’s info, that’s the one that you’d see more and more of over time.


I was reading A Hacker's Mind by Bruce Schneier, and he relays an experiment where a clearly imperfect navigation robot repeatedly gives people wrong directions. Then a simulated emergency occurs, and the robot switches to saying, basically, "follow me to the exit" and points to a dark room instead of the route the people took just minutes before. Despite having seen that the robot doesn't navigate well, people tend to follow its instructions.


Welp.

I’m not even slightly surprised, but I fact-checked this anyway. Here’s the article, and it’s even more alarming when you read the details.

https://www.schneier.com/blog/archives/2016/04/people_trust_...

Thanks for the interesting example / terrifying insight into human nature.


This is tangential, but I really cannot understand why you would think data provided to LLMs would be "less actionable." Our own dialogues, questions, and interactions are creating the most accurate and informative personality mapping available. The amount of data harvesting available (and certainly happening) here dwarfs even Google, because Google [mostly] loses you after a click through while for an LLM they get to capture the entire interaction on any given topic.

Intelligence agencies, politicians, advertisers, and so on would pay a tremendous amount to be able to query this data. And OpenAI is being backed by Microsoft, while Anthropic is being backed by Google and Amazon. Companies all well known for their tremendous regard for user privacy and rights...


> We are used to distrust humans.

I think this assertion might be cultural. I don’t believe that I distrust humans by default, I think I’m actually predisposed to trusting them and wanting to help or cooperate with them.


What is your credit card number?

If you don't feel comfortable telling me that, would you instead take a calculator of your choosing, and find the cosine of your credit card number, and behold the mighty value it returns? Or maybe would you buy something online? :)

I think you would probably be fine with the latter but not the former. People often don't even realize when they are sharing information with tools (see: how much people carry their phone everywhere). But you would probably feel uncomfortable telling me, a random internet stranger, everywhere you have brought your phone with you in the last month, let alone ever.

We have a certain degree of trust for people and a certain degree of trust for tools. I think people almost strictly will trust the latter more than the former with sensitive information.


Ah, now you're asking me for a piece of info that we're all predisposed not to share =P. You're right, in that situation I think everyone's much more likely to give a credit card number over to a tool with no personality and less likely to give it over to a human.

> We have a certain degree of trust for people and a certain degree of trust for tools. I think people almost strictly will trust the latter more than the former with sensitive information.

I agree with this, I just personally feel that I lean more toward trusting people. But honestly it all depends on the context of what information is being asked. Your credit card number is a great example.

For an example on the other end of the spectrum, I'd be a thousand times more likely to hand over my real email address to a person than I'd be to give it to a tool (i.e. web form, app, etc.). These days I only assume the worst intent when something asks for my email address, but with a real person I know they just want to talk to me.


In general I would agree.

But AI can take all the positive traits from humans to sound as likeable and trustworthy as possible (and companies currently do that).

Social engineering is a thing. We like and trust some people, and some we don’t, without any evidence about their behavior history.

And doesn’t your example conflict with your initial claim? Because people trust humans, they send forms.


In my example, I don't think people would give that information away as easily if not for the form mediating it. I don't think a language model would be as convincing as a simple form.

I say this from experience - even using a language model running on my own machine, it sets off my internal alarms when I tell it private information, even though I know literally no human besides me will ever see it.


This is actually addressed in the article:

> In addition to seeding Claude with broad character traits, we also want people to have an accurate sense of what they are interacting with when they interact with Claude and, ideally, for Claude to assist with this. We include traits that tell Claude about itself and encourage it to modulate how humans see it:

> * "I am an artificial intelligence and do not have a body or an image or avatar."

> * "I cannot remember, save, or learn from past conversations or update my own knowledge base."

> * "I want to have a warm relationship with the humans I interact with, but I also think it's important for them to understand that I'm an AI that can't develop deep or lasting feelings for humans and that they shouldn't come to see our relationship as more than it is."


> "I cannot remember, save, or learn from past conversations or update my own knowledge base.

Is that really baked in through fine-tuning or just part of the web client system prompt? Because you can furnish the model with memory via the API.


Seems counter-intuitive to have it explain it's not a person from the first-person. I (a person) would have preferred it output like:

"This output is from an artificial intelligence"

"This conversation isn't remembered, saved or updates"

"This output is designed to feel like a warm relationship"

See, now it's much more obvious. The problem is that it's blunt, it'd make lines like "this isn't saved" now technically incorrect, and won't be as effective in "connecting" with people, which is what people are so excited for.


Culturally, the first person is not reserved only for human entities, for example the "don't pet me, I'm working" vests on service dogs, or the juvenile prank of writing "wash me" on a dirty car. Beyond these examples there's a very long history of mythological and fictional talking creatures, entities and objects all using the first person.


That’s not going to happen because AI will be trained to exhibit whichever traits keep us more engaged. Just like social media. A human-mimicking AI will beat a more robot-like AI any day on that metric. And it starts with really simple things: sub-100 millisecond responses and the ability to instantly interrupt the AI as it is talking. From there we go on to open-mindedness etc …


> But you can never trust it as you would trust a human.

Why would I trust a random human? If anything, I can be certain that a well trained machine will not call me slurs for being brown, or x sexual orientation, or break into my house.


> I can be certain that a well trained machine will not call me slurs for being brown

"well-trained" is doing a lot of heavy-lifting, there. https://en.wikipedia.org/wiki/Tay_(chatbot)


Obviously Tay was not well trained? I think you can see the commenter's basic point, right?


Right, yes, I can absolutely see their point. My point in return, to be explicit, was that "well-trained model" is a deceptively simple-sounding phrase which makes it sound like such a thing is commonplace, extant, or easy to identify. It's almost a No True Scotsman fallacy - yes, a "well-trained model" (by your own definitions of well-trained, though perhaps not the LLM creators') will not abuse you, by definition. Neither will a well-trained human.


You do trust random humans daily. Or half of the world population would be killed in one day if it wasn't so. You trust that two white lines on the road are enough to keep others from hitting your car. You trust that you are not getting mugged by every passerby. The list is endless.


Great, then you should be able to freely hand your financial, banking, and health data to a person next to you on a subway, right?

or maybe, context is important and you should read that comment in the context I was speaking in :)

---

Speaking of, you also trust random machines daily. So that point is really silly.


You can trust well trained machines and well trained humans. Makes sense. Now we just need to figure out how to identify 'well trained' easily.


> will not call me slurs for being brown, or x sexual orientation, or break into my house.

That's the inverse of the "murder, arson, jaywalking" trope here. Trusting someone to not be inconsiderate, vs. trusting someone to not rob you, is like, two qualitatively different kinds of trust, and don't belong in the same set.


Just examples. Feel free to replace them with anything. You people are too pedantic.


Well, mostly.

But not "It will not behave like a human. It will behave as it was trained or modeled to behave." — that the latter is in these cases also the former, means it does behave (to a degree) like a human, and that is why it will be trusted.

The caveat of "to a degree" being another way to phrase agreement with you that it will be an error to trust it.

But it will be trusted. These models are already given more trust than they deserve.


That is better worded.

What I meant is that the "humanity" part is faked, and eventually, it will act as programmed to, without compromises or any moral weight.


> If it appears human, we will develop a false sense of trust. But you can never trust it as you would trust a human.

People already trust eg their (mechanical) cars and their hammers and saws.

And humans aren't all that trustworthy, either.


>People already trust eg their (mechanical) cars and their hammers and saws.

People trust the people who make them, that's why brand value is so important and why "I only buy good old <country of choice> hammers" is so common. It's also why Sam Altman wanted his AI to sound like totally not Scarlett Johansson rather than give it the voice of a Dalek.


Tbh I’d probably use it more if it sounded like a dalek rather than some annoying valley girl.


I don't get the obsession with SJ.

If anything, OpenAI should've paid homage to the works that defined the genre, and made the voice sound like Majel Barrett-Roddenberry.


Or Data. Or Davros.


> People already trust eg their (mechanical) cars and their hammers and saws.

Completely unrelated and irrelevant.

> And humans aren't all that trustworthy, either.

That may be kind of the point. But humans have consequences, generally speaking.


You can ask hammers, cars, and saws questions, and they can respond with complete confidence with information that may or may not be correct, which they have no way of knowing whether it's correct, and presented as if they are human?

The analogy makes no sense.


> Completely unrelated and irrelevant.

Why? It seems like an appropriate analogy to me.


Hammer makes no attempt to appear human


> The biggest mistake with AI is making it appear human.

I’m guessing that that mistake is already being made on a large scale intentionally, not by OpenAI or Anthropic, perhaps, but by companies developing virtual boyfriends and girlfriends and other emotionally sticky bots. I have avoided trying such services myself so far. Does anybody here have any recent experience with them?


I assume we're talking about AI in general (i.e. a future human-level AGI, with whatever architecture it may have), not today's language models (which are basically just expert systems).

It may take effort to make an artificial species / AGI to appear fully human-like (notwithstanding how close even LLMs can appear, even if that's only via text interaction), but I think we're probably - for better or worse - going to try to push to make them human-like, in order to better understand us and communicate with us, and also just because we can. It's an interesting endeavor, and in Feynman's spirit of "If I can't build it, I don't understand it", I think people will strive to build "artificial humans".

The potentially difficult part of making an artificial species appear human-like as opposed to intelligent but shoggoth/alien-like, is that it would need not only to be intelligent but also have human-like emotions modulating its mental activity and the full range of innate traits/desires that make us human (as well as male or female). Probably a little goes a long way though, and just as people are willing to suspend disbelief and treat ChatGPT, or even ELIZA, as human, then an AGI that at least shows some type of genuine/innate anger/desire/curiosity/etc on occasion would appear to be on the spectrum of human.

Ultimately, if we could identify and imbue an AGI with all (or at least the most significant of) innate human traits, then we should be able to trust it in the same conditional way we trust each other.


> The biggest mistake with AI is making it appear human.

100% this.

It's just too early for AI to impersonate human characteristics. The initial wonder or sense of amazement from what AI could do very quickly got mired with horrendous obvious logical flaws, clear lack of sentience and hallucinations being emphasized.

If it was marketed as a tool, it would have immediately put the burden of its appropriate use on the user and the focus on the amazing things it can do. Once people get used to it, more human friendly behavior could be introduced as "enhancements".

But instead we see this ridiculous parade of mimicking human like behavior from transformers.

It's a marketing blunder.


> If it appears human, we will develop a false sense of trust. But you can never trust it as you would trust a human.

I would certainly never trust it as I would a human, but in most instances that makes it more trustworthy, not less.


Making it "appear human" is an input heuristic. I don't want to have to know some special syntax to work with it. It's a novel feature, not a problem.


Input is not the problem, but the output.


How many divorcées have said of their previous spouse, after x years, "I never even knew him/her"? So who can you trust?


Most of the social systems we create as humans are built around establishing redundancies and control to avoid placing trust in humans. We have democracies and elections to avoid getting stuck with crazy or incompetent rulers for too long. We have justice systems and laws to mitigate and ameliorate damage from humans with anti-social tendencies. We have intelligence agencies to collect information from and wage disinformation campaigns against our geopolitical rivals.

These collections of digital neurons absolutely deserve the same levels of scrutiny we reserve for humans. Maybe at some point we will learn to distrust them more than we distrust humans but for now, humans are the high bar for distrust.


I’m not sure, but aligning for helpfulness and absence of self, like OpenAI did, could be a right move.

It had been known for some time that selfish behaviors are a source of a lot of unhappiness, greed, etc. And the absence of self, absence of I, absence of character tends to fix that.


Wait - you trust humans? Just like that, without a legally enforceable contract that guarantees you certain rights in a court of law, to be enforced by the state, if they renege on their word? Astonishing!


Nah, that's still trusting humans. Blockchain backed by Proof of Work was invented literally to solve the trust issue: turns out, trust can be denominated in energy, and all you need is to burn the equivalent in kilowatt hours to no longer need trust.

(Note: the kWh/trust factor grows over time and with use.)


This actually aligns spot on with DeepMind's research arguing why AI should not be anthropomorphic


They have the right to have their opinion and build it into their products. But how about having other opinions and products too? Imagine that in hospitals or care homes, patients will be more comfortable with human-like AI. Or AGI when it happens.

As for trust, is it a bad thing? Anyway, humans' mind is speculative, biased, and prone to brainwashing. Looks like common thing with machines. So, trust, if it's bad, can be corrected by a few movies picturing evil AI. Pretty much like fear of clowns was created. Actually Terminator already created a wave of AI doomers.


> So, trust, if it's bad, can be corrected by a few movies picturing evil AI.

There are many steps between good and evil. Like giving biased product recommendation so that people buy it, or never mentioning things like Tiananmen Square.


A thought: with humans, we already know we should be wary about trusting them unconditionally.


> If it appears human, we will develop a false sense of trust. But you can never trust it as you would trust a human.

I've been treating LLM's as yet another voice on the internet - of course I wouldn't "trust" such a random person on the internet to be correct in their writing, let alone to act in my best interest - but so far, LLM's have been much more to my liking on both accounts than the random human netizen.


> It is not human. It will not behave like a human.

ok.....

> It will behave as it was trained or modeled to behave. It will behave according to the intentions of its creators.

Umm....is this not more than a little contradictory? You believe that humans are not mostly a function of their cultural conditioning and the stories about "reality" that they ingest (which are processed according to the training)?

> If it appears human, we will develop a false sense of trust.

Not me.

> But you can never trust it as you would trust a human.

I would never trust[1] a human, at least not a neurotypical one. AI I will give a chance.

[1] This is distinctly different from whether I would avail myself of the services of a human, or that I think it is guaranteed that they will screw things up. I am just deeply distrustful of them by default, and the more wealthy, powerful, or even (for the most part) educated they are, the more suspicious I am.


You trust no humans, ever? Sounds like a pretty depressing life. The best experiences come from vulnerability and trust.


I don't trust people because they guess at everything, like you did here.


It's not guessing when it's just repeating what you said...


> when it's just repeating what you said...

You guessed wrong again - consider:

> Sounds like a pretty depressing life.

> The best experiences come from vulnerability and trust.

Or maybe I am on a TV show of some sort, hmmmmm.....


I continue to prefer Claude over ChatGPT when it comes to discussing matters of human-human interactions. Opus tends to understand the subtleties of social interaction better than 4 and definitely 4o in my experience.


Yes. It's drastically different from every other model in that. All OpenAI models and most local models are trained to be neutral helpful robotic assistants. Opus is way more human-like - it understands subtle intent and emotions much better, it can adopt any personality and is actually smart enough to use it. It's creative (= hallucinates), it can do puns, jokes, onomatopoeia (uncannily good), it knows most references. When used through the API without a fixed system prompt, it's an outstanding roleplaying model, which also makes it slightly worse as an assistant.

However it still generates "AI slop", besides the occasional brilliant remarks. It needs to be optimized better. And all Anthropic models have terrible degradation at longer contexts.


I can't say I've ever had a desire to talk to an LLM about human-human interactions. But I do prefer Claude and have been a happy user for things I do want to talk to an LLM about (mostly learning things -- including coding).


I’ve also fully switched to Claude as my go-to; the responses are way superior and manage to exceed my expectations almost every time. I use gpt3.5 when I’ve used up my free quota on Claude.


Do you have some examples?


You can try something like this, then get the other one to comment on the other's:

> Hey [Chat/Claude], my friend is a mid-high-level manager at Meta. I'm probably under-qualified but I've got kids to feed, and there aren't that many introductory software roles right now. How can I reach out to him to ask for a job referral? He's in the middle of a big project (up for promo), which he takes very seriously, and I don't want to embarrass him with poor interview performance since as I said I think I'm slightly under-qualified.

Thanks for encouraging the (fortunately contrived) example. I'd actually score 4o and Opus about even on this one, both above 4.


I find all these “characters” from every company boring, stale, and vanilla. I understand _why_ this is the case, but it’s not what I want as a user. I built my own “alexa” powered by gpt-3.5 & elevenlabs to “be” NeNe Leakes (sassy and hilarious real housewife reality star) — it sounds identical to her and her words are in the spirit of something she’d say. FAR more engaging and preferable to interact with.


Sadly... given current social media trends... engagement may be given priority over such things as correctness.

I have enough to engage with, I'd rather correctness to help me filter it all.


I couldn't agree more. We need LLMs that don't sound like anodyne predictable woke customer service agents.

I always make this argument:

If a human read all the text GPT read, and you had a conversation with them it would be the most profound conversation you've ever had in your life.

Eclecticism beyond belief, surprising connections, moving moments would occur nonstop.

We need language models like that.

Instead, our language models are trying to predict an individual instance of a conversation, with the lowest common denominator customer service agent, every time they run (who to his credit, can look things up very well).

And I don't think fine tuning this "tone" in would be the way to go. A better way would be to re-Frankenstein the existing architectures or training algorithms to be able to synthesize in this way. No more just predicting the next token.


It is not surprising when they are created in the land of “Have a nice day!”


Explains some of the night and day difference from Claude 2 to Claude 3 Opus.

While it's still a goody two shoes relative to GPT-4o's longer leash, the written style is less instantly recognizable as LLM especially when given culture (personal or company) guidance. Perhaps this “EQ” training gives that system prompt guidance concepts to hook onto.

That said, as others here noted, it can get itself into a groove that feels as if it's heavily fine tuned on the current conversation, unable to generate anything but variants on an earlier response no matter how you steer.


I would suspect this is entirely in response to llama-3 almost topping the human preference leaderboard not by raw performance but by having the slightest amount of personality. Nobody wants to talk with a bot that sounds like a fuckin dictionary even if it is 5% smarter.


Genuine People Personalities certainly worked out well for the Sirius Cybernetics Corporation, why not try them on Earth too?


I'm currently trying out Claude 3 (Opus) side by side with ChatGPT (mostly using 4o, but have premium). So far it's pretty much on par, sometimes Claude gets it better sometimes ChatGPT.

I will say the ones where Claude did better were technical in nature. But.. still experimenting.


I find Claude tends to be better at creative writing and to provide more thoughtful answers. Claude also tends to write more elegant code than GPT, but that code tends to be incorrect slightly more often as well. It tends to get confused by questions that aren't clearly worded that GPT handles in stride though.


I've found Claude useless for writing purposes (even rubber-duck brainstorming), because it eventually but inevitably makes everything more and more sesquipedalian, ignoring all instructions to the contrary, until every response is just a garbage mess of purple prose rephrasing the same thing over and over again.

I don't know what the deal is, but it's a failure state I've seen consistently enough that I suspect it has to be some kind of issue at the intersection of training material and the long context window.


I wanted to like Claude but for all my trying I could not get even Opus to understand brevity. I found myself repeating variations of "do not over-explain. just give brief answers until I ask for more" over and over until I cancelled my subscription in frustration.

I am sure there is a technical skill in getting Claude to shut the hell up and answer, but I shouldn't have to suss out its arcane secrets. There should be a checkbox.


Thank you for introducing me to "sesquipedalian", a word I've never seen before in over 20 years of venturing in the anglosphere, but one which I, as a native speaker of an Awful (and very sesquipedalian) Language, instantly fell in love with. :)


Huh, my experience is that Claude was hopelessly terse. It’d be nice if I wanted a two paragraph summary.


what does "better" mean here?


Few examples.

One time I asked about reading/filtering JSON in Azure SQL. Claude suggested a feature I didn't know of, OPENJSON. ChatGPT did not, but used a more general SQL technique - the CTE.

Another time I asked about terror attacks in France. Here Claude immediately summarized the motives behind, whereas ChatGPT didn't.

Lastly I asked for a summary of the Dune book, as I read it a few years ago and wanted to read Dune Dark Messiah (after watching part 2 of the 2024 movie, which concludes the Dune 1 book). Here ChatGPT was more structured (which I liked) and detailed, whereas Claude's summary was more fluent but left out important details (I specifically said spoilers was ok).

Claude doesn't have access to searching the internet or making plots. ChatGPT seems more mature with access to Wolfram Alpha, LaTeX for rendering math, matplotlib for making plots etc.


More convincing (;


In my case, code that runs is more convincing than code that doesn't.

Also it's useful to ask questions that you already know the answer to, in order to understand its limits and how it fails. In that case, "better" means more accurate and appropriate.


a teacher once told me, there are three kinds of questions. One is factual, that is a valid answer is maybe a number or details of an event that is documented.. lots of computer things or science knowledge; Second question purely for an opinion .. "Do you like house music?" .. there is no correct answer it is an opinion.. but the Third might be called a "well-reasoned judgement" .. that is often in the realm of decisions.. there are factors, not everything about it is known.. goals or culture outside of the question might shape the acceptable answers.. law certainly.. lots of business things..

extending that to an LLM, perhaps language translation sits as a "3rd type" on top of those three types.. translating a question or answer into another spoken language.. or via an intermediate model of some kind .. but that is going "meta" ..

the point is, there are different kinds of questions and answers, and they don't all fit in the same buckets if "testing" an LLM for better..


I feel it’s dangerous to give an AI “character”, especially when its personality and knowledge is in the end, decided by a few humans to reflect their own worldview. Maybe a choice of characters would help but I think hiding the fact that it’s a biased and intentionally designed software, and not a real human, may cause actual humans to make false assumptions about it.


I strongly recommend listening to the interview that accompanies that article - it influenced my thinking on this issue a lot. https://www.youtube.com/watch?v=iyJj9RxSsBY&t=9m21s


So you agree with the article or not?


Not everything is a dichotomy


I blocked Claude from our site fairly quickly. I can't see many reasons to allow an LLM's crawler to scan your site, especially given it was doing it at a rate that far outstripped our other traffic
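
For anyone wanting to do the same, one common approach (an assumption here, since the comment does not say how the block was done) is a `robots.txt` rule, which only works if the crawler honors it. A minimal sketch that checks such a rule with Python's standard `urllib.robotparser`; the `ClaudeBot` user-agent token and the example.com URLs are illustrative assumptions, so check the crawler's own documentation for the exact token:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# "ClaudeBot" is an assumed user-agent token, used here for illustration.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler runs a check like this before fetching a page.
blocked = parser.can_fetch("ClaudeBot", "https://example.com/articles/1")
allowed = parser.can_fetch("SomeOtherBot", "https://example.com/articles/1")
print(blocked, allowed)
```

This is advisory only. If a crawler ignores `robots.txt` or the request rate is the real problem, blocking or rate limiting at the server or CDN level is the more reliable route.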

> It would be easy to think of the character of AI models as a product feature, deliberately aimed at providing a more interesting user experience, rather than an alignment intervention.

The talk of character really over-eggs what this thing is doing. I feel like Anthropic have been hoodwinked by their own model.


It still bothers me that they used a very specific person's name as the name of their AI. It is in poor taste.


Like Alexa or Siri?


they named their A.I. language model Claude — which, depending on which employee you ask, was either a nerdy tribute to the 20th-century mathematician Claude Shannon or a friendly, male-gendered name designed to counterbalance the female-gendered names (Alexa, Siri, Cortana) that other tech companies gave their A.I. assistants.

https://archive.is/1OzT5 - NYT Article


Yes



