This is my attempt to articulate why some recent shifts in AI discourse seem to be degrading the product experience of everyday conversation.
I argue that “sycophancy” has become an overloaded and not very helpful term: almost a fashionable label applied to a wide range of unrelated complaints (tone, feedback depth, conversational flow).
Curious whether this resonates with how you feel or if you disagree.
The issue raised here seems mostly semantic, in the sense that the concern is about the mismatch between the standard meaning of a word (sycophant) and its meaning as applied to an issue with LLMs.
It seems to me that the issue it refers to (unwarranted or obsequious praise) is a real problem with modern chatbots. The harms range from minor (annoyance, or running down the wrong path because I didn’t have a good idea to start with) to dangerous (reinforcing paranoia and psychotic thoughts). Do you agree that these are problems, and is there a more useful term or categorization for these issues?
Re: minor outcomes. It really depends on the example, I guess. But if the user types "What if Starbucks focuses on lemonade" and then gets disappointed that the AI didn't yell at them for being off track--what are they expecting, exactly? The attempt to satisfy them has led to GPT-5.2-Thinking style nitpicking.[1] They have to think of the stress-test angles themselves ('can we look up how much they are selling as far as non-warm beverages...')
[1] E.g., when I said Ian Malcolm in Jurassic Park is a self-insert, it clarified to me that "Malcolm is less a 'self-insert' in the fanfic sense (author imagining himself in the story) and more Crichton's designated mouthpiece". Completely irrelevant to my point; it's answering as if a bunch of reviewers are gonna quibble with its output.
With regard to mental health issues, of course nobody on Earth (not even the patients with these issues, in their moments of grounded reflection) would say that the AI should agree with their take. But I also think we need to be careful about what's called "ecological validity". Unfortunately, I suspect there may be a lot of LARPing in prompts testing for delusions, akin to Hollywood pattern matching, aesthetic talk, etc.
I think if someone says that people are coming after them, the model should not help them build a grand scenario; we can all agree on that. Sycophancy is not exactly the concern there, is it? It's more like knowing that this may be a false theory. So it ties into reasoning and contextual fluency (which anti-'sycophancy' tuning may reduce!) and mental health guardrails.
<< The harms range from minor (annoyance, or running down the wrong path because I didn’t have a good idea to start with) to dangerous (reinforcing paranoia and psychotic thoughts). Do you agree that these are problems, and is there a more useful term or categorization for these issues?
I think that the issue is a little more nuanced. The problems you mention are problems of a sort, but the 'solution' in place kneecaps one of the ways LLMs (as offered by various companies) were useful. You mention the problem is reinforcement of the bad tendencies, but give no indication of reinforcement of the good ones. In short, I posit that the harms should not outweigh the benefits of augmentation.
Because this is the way it actually does appear to work:
1. dumb people get dumber
2. smart people get smarter
3. psychopaths get more psychopathic
I think there is a way forward here that does not have to include neutering seemingly useful tech.
AI sycophancy is a real issue, and having an AI affirm the user in all or most cases has already led to a murder-suicide.[0] If we want AI chatbots to be "reasonable" conversation participants, or even something you can bounce ideas off of, they need to not tell you everything you suggest is a good idea and affirm your every insecurity or neurosis.
Or did you just place about 2-5 paragraphs per heading, with little connection between the ideas?
For example:
> Perhaps what some users are trying to express with concerns about ‘sycophancy’ is that when they paste information, they'd like to see the AI examine various implications rather than provide an affirming summary.
Did you, you personally, find any evidence of this? Or evidence to the opposite? Or is this just a wild guess?
Wait; never mind that, we're already moving on! No need to do anything supportive or similar to bolster it.
> If so, anti-‘sycophancy’ tuning is ironically a counterproductive response and may result in more terse or less fluent responses. Exploring a topic is an inherently dialogic endeavor.
Is it? Evidence? Counter evidence? Or is this simply feelpinion so no one can tell you your feelings are wrong? Or wait; that's "vibes" now!
I put it to you that you are stringing together (to an outside observer using AI) a series of words in a consecutive order that feels roughly good but lacks any kind of fundamental/logical basis.
I put it to you that if your premise is that AI leads to a robust discussion with a back and forth, then the one you had that resulted in this "product" was severely lacking in any real challenge to your prompts, suggestions, input, or viewpoints.
I invite you to show me one shred of dialogue where the AI called you out for lacking substance, credibility, authority, research, due diligence or similar. I strongly suspect you can't.
Given that, do you perhaps consider that this might be the problem when people label AI responses as sycophancy?
Well, I do have a chat log somewhere where I say potential energy seems like a fake concept, and GPT and/or Gemini got around to explaining that it can actually be expressed in equations reliably... does that count?
"Called you out for lacking substance, credibility, authority, research, due diligence or similar" seems like a level of emotional angst that LLMs don't usually tend to show.
Actually, amusingly enough, the Gemini/Verhoeven example in my doc is an instance where the AIs seem to have a memorably strong opinion.
Also see the broader Vibesbench project: https://github.com/firasd/vibesbench/
Vibesbench discord: https://discord.gg/5K4EqWpp