
So if I’m understanding, Stack Exchange allowed moderators to issue 30-day bans without following normal escalation policies, if a moderator had any reason to suspect that a user posted AI-generated content.

The new policy is that you have to follow all the normal policies for all posts; you don’t get to pull out the banhammer just on a suspicion of AI. And the moderators are striking because they want to keep the power to issue uncontestable 30-day bans whenever they feel like it?




No, the new policy means that in almost all cases you cannot moderate content you consider AI-authored at all. This means moderators cannot delete those posts nor suspend the users for this specific reason.

The result is pretty much that AI-generated content is essentially allowed as it cannot be effectively moderated. Even though many sites still have an official policy that disallows it.

Disclaimer: I'm a mod on a small SE site, though I have not acted as a mod on any AI-generated content.


The fact that something is or isn't AI doesn't require independent moderation procedures. Either the content has an answer to the question or it doesn't.


If something doesn't answer the question, then it should be removed by moderators. But it shouldn't stop there: junk generated by AI is going to be mass-produced, and anyone posting incorrect AI-generated answers is probably going to be a repeat offender. Taking proactive action against such users who are operating so far outside the purpose of the site is very likely necessary to have any hope of maintaining a reasonable signal to noise ratio.


probably, or is?

The offence here is spamming, AI is orthogonal. Users spamming should be given a timeout.


Consider as an analogy, safety while driving. Driving dangerously is fairly uniformly proscribed, but we also have specific prohibitions on various mechanisms by which people drive dangerously. It's much easier to prove that someone was intoxicated or using a mobile phone, rather than needing to show in each case that the intoxication was dangerous or that the mobile phone use was distracting and the distraction was dangerous.

Similarly, use of LLMs to generate and post large quantities of text from a short prompt is inherently spamming. The user could provide exactly the same value by posting their prompt. If the prompt isn't valuable, the output won't be either, as anyone with a copy of the prompt could generate equivalent output for themselves if they wished to.


As an addendum, to not (I hope) distract from the point about spam -- I don't at all object to using an LLM for inspiration, or editing, or summarisation. So long as there's no claim that the output contains more information than the input. Any statement of fact in the output should be present in the prompt or validated by a human before publishing, or it's suspect. And if it's published without disclosing the lack of validation, it's unethical.


> And if it's published without disclosing the lack of validation, it's unethical.

How is this different than people posting things that they didn't test themselves?

   try

   {code block}

   Hope this helps
Answers that are clearly not an answer get edited to look pretty rather than downvoted and flagged for being wrong ( https://stackoverflow.com/a/76402243 ).

The problem isn't an LLM (though that just provides more scale) - it's that incorrect information isn't removable / actionable on SO. If the person tried to answer a question, it remains up.


Who says it's not different?


> The offence here is spamming, AI is orthogonal.

You can make broad classifications for an act (it wasn't murder, it was negligent discharge of a firearm), but that doesn't preclude the fact that it is also something else (shooting in the direction of a person without regard to their life and killing them is murder).


Your analogy isn't a great fit, because the distinction between the two carries different penalties.

In this scenario, spamming with or without AI would carry the same penalty. The benefit is that you don't need to determine if AI was involved, since it doesn't matter.


Incorrect answers are not what moderators are moderating. A super easy way to make answers that look right even when they are wrong, for people trying to game SO's point system, is a recipe for a huge amount of extra work for mods with ~zero benefit.

There are definitely alternatives to the unilateral ban (thinking about how chess.com, say, handles bans for cheating), but saying "AI content is qualitatively no different" ignores the bigger ecosystem problem of the average answer quality simply going down.


The problem is that moderation does not necessarily check the solution. A moderator can't acquire the specific version of hardware the questioner had, recreate the failure conditions, then test the solution. Instead, a moderator looks to see if an answer seems reasonable and fits the correct syntax and site-specific rules.

LLMs are great at producing bullshit that looks convincing.

Essentially, one has to trust responders to provide answers to some extent. Untrustworthy responders can use generators to bullshit their way into acquiring trust.


Setting aside the fact that the AI-generated answers are awfully wrong most of the time, this line of thinking also leads to an absolutely dreadful state of the internet where everything is AI-generated crap. This is made worse by the fact that StackOverflow has points, badges and everything. People buy and sell stars on github to increase their chances of being hired somewhere. People absolutely will game the shit out of StackOverflow, not caring for a single second if it actually makes it a better place, if it means they get shiny badges and get to put "top commenter on the C# subject on SO (588100 points)" on their resume.


Giving moderators the power to arbitrarily ban users they deem to have produced content by AI is not likely to work for long though. Perhaps today it's still feasible to note the "vibe" for GPT, but that's not likely to be possible for long (it's probably an artefact of the RLHF process anyway). Then what? You'll end up with essentially arbitrary application of the ban hammer based on the moderators' "feelings".

I certainly don't have the answers (and let's face it, it's likely going to be a problem here too).


> is not likely to work for long though.

This is true, and it's why the policy promulgated by the moderators was explicitly temporary. It was a response to an acute problem to give everyone time to figure out how to handle it as a chronic condition: https://meta.stackoverflow.com/questions/421831/temporary-po...


>You'll end up with essentially arbitrary application of the ban hammer based on the moderators' "feelings".

So, like the current organisation of things, that works quite well? You put entirely too much value on moderators being perfect. Moderators have been biased forever, have banned for no reason forever. It's a website, you can live with that (or without it). If they're unreasonable, move away, find another place or create another place. If you can't move away, deal with it, create an alt, don't get caught doing the same thing that got you banned, and that'll be it.


I am curious to hear from people here who read resumes and hire people. Is "top commenter on the C# subject on SO (588100 points)" on a resume something that would influence your hiring decision and if so would you be tempted to go check a few of those comments/answers out?


It doesn't ever have to ACTUALLY work; enough people just have to believe it does, or want it to, and go through the effort of spamming a community to death to make a number go up.


I think you're getting to the heart of the behind-the-scenes business decision: Let AI responses proliferate on the platform -- heck, maybe even allow a bot to submit the questions to various LLMs and post the response as a normal user -- then make the volunteer community figure out whether the response was good or not, without letting the mods interfere in the process.


...the mods are part of the volunteer community, and their role largely already is more about keeping the bad apples out than establishing whether a particular response is good or not. Where do you get the idea from that the mods are incapable of this while "the volunteer community" is (or why do you think that's what SE thinks)?


I was implying that I think SE is doing this because it’s the only thing I can think of that would explain what appears to be an insane change of policy. How else would you explain it? What else would be the goal except to test AI responses in the wild as a sort of meta experiment? I’m just trying to connect the dots here.


If that's your base criterion, why not remove humans from the equation entirely and just turn Stack Overflow into something purely AI-driven? It doesn't matter if the answer is correct, only the speed and formatting of the answer.


That’s a more realistic option than you’d think.

What is the point of a software Q&A site anyway, why not just read the docs?

In my experience, Q&A is useful because it summarises and consolidates disparate information into a concise response to a prompt (sorry, I mean to a Question!) which is... exactly what a chat AI does.

Isn’t chat AI an existential threat to SO, even if AI is banned there?

Certainly I find chat AI better in many cases, and if I don’t like the answers, just going to the docs is the next step, not a generic Google search with SO in the SERP.


> In my experience, Q&A is useful because it summarises and consolidates disparate information into a concise response to a prompt (sorry, I mean to a Question!) which is... exactly what a chat AI does

There is knowledge you will never find in documentation. 50% of the places where SO has been helpful to me are not "consolidating documentation". They are solutions to obscure bugs, useful APIs that aren't documented with any example, and low-level logic / high-level design solutions.


Because AI doesn't answer the question, there's nothing "volitional" in it, it's just a (computationally smart) rehash of past answers, hoping that it will all fit together somehow.


> there's nothing "volitional" in it

> hoping

Hmmmm


I know, no matter how much I try (and I do try) the anthropomorphization creeps through from the most unexpected places. I personally hate it, as I hate all this related AI-discourse.


How do you know if content is AI generated?

If mods could decree a piece of content as AI generated (and delete it) willy nilly, then that would be far worse IMO.


As a common example, you see someone who asks a question in the format:

“helo plz can help w cod, is broke has error”

And then the next day posts four answers in perfect English to four different topics, with that GPT “vibe”.

You can’t reliably detect generated content in a vacuum, but Stack Overflow is a very metadata-rich environment for moderators.


As a user of SO, etc. I don't care how they wrote the answer. Is it a good answer? Does it help me? That's what I'm interested in.

Why censor good answers?


They’re usually not good answers, and they’re subject to the bullshit asymmetry principle.


This is a vital comment that people need to understand.

It is _effort_ to moderate content, to read answers and see if they are legitimate. It is almost zero effort to chatgpt out "answers".

The people providing the ChatGPT answers _do not care_ about them, they are only looking to pad their Rep.

It is an attack against the very core of the StackExchange system.


Why do people want to pad their rep? I have nearly 14K "rep" on SO, never had any positive effect on my career whatsoever.

Also, theoretically (never happened), if someone mentioned their exceptional SE profile on their resume or their cover letter, I'd for sure ask them about some details from the topics they claim to be experts in.

Will they also bring ChatGPT to the interview? That'd be fun to watch.


If something has a score, people will compete in it just for the satisfaction of moving up on the leaderboard.


Stack Exchange accounts are sold and bought, so apparently there is an appeal.


> Why do people want to pad their rep? I have nearly 14K "rep" on SO, never had any positive effect on my career whatsoever.

If you are an applicant to a position where there are 5000 resumes submitted and the people doing the filtering want a quick and easy numerical ranking of them - provide your Stack Overflow account.

At that point, they can look up your rep and pick the best ones based on that.

This doesn't happen as much in US-based companies, as there are other metrics that they can use to filter candidates rather than SO rep.

However, if you are in India and applying to a consultancy, and every resume and transcript is very similar to the point of not being able to distinguish between them - SO rep provides a very easy way to rank and filter applicants.

Unfortunately, I can't verify this. I've only heard it second hand but it does make sense and helps explain why I occasionally get SO account and stats on resumes from contractors.


> Will they also bring ChatGPT to the interview? That'd be fun to watch.

It is already done, as I've heard. The previous iteration was to pay your lookalike to pass the interview for you. Or just pay anyone and blame lack of camera on "technical difficulties".


> Why do people want to pad their rep?

Because a bunch of dumbass software companies (especially the sweatshop ones) decided that your annual review now needs to include a bunch of dumbass "open source software social work" or you get dinged on your review.

So, Github contributions, Stack Overflow moderation, etc. all became subject to Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

ChatGPT is just accelerating this contamination of the well to all-out flooding of it with industrial sewage.


Why turn SO into a cache of LLM responses, why not just ask an LLM if you want answers and don't care if those answers work?


Bingo.

Stack Exchange could just call an LLM API when a question is asked and show the response. That wouldn’t be nearly as valuable as their verified index of answers.


However, having knowledgeable people upvote/downvote/correct the AI answer could prove quite valuable.


Can you verify answers faster than GPT can generate them?


I'm not the ideal person to ask here. But in general the output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable. And you have to keep in mind that you usually have more than a single post available in these cases. The pattern of posting content is also a signal by itself.

So if e.g. a user posts a dozen long answers within 10 minutes, and they all have characteristics of ChatGPT, that would be a pretty good signal.
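
To make that concrete, here's a minimal sketch of such a frequency-plus-style check. The thresholds, phrase list and function names are my own assumptions for illustration, not anything SE actually runs:

    from datetime import timedelta

    # Phrases often seen in ChatGPT-style boilerplate (illustrative, not exhaustive)
    SUSPECT_PHRASES = ("as an ai language model", "i hope this helps", "it's important to note")

    def looks_generated(body):
        text = body.lower()
        return any(phrase in text for phrase in SUSPECT_PHRASES)

    def should_flag_for_review(posts):
        # posts: list of (timestamp, body) tuples for one account, newest last
        if not posts:
            return False
        latest = posts[-1][0]
        recent = [body for ts, body in posts if latest - ts <= timedelta(minutes=10)]
        long_recent = [body for body in recent if len(body) > 1500]
        hits = sum(looks_generated(body) for body in long_recent)
        # roughly: a dozen long answers in ten minutes, most matching the pattern
        return len(long_recent) >= 12 and hits * 2 >= len(long_recent)

And even then, something like this would only surface accounts for a human moderator to review, not ban anyone on its own.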


> So if e.g. a user posts a dozen long answers within 10 minutes, and they all have characteristics of ChatGPT, that would be a pretty good signal.

Ah yeah, the frequency of posts definitely makes sense.

> output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable

I would agree with you for GPT-3.5, but I don't think this is the case for GPT-4 (I've spent several hundred hours using GPT-4 for various tasks - mainly related to coding & learning random subjects).


In this case it certainly matters that ChatGPT is free. There are just many more people using that than paying for GPT-4 access.


Still, if the answer happens to be factually correct, is it an issue?

Say, a person has an answer but English is not their native language, and they manage to steer ChatGPT into writing a good answer. Would we prefer to have that posted instead of keeping a question hanging without an answer at all?

The only issue I can see with AI use is the rate of new content generation. Recent models are quite OK at giving a decent answer. SO is not a pinnacle of exceptionally well thought out answers from people either. There are great, detailed and well sourced answers, but more often than not you get incomplete, outdated or even just plain wrong answers. Bespoke artisanal hand-crafted ethically sourced answers from fully organic free range humans that still lead to stack overflows and misaligned elements on webpages.


"But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.

In practice, as reported in comments at https://meta.superuser.com/q/15021/38062 and in many other Meta Q&As, the answers that people are lazily machine-generating at high volume are far from correct; and the consequent upvotes that they garner reveal the unsurprising fact that there are a lot of people who vote in favour of things based upon writing style alone.


> "But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.

Why? If an account is spamming, sure, delete everything regardless; but what if I have a generated and correct answer in my otherwise pristine account?

What I'm trying to say is that the fact that some people use it to spam shouldn't make it a simple ban condition. Otherwise that'd be like banning email to fight spam.


"if I have a generated and correct answer in my otherwise pristine account?" is just a re-phrasing of the irrelevant hypothetical side-issue.

You haven't. People aren't. This is a hypothetical that isn't the reality, and an irrelevant distraction.

Go and read the comments where I just hyperlinked, then read the months of back-discussion on this in the other Meta Q&As that I mentioned, starting with the likes of https://meta.superuser.com/q/14847/38062 right there on the same Meta site, and a lot more besides on many of the 180 other Stack Exchange sites, continuing with the likes of https://math.meta.stackexchange.com/q/35651/13638.


> You haven't. People aren't. This is a hypothetical that isn't the reality, and an irrelevant distraction.

How can you be so sure? If one day there's a 100% reliable way to detect all AI-generated responses, how can you be sure the good ones won't also get deleted in one major sweep?

Yes, I see there are many people who despise the AI generated spam on many sites. But nothing you posted proves that all (I'd even say, "significant portion of") AI generated content is spam.

I don't see anything wrong letting the AI generate an answer and edit the rough/wrong parts if necessary.


> I don't see anything wrong letting the AI generate an answer and edit the rough/wrong parts if necessary.

And this is not what the previous moderation policy was trying to prevent. What it was trying to prevent is answers from people who skip that second step.


I was objecting to this part:

> "But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.


I'm sceptical of the premise. ChatGPT doesn't watermark its answers. There's no decent way to detect what is "OpenAI garbage" and what is not. One of the comments says: "I detect such answers by the fact that they simply make no sense, although they seem well-written." I feel like this is subject to survivorship bias. Would the commenter know a good ChatGPT answer from a human-produced answer?

A separate question is why there are still a lot of crap questions/answers on SO if quality is the goal. There are plenty of low-effort and incorrect answers made by real people that are not penalised in any way.


You can ask all of these questions while also empowering moderators to use the tools at their discretion. To refuse to allow moderation while not providing any solutions is the worst of all options.


>But in general the output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable

Only if you don't prompt it to do otherwise.


Yes, but no one is claiming to have a reliable way to detect AI generated answers. The moderators are asking for the authority to suspend accounts in cases where they DO have evidence it is AI generated.


Somehow people are missing the obvious: answers that say "Source: ChatGPT" or similar. I have seen plenty of those, and of course they are hot garbage and get downvoted.


but you can still moderate it for all the normal quality issues: it has to be on-topic and truthful and properly sourced. it's only if it's a good answer, but you think it might have been ai generated, that you can't moderate it?

i'm really not seeing the problem here.


That's because you've entirely missed the part where the diamond moderators report that in private they were given far stricter rules than were let on in public, which disallowed this sort of weaselling.



