Moderation strike (meta.stackexchange.com)
412 points by mwint on June 5, 2023 | 273 comments



The Open Letter linked in this post is probably a better explanation for most people:

https://openletter.mousetail.nl/

The meta post linked here is targeted more towards an internal audience of active users.

There are two big parts to this issue. One is that the company is overriding the decisions of the communities and essentially preventing them from moderating AI-authored content entirely. The second is the way this was done: with no feedback at all, extremely quickly, and with vast differences between the public policy and what they told the moderators.


The manner in which they rolled it out is a very real concern, but moderating based on how the moderators think you wrote the content is a dangerous policy that is ripe for abuse, and SE has a history of overly aggressive moderators. I don't blame SE for wanting to nip it in the bud.

If a post isn't constructive, they can and should moderate it away, whether or not it was AI generated. If they want to have a rate limit on the number of answers per day, they can do that. But they need to moderate based on observable metrics, not guesses at what's happening behind the scenes.
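
To make "observable metrics" concrete, here is a minimal sketch of what a per-day answer cap could look like. It's illustrative only: the threshold, names, and in-memory storage are all made up for the sake of the example, not anything SE actually does:

    from collections import defaultdict
    from datetime import datetime, timedelta

    # Hypothetical threshold; the real value would be a site policy decision.
    MAX_ANSWERS_PER_DAY = 10

    answer_log = defaultdict(list)  # user_id -> timestamps of recent answers

    def try_post_answer(user_id: str, posted_at: datetime) -> bool:
        """Accept or reject an answer based only on an observable metric:
        how many answers the user posted in the last 24 hours. No guessing
        about whether the text was AI-written is involved."""
        window_start = posted_at - timedelta(days=1)
        recent = [t for t in answer_log[user_id] if t >= window_start]
        if len(recent) >= MAX_ANSWERS_PER_DAY:
            return False  # rate-limited; queue for human review rather than ban on a hunch
        answer_log[user_id] = recent + [posted_at]
        return True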


I'm not involved, but here is an example I saw:

A user has been writing (bad) answers in broken English for months, and now suddenly writes answers in perfect GPT-esque style while the technical aspect is still bad.

The moderation burden for the second kind is much higher, while other users are much more likely to mistake it for a good answer. If the policy is "no AI-generated content allowed", should moderators be allowed to suspend the user?


All that that example shows is that native-English bullshitters have long had an advantage over ESL bullshitters. We don't need to take away tools that good ESL contributors could use to get more recognition, we need to find an actual solution to the bullshitting that has always plagued SO.


I think this is partially true, but I'd insist there is more to it than that.

Careful, clear writing is hard even for native speakers, and good writing serves as a kind of proof-of-work: the visible effort in an answer is a signal that effort went into its substance too, i.e. a higher-quality answer.

We're quickly losing that signal.


Also, trying to hold back the tide on AI is dumb and doomed to fail. People need to find some way to coexist with these tools. The editors at Stack Exchange are not the kind of powerful cabal that is going to roll back the last three years of computer advances.


An excellent use-case of AI with Stack Overflow would be to integrate it and use existing questions and answers to help people solve problems. A terrible use would be to feed the answers back into the system, because it reduces the value that both humans and, eventually, AI bring.

As someone who reads Stack Overflow answers, it's also awful to see the long-winded AI answers. If there are mistakes in the answers, there's no way to get the submitter to make changes, because they likely didn't understand them in the first place.

There are ways to do it correctly, but inexperienced developers opening ChatGPT and blindly copying answers to harvest karma isn't it.


Regardless of whether it's doomed to fail, the content these "computer advances" are creating on the site is definitely garbage at the present time. I've flagged hundreds of posts, mostly brand-new users attempting to rep farm. A huge percentage of the answers are flat-out incorrect or have little to do with the question.

Banning all moderation without consulting the community is not a step towards coexistence, it's a step towards turning the site into a dumping ground for stale LLM spew in the name of engagement.

Companies like Stack Overflow seem to care less about the quality and accuracy of the site's content as long as the engagement numbers look good. Unfortunately, a lot of users don't seem to care about quality, either, which is how we've wound up with the current state of affairs where just about every social platform is increasingly flooded with bot spam, misinformation, scams and junk.

By banning low-quality posts, including unverified LLM answers, the site can secure a future for itself as a bastion of quality. Or it can turn its back on the experts that built the content that's used to train the LLMs and hope that LLM quality improves enough that expert humans aren't necessary.

Even if that gamble works out for them and LLMs do mostly replace humans as some commenters optimistically seem to expect, then there'll still be no need for Stack Overflow, as one can simply ask an LLM directly. That's why effectively dumping human experts for LLMs seems like a failed business model either way. The best approach to move forward seems to be to carve out a space that provides unique value that LLMs can't and ride that until LLMs make significant advances.

Disclosure/context: I'm a daily SO answerer on strike, among the top ~4k users by rep overall. I'm pretty sure most folks who are dismissing the problem don't monitor tag feeds or curate queues enough to see the flood of blatantly wrong LLM answers spamming in from new accounts.


> We deeply believe in the core mission of the Stack Exchange network: to provide a repository of high-quality information in the form of questions and answers, and the recent actions taken by Stack Overflow, Inc. are directly harmful to that goal

Unfortunately, this seems a naive take; the core mission of the network is to serve the commercial purposes of the business.


That's a very snarky way to put it. It's very well within the rights of the SE community to express their perspective and demand that it be respected. That's not naive. Naive would be assuming this is guaranteed to work. But if they actually implement a moderation strike then SE is going to fall apart sooner rather than later. So, it's not like they have no leverage.


It's not at all "snarky".

Expecting a business - with infamous episodes of contempt towards moderators - to behave in the idealistic way presented here is naïve.

Might it submit to pressure? Perhaps. But the wording of this letter in its presumption of the motives of a private organisation misunderstands the reality of the commercial world.


It's not a statement of expectation, it's a statement of value. Your uncharitable reading does not change at all what the statement is trying to convey: there's a mismatch between what the moderators perceive as the business' value and how the business acts to protect and expand on that value.


Very true. Throwing up your hands once you realize that corporations tend to be amoral profit maximizers is not helpful. Expecting corps (even non-profits) to behave altruistically is naive, so we have to take real steps to ensure their survival depends on them being good actors.

Specifically, organization of contributors actually sounds like a very effective way of holding these morally dubious “content” middle men accountable. The consumers are too heterogeneous, numerous, and uninvested to realistically coordinate.


When the business is built on the activity of those moderators then there's a strong relationship between them, whether the business likes it or not.


They're not just expecting the business to behave this way; they are attempting to force its hand.

Are you familiar with strike tactics?


Of course, but strikers are paid workers operating within a legal process, they can't just be easily supplanted by free alternatives as can moderators.

But that's getting off the topic of whether or not the wording of the quoted paragraph reflects reality.


The idea of a strike being 'legal' or not is an unfortunate yoke that labor movements have allowed to be fitted to them. It's a silly idea and should be addressed as such. Labor has power. The collective masses have power over the few who oppress them. It doesn't matter if those people decide to call certain labor actions legal or illegal, they should happen (and must happen) all the same to keep the balance of power equitable.


> they can't just be easily supplanted by free alternatives as can moderators.

That's an incredibly naive take. Do you think there is an abundance of moderators just waiting to fill the ranks? Several SE sites are struggling with a lack of moderators already.

Similarly, there is no abundance of community members curating questions and answers. It's all volunteers, and it's not like there's a huge untapped pool that they can access on-demand.

It's kinda like saying that the entire community can be easily replaced because it's free. No, the community is the business value!

Lastly, the main anti-spam tool of the site (smokedetector-se.org - community-built and -hosted) is also offline as part of the strike. The tooling built by SE itself is in no way, shape, or form adequate for combating the amount of spam the site receives. Sure, it's not irreplaceable, but it's not "free".


> But that's getting off the topic of whether or not the wording of the quoted paragraph reflects reality.

not off-topic at all - but I think you just noticed that your argument doesn't hold.


My original point was about an incorrect reading of what the network is there for; the distraction into strikes and whatnot might be of interest but not on that specific point - which holds regardless of your rather silly supposition.


go ahead and file a complaint with the ministry of silly suppositions


Strikes are not always within the legal process. Normally they are, but a number of them over time have been illegal under the law.


>strikers are paid workers

Not in this case. The exchange is real but non-monetary.

>But that's getting off the topic

No, not really.


Are we heading towards a Soviet Union 1.2? It seems like there's writing on the wall that isn't clear enough for my eyes. It seems that a temporary mass destruction of the free-market economy is anticipated.


Not to get too tinfoil hat about it, but an economy controlled by a number of equity firms one can count on a single hand, with fingers to spare, hardly seems a "free market".

Hence recent developments of large brands committing billion-dollar acts of seppuku in near-realtime: somebody is pulling strings in non-obvious, non-"free market" ways.

It's a zany time to be alive, and one hopes that sites like this one can help redistribute a "free market" sensibility that seems on the wane.


> Not to get too tinfoil hat about it, but an economy controlled by a number of equity firms one can count on a single hand, with fingers to spare, hardly seems a "free market".

Is this the index fund conspiracy theory again? Where companies like Vanguard supposedly form some sort of shadowy evil cabal that secretly controls the entire economy?


It is by no means conspiratorial to suppose that large amounts of power concentrated in few hands tends toward tyranny; it's simply history.


Sure but that’s not how index funds work.


Oh, well, that explains everything.


> large brands committing billion-dollar acts of seppuku in near-realtime

Could you cite examples? I don’t really keep up to date with the news so I would like to know?


Bud Light


Okay, what else? One company doesn’t really indicate to me a trend


Target.


I don’t fundamentally disagree, just I’m feeling uncomfortable watching the world slowly converging into “a great war for a great reset” type of thinking.


"You may not be interested in the dialectic, but the dialectic is interested in you." https://en.wikiquote.org/wiki/Leon_Trotsky


It is every business's mission and goal to deliver value to its owners.


Sure, which is why it is not helpful or interesting to point it out.

When companies talk about their "mission", they are referring more to how they intend to deliver value to their owners, usually by identifying some social need and satisfying that need in exchange for value.


nobody questions that - not even the open letter.


The core mission of the system that is Stack Overflow is in the eye of the beholder. If you ask the army of unpaid volunteers who maintain it, or one of the even larger army of contributors who actually provide expert-level content for free, I think you'll get a different answer than the C-suite will give you.

The business is a helpful abstraction layered over the top of all these people doing the actual work. It's useful as long as it keeps the lights on. When it stops doing that, the community has the right and the responsibility to move the work product of the huge number of people actually doing the thing here somewhere else.


Well then maybe "the business" shouldn't depend on the gratis effort of hundreds of thousands of unpaid volunteers. If I read between the lines, it would seem to me that management has decided they can make more money by allowing AI-generated content. I don't think this is true. Yet. But whether it is or isn't, this is a revocation of the implied contract we, the users, have had with the site.


To the extent that this is true, it's a corruption of the purpose of companies in our society.

The reason we created "corporations" as a concept was not simply to have a vehicle to make as much money as possible, any way possible. It was to serve society by providing services and making products, which would then be sold, and, if they were good enough, would make the company a profit.

This idea that no business should ever be expected to do anything but what makes them the most money the fastest is toxic and is ruining our economy and our society.


How about a principled and optimistic take? And/or it's the basis under which moderators have been offering their services.

We may have been duped by a company's lies, of course. Seems like they've committed a mix of copyright infringement and fraud in that case.


Like I said, "unfortunately".


Business viability should be a means to realize the core mission, not the other way around.


How does allowing AI content inherently and unquestioningly serve the commercial purposes of the business?


Well, this is off the topic of the quoted passage in my comment, but your use of "unquestioningly" is a strange choice of word, and it is reflected elsewhere in the letter in "unchecked". I found that quite dubious wording as well.

That's exactly what moderation is for: checking, questioning, etc. It's not an argument against AI-generated content.

There seems to be an emotional reaction, but people will still value the highest quality content, whatever its provenance.

If the argument is that the workload will be too high, well, make that argument; don't sidetrack with misleading and idealistic appeals to emotion.


I think this is a naive (or cynical?) take on a business; it's also their goal to hire and retain both employees and customers.

If the latter is done based on a lie, then you can't align your "commercial purpose" with the people who fulfill it.


Unfortunately (again), there are plenty of people who will buy from and work for businesses that don't subscribe to the same views as those in the letter.


The core mission for the c-suite executives and owners is commercial. But they neither build nor use the thing. They're just in charge.

Workers taking back some of that control is a great thing.


Sadly it doesn’t have a link for its principal assertion:

> Stack Overflow, Inc. has decreed a near-total prohibition on moderating AI-generated content

That would have been a really good place for [one of these](https://somewhere)


The problem is that the public policy as posted by SE is simply different than the private communication from SE. Those private rules were shared in places that have an expectation of confidentiality, so the moderators are not entirely free to just post those without violating that expectation.

The public policy by SE is misleading, it makes the rule appear a lot different than it actually is. I am a mod on a small SE site, so I have seen the internal communication and it does essentially prohibit moderating AI-generated posts except in some very narrow circumstances.


Surprisingly brief and well-written letter. I hope they overcome.


So if I’m understanding, Stack allowed moderators to issue 30-day bans without following normal escalation policies, if a moderator had any reason to suspect that a user posted AI-generated content.

The new policy is that you have to follow all the normal policies for all the posts; you don’t get to pull out the banhammer just for a suspicion of AI. And the moderators are striking because they want to keep the power to issue uncontestable 30-day bans whenever they feel like it?


No, the new policy means that in almost all cases you cannot moderate content you consider AI-authored at all. This means moderators cannot delete those posts nor suspend the users for this specific reason.

The result is pretty much that AI-generated content is essentially allowed as it cannot be effectively moderated. Even though many sites still have an official policy that disallows it.

Disclaimer: I'm a mod on a small SE site, though I have not acted as a mod on any AI-generated content.


The fact something is or isn't AI doesn't require independent moderation procedures. Either the content has an answer to the question or it doesn't.


If something doesn't answer the question, then it should be removed by moderators. But it shouldn't stop there: junk generated by AI is going to be mass-produced, and anyone posting incorrect AI-generated answers is probably going to be a repeat offender. Taking proactive action against such users who are operating so far outside the purpose of the site is very likely necessary to have any hope of maintaining a reasonable signal to noise ratio.


probably, or is?

The offence here is spamming, AI is orthogonal. Users spamming should be given a timeout.


Consider as an analogy, safety while driving. Driving dangerously is fairly uniformly proscribed, but we also have specific prohibitions on various mechanisms by which people drive dangerously. It's much easier to prove that someone was intoxicated or using a mobile phone, rather than needing to show in each case that the intoxication was dangerous or that the mobile phone use was distracting and the distraction was dangerous.

Similarly, use of LLMs to generate and post large quantities of text from a short prompt is inherently spamming. The user could provide exactly the same value by posting their prompt. If the prompt isn't valuable, the output won't be either, as anyone with a copy of the prompt could generate equivalent output for themselves if they wished to.


As an addendum, to not (I hope) distract from the point about spam -- I don't at all object to using an LLM for inspiration, or editing, or summarisation. So long as there's no claim that the output contains more information than the input. Any statement of fact in the output should be present in the prompt or validated by a human before publishing, or it's suspect. And if it's published without disclosing the lack of validation, it's unethical.


> And if it's published without disclosing the lack of validation, it's unethical.

How is this different than people posting things that they didn't test themselves?

   try

   {code block}

   Hope this helps

Answers that are clearly not an answer are edited to be pretty rather than downvoted and flagged because they're wrong ( https://stackoverflow.com/a/76402243 ).

The problem isn't an LLM (though that just provides more scale) - it's that incorrect information isn't removable / actionable on SO. If the person tried to answer a question, it remains up.


Who says it's not different?


> The offence here is spamming, AI is orthogonal.

You can make broad classifications for an act (it wasn't murder, it was negligent discharge of a firearm), but that doesn't preclude the fact that it is also something else (shooting in the direction of a person without regard to their life and killing them is murder).


Your analogy isn't a great fit, because the distinction between the two carries different penalties.

In this scenario, spamming with or without AI would carry the same penalty. The benefit is that you don't need to determine if AI was involved, since it doesn't matter.


Incorrect answers are not what moderators are moderating. A super easy way to make answers look right even when they are wrong is perfect for people trying to game SO's point system, and it creates a huge amount of extra work for mods without ~any benefit.

There are definitely alternatives to the unilateral ban (thinking about how chess.com handles bans for cheating, for example), but saying "AI content is qualitatively no different" ignores the bigger ecosystem problem of having the average answer quality simply go down.


The problem is that moderation does not necessarily check the solution. A moderator can't acquire the specific version of hardware the questioner had, recreate the failure conditions, then test the solution. Instead, a moderator looks to see if an answer seems reasonable and fits the correct syntax and site-specific rules.

LLMs are great at producing bullshit that looks convincing.

Essentially, one has to trust responders to provide answers to some extent. Untrustworthy responders can use generators to bullshit their way into acquiring trust.


Setting aside the fact that the AI-generated answers are awfully wrong most of the time, this line of thinking also leads to an absolutely dreadful state of the internet where everything is AI-generated crap. This is made worse by the fact that Stack Overflow has points, badges and everything. People buy and sell stars on GitHub to increase their chances of being hired somewhere. People absolutely will game the shit out of Stack Overflow, not caring for a single second whether it actually makes it a better place, if it means they get shiny badges and get to put "top commenter on the C# subject on SO (588100 points)" on their resume.


Giving moderators the power to arbitrarily ban users they deem to have produced content by AI is not likely to work for long though. Perhaps today it's still feasible to note the "vibe" for GPT, but that's not likely to be possible for long (it's probably an artefact of the RLHF process anyway). Then what? You'll end up with essentially arbitrary application of the ban hammer based on the moderators' "feelings".

I certainly don't have the answers (and let's face it, it's likely going to be a problem here too).


> is not likely to work for long though.

This is true, and it's why the policy promulgated by the moderators was explicitly temporary. It was a response to an acute problem to give everyone time to figure out how to handle it as a chronic condition: https://meta.stackoverflow.com/questions/421831/temporary-po...


>You'll end up with essentially arbitrary application of the ban hammer based on the moderators' "feelings".

So, like the current organisation of things, that works quite well? You put entirely too much value on moderators being perfect. Moderators have been biased forever, have banned for no reason forever. It's a website, you can live with that (or without it). If they're unreasonable, move away, find another place or create another place. If you can't move away, deal with it, create an alt, don't get caught doing the same thing that got you banned, and that'll be it.


I am curious to hear from people here who read resumes and hire people. Is "top commenter on the C# subject on SO (588100 points)" on a resume something that would influence your hiring decision and if so would you be tempted to go check a few of those comments/answers out?


It doesn't ever have to ACTUALLY work; enough people just have to believe it does, or want it to, and go through the effort of spamming a community to death to make a number go up.


I think you're getting to the heart of the behind-the-scenes business decision: Let AI responses proliferate on the platform -- heck, maybe even allow a bot to submit the questions to various LLMs and post the response as a normal user -- then make the volunteer community figure out whether the response was good or not, without letting the mods interfere in the process.


...the mods are part of the volunteer community, and their role largely already is more about keeping the bad apples out than establishing whether a particular response is good or not. Where do you get the idea from that the mods are incapable of this while "the volunteer community" is (or why do you think that's what SE thinks)?


I was implying that I think SE is doing this because it’s the only thing I can think of that would explain what appears to be an insane change of policy. How else would you explain it? What else would be the goal except to test AI responses in the wild as a sort of meta experiment? I’m just trying to connect the dots here.


If that's your base criterion, why not remove humans from the equation entirely and just turn Stack Overflow into a purely AI-driven site? It doesn't matter if the answer is correct, only the speed and formatting of the answer.


That’s a more realistic option than you’d think.

What is the point of a software Q&A site anyway? Why not just read the docs?

In my experience, Q&A is useful because it summarises and consolidates disparate information into a concise response to a prompt (sorry, I mean to a Question!) which is... exactly what a chat AI does.

Isn’t chat AI an existential threat to SO, even if AI is banned there?

Certainly I find chat AI better in many cases, and if I don’t like the answers, just going to the docs is the next step, not a generic Google search with SO in the SERP.


> In my experience, Q&A is useful because it summarises and consolidates disparate information into a concise response to a prompt (sorry, I mean to a Question!) which is... exactly what a chat AI does

There is knowledge you will never find in documentation. 50% of the places where SO has been helpful to me are not "consolidating documentation". They are solutions to obscure bugs, useful APIs that are not documented with any example, and low-level logic / high-level design solutions.


Because AI doesn't answer the question, there's nothing "volitional" in it, it's just a (computationally smart) rehash of past answers, hoping that it will all fit together somehow.


> there's nothing "volitional" in it

> hoping

Hmmmm


I know, no matter how much I try (and I do try) the anthropomorphization creeps through from the most unexpected places. I personally hate it, as I hate all this related AI-discourse.


How do you know if content is AI generated?

If mods could decree a piece of content as AI generated (and delete it) willy nilly, then that would be far worse IMO.


As a common example, you see someone who asks a question in the format:

“helo plz can help w cod, is broke has error”

And then the next day posts four answers in perfect English to four different topics, with that GPT “vibe”.

You can’t reliably detect generated content in a vacuum, but Stack Overflow is a very metadata-rich environment for moderators.


As a user of SO, etc. I don't care how they wrote the answer. Is it a good answer? Does it help me? That's what I'm interested in.

Why censor good answers?


They’re usually not good answers, and they’re subject to the bullshit asymmetry principle.


This is a vital comment that people need to understand.

It is _effort_ to moderate content, to read answers and see if they are legitimate. It is almost zero effort to chatgpt out "answers".

The people providing the ChatGPT answers _do not care_ about them, they are only looking to pad their Rep.

It is an attack against the very core of the StackExchange system.


Why do people want to pad their rep? I have nearly 14K "rep" on SO, never had any positive effect on my career whatsoever.

Also, theoretically (it has never happened), if someone mentioned their exceptional SE profile on their resume or cover letter, I'd for sure ask them for some details about the topics they claim to be experts in.

Will they also bring ChatGPT to the interview? That'd be fun to watch.


If something has a score, people will compete in it just for the satisfaction of moving up on the leaderboard.


Stack Exchange accounts are sold and bought, so apparently there is an appeal.


> Why do people want to pad their rep? I have nearly 14K "rep" on SO, never had any positive effect on my career whatsoever.

If you are an applicant to a position where there are 5000 resumes submitted and the people doing the filtering want a quick and easy numerical ranking of them - provide your Stack Overflow account.

At that point, they can look up your rep and pick the best ones based on that.

This doesn't happen as much in US based companies as there are other metrics that they can use to filter candidates rather than SO rep.

However, if you are in India and applying to a consultancy and every resume and transcript are very similar to the point of not being able to distinguish between them - SO rep provides a very easy way to rank and filter applicants.

Unfortunately, I can't verify this. I've only heard it second-hand, but it does make sense and helps explain why I occasionally see SO accounts and stats on resumes from contractors.


> Will they also bring ChatGPT to the interview? That'd be fun to watch.

It is already done, as I've heard. The previous iteration was to pay your lookalike to pass the interview for you. Or just pay anyone and blame lack of camera on "technical difficulties".


> Why do people want to pad their rep?

Because a bunch of dumbass software companies (especially the sweatshop ones) decided that your annual review now needs to include a bunch of dumbass "open source software social work" or you get dinged on your review.

So, Github contributions, Stack Overflow moderation, etc. all became subject to Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

ChatGPT is just accelerating this contamination of the well to all-out flooding of it with industrial sewage.


Why turn SO into a cache of LLM responses, why not just ask an LLM if you want answers and don't care if those answers work?


Bingo.

Stack Exchange could just call an LLM API when a question is asked and show the response. That wouldn’t be nearly as valuable as their verified index of answers.


However, having knowledgeable people upvote/downvote/correct the AI answer could prove quite valuable.


Can you verify answers faster than GPT can generate them?


I'm not the ideal person to ask here. But in general the output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable. And you have to keep in mind that you usually have more than a single post available in these cases. The pattern of posting content is also a signal by itself.

So if e.g. a user posts a dozen long answers within 10 minutes, and they all have characteristics of ChatGPT, that would be a pretty good signal.
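
As a rough illustration, that kind of metadata signal could be combined with a simple style check. Everything below is hypothetical: the thresholds, the marker phrases, and the looks_like_chatgpt stand-in are placeholders for the sake of the sketch, not a real detector (which, as noted elsewhere in the thread, doesn't reliably exist):

    from datetime import timedelta

    # Hypothetical thresholds for the "dozen long answers in ten minutes" signal.
    BURST_WINDOW = timedelta(minutes=10)
    BURST_COUNT = 12
    LONG_ANSWER_CHARS = 1500

    def looks_like_chatgpt(text: str) -> bool:
        """Placeholder style check (boilerplate openers, etc.). Real detectors
        are unreliable, which is why this is only one signal among several."""
        markers = ("It looks like you're trying to", "As an AI language model")
        return any(marker in text for marker in markers)

    def burst_of_suspect_answers(posts):
        """posts: list of (timestamp, body) pairs for one user, oldest first.
        Flags users who post many long, ChatGPT-styled answers in a short window."""
        suspect_times = [t for t, body in posts
                         if len(body) >= LONG_ANSWER_CHARS and looks_like_chatgpt(body)]
        for start in suspect_times:
            in_window = [t for t in suspect_times if start <= t <= start + BURST_WINDOW]
            if len(in_window) >= BURST_COUNT:
                return True
        return False

Even then, a heuristic like this would only surface candidates for a human moderator to review, not decide anything on its own.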


> So if e.g. a user posts a dozen long answers within 10 minutes, and they all have characteristics of ChatGPT, that would be a pretty good signal.

Ah yeah, the frequency of posts definitely makes sense.

> output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable

I would agree with you for gpt 3.5, but I don't think this is the case for GPT-4 (I've spent several hundred hours using GPT-4 for various tasks - mainly related to coding & learning random subjects).


In this case it certainly matters that ChatGPT is free. There are just many more people using that than paying for GPT-4 access.


Still, if the answer happens to be factually correct, is it an issue?

Say a person has an answer, but English is not their native language and they manage to steer ChatGPT into writing a good answer. Would we prefer to have that posted instead of keeping a question hanging without an answer at all?

The only issue I can see with AI use is the rate of new content generation. Recent models are quite OK at giving a decent answer. SO is not a pinnacle of exceptionally well-thought-out answers from people either. There are great detailed and well-sourced answers, but more often than not you get incomplete, outdated, or even just plain wrong answers. Bespoke artisanal hand-crafted ethically sourced answers from fully organic free-range humans that still lead to stack overflows and misaligned elements on webpages.


"But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.

In practice, as reported in comments at https://meta.superuser.com/q/15021/38062 and in many other Meta Q&As, the answers that people are lazily machine-generating at high volume are far from correct; and the consequent upvotes that they garner reveal the unsurprising fact that there are a lot of people who vote in favour of things based upon writing style alone.


> "But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.

Why? It becomes irrelevant if an account is spamming, sure delete everything regardless, but if I have a generated and correct answer in my otherwise pristine account?

What I'm trying to say is, the fact that some people use it to spam shouldn't make it a simple ban condition. Otherwise that'd be like banning email to fight spam.


"if I have a generated and correct answer in my otherwise pristine account?" is just a re-phrasing of the irrelevant hypothetical side-issue.

You haven't. People aren't. This is a hypothetical that isn't the reality, and an irrelevant distraction.

Go and read the comments where I just hyperlinked, then read the months of back-discussion on this in the other Meta Q&As that I mentioned, starting with the likes of https://meta.superuser.com/q/14847/38062 right there on the same Meta site, and a lot more besides on many of the 180 other Stack Exchange sites, continuing with the likes of https://math.meta.stackexchange.com/q/35651/13638.


> You haven't. People aren't. This is a hypothetical that isn't the reality, and an irrelevant distraction.

How can you be so sure? If a 100% reliable way to detect all AI-generated responses arrives one day, how can you be sure that the good ones won't also get deleted in one major sweep?

Yes, I see there are many people who despise the AI-generated spam on many sites. But nothing you posted proves that all (I'd even say a significant portion of) AI-generated content is spam.

I don't see anything wrong letting the AI generate an answer and edit the rough/wrong parts if necessary.


> I don't see anything wrong letting the AI generate an answer and edit the rough/wrong parts if necessary.

And this is not what the previous moderation policy was trying to prevent. What it was trying to prevent is answers from people who skip that second step.


I was opposing to this part:

> "But what if they were correct answers?" is largely an irrelevant hypothetical side-issue.


I'm sceptical of the premise. ChatGPT doesn't watermark its answers. There's no decent way to detect what is "OpenAI garbage" and what is not. One of the comments says: "I detect such answers by the fact that they simply make no sense, although they seem well-written." I feel like this is subject to survivorship bias. Would the commenter know a good ChatGPT answer from a human-produced answer?

A separate question is why there's still a lot of crap questions and answers on SO if quality is the goal. There's plenty of low-effort and incorrect answers made by real people that are not penalised in any way.


You can ask all of these questions while also empowering moderators to use the tools at their discretion. To refuse to allow moderation while not providing any solutions is the worst of all options.


>But in general the output of specific AI tools like ChatGPT produces certain patterns that are quite noticeable

Only if you don't prompt it to do otherwise.


Yes, but no one is claiming to have a reliable way to detect AI generated answers. The moderators are asking for the authority to suspend accounts in cases where they DO have evidence it is AI generated.


Somehow people are missing the obvious: Answers that say "Source: ChatGPT" or similar. I have seen plenty of those, and of course they are hot garbage and get downvoted.


But you can still moderate it for all the normal quality issues; it has to be on-topic and truthful and properly sourced. It's only if it's a good answer, but you think it might have been AI-generated, that you can't moderate it?

I'm really not seeing the problem here.


That's because you've entirely missed the part where the diamond moderators report that in private they were given far stricter rules than were let on in public, which disallowed this sort of weaselling.


This seems to happen over and over: something that is working fine is changed. Not to make it better, but in hopes of business growth. Half the time it backfires, the other half there are some mostly middling improvements to the business. The odds aren't great, but there are people charged with "growing the business" and so they must change something, even if there's nothing that pops out as a great idea.

The community of moderators is kind of a symbiote attached to this enterprise. It gleans and curates and makes the end product more helpful to users. "Helpfulness" is a second-order effect of this moderation, and the whole attraction to the business.

After telling moderators not to moderate, moderators should get the message: it's not about AI, it's about whether moderation is valuable, and whether helpfulness of answers is valued by the business.


This is just the most recent decline in stackoverflow 8-(

Ever since the posting system was turned into a social score where people are mostly concerned with increasing their score rather than answering questions, Stack Overflow has failed its users.

Just another example in the very long list of for-profit platforms doing what's best for profit over what's best for the users...


Say you have an expert in a field who's not stellar at English, someone like me, and I use a large language model to generate a response on a subject I'm an expert in. That's not harmful. I'm not blindly pasting the output into the answer form; my point is still to post a correct answer, and I'd obviously double-check. It would be able to produce a better-worded response than my semi-bilingual brain ever could, so there is added value there, without even accounting for the time it saves me. I've already edited this post like 4 times to add an S somewhere or remove one. Blanket bans on LLMs like the moderators want are lazy. The criterion that should matter is whether you answer the question properly or not.

Then the second point is, on a blind sample of question answers, how well can they tell if something has been generated? I bet it's not stellar.

I hope Stack Exchange holds its position.


But that is the point: if you use AI to improve what you would answer, that is totally fine, but not if you just copy and paste what it says.

And I guess they will just realise it is AI-generated when someone notices that what is written does not make any sense but is still really well written. In any case, it's very difficult to identify, for sure.


Most people can read ESL English better than GPT's "wordy advertisement" style English.


If everyone used LLMs like that, there would be no problem. The thing is, few do, and with the new policy, moderators are not in a position to do anything about it.


Why not? If an answer is wrong it's wrong. Why does a GPT ban prevent removing wrong answers?

What a GPT ban does do is give moderators an extremely subjective tool for removing answers. The GPT detectors are unreliable, so that leaves the moderators doing a gut check on whether this particular case is a false positive.

Instead, why don't they just use the GPT detectors to detect answers that might be wrong and then moderate based on existing policies?


In short: moderators do not remove answers based on (their perception of) technical merit, non-moderators cannot remove answers as long as the score is non-negative, and GPT drivel sounds convincing enough to get upvotes from credulous newbies, so the answers have positive score.


So fix the first one. Provide a process for removing answers that are demonstrably false. Convincing-sounding drivel that gets upvotes isn't a new problem on SO, it's just amplified by the new tech.


But this is a problem of scale. If I wrote a bot that posted 10k AI generated questions and answers per hour, you would surely agree that it would be unreasonable to require every answer to be human-reviewed before it can be removed - it would be an incredible waste of time and effort. (Also note that filling SO with LLM-generated content is entirely pointless - someone who wants LLM-generated answers can just go ask an LLM).

Now spread that one bot over hundreds of users and you have the same end result. That's why the communities of different SE sites all ended up with a similar policy.


I get that, but the problem of scale is one that you have no matter what. Either you have to detect and filter out AI posts at scale, or you have to detect and filter out bad content at scale.

I'm not suggesting either problem is easy, but taking the "ban AI" approach is harmful for several reasons. Detecting AI without false positives is extremely hard and puts too much decision-making power in the gut reactions of moderators. Additionally, banning it has the negative side effect of making the tool unavailable to people who might legitimately benefit from it when producing good content, such as ESL speakers using it to polish their English.

We need to moderate based on the effect of the user's content on the community, not the technology used to produce the content. If you're dealing with a bot posting 10k questions and answers per hour, it wouldn't matter if that bot were using GPT or just re-posting content scraped from the web—the abuse isn't in the use of AI, it's in the spam. So make a rule against spam and auto-ban people who do it.


I said this elsewhere in the thread too, but the ban was always meant to be temporary: https://meta.stackoverflow.com/questions/421831/temporary-po... The moderators recognize it's inevitable and wanted to find a way to helpfully integrate AI. The strike is not about making the ban permanent; it's about being suddenly overruled in secret, cutting off the possibility for dialogue to find a good solution.


The thing that I also don't get is that I haven't seen an example of that being an issue on Stack Exchange yet.

I also don't see the end goal making a bot that answers questions on SE. It doesn't make money. Maybe to get points? But once you're past their thresholds there's no reason to keep doing it, and you get there quickly. Accounts don't get sold to advertisers like on Reddit. You'd at most do it once. And who would even do that? The very narrow niche of people who'd want to boost their SE points? Maybe you're shooting for the leaderboard, but if that's the case... you'd get noticed, wouldn't you?

I could see some kids doing it. If that's the big threat they're facing, then they're over-reacting and it still doesn't warrant blanket bans on LLMs.


> I also don't see the end goal making a bot that answers questions on SE.

A tip I learned long ago: Never ask a geek "why?", just nod your head and back away slowly.


Exactly, how would anyone know I answered with the help of an AI? I got bored reading the post drone on about banning GPT. Weird. None of it resonated with me.


Having flagged hundreds of such answers before SE suppressed LLM moderation, I can vouch that they stick out like a sore thumb:

- they're all written in the same style ("It looks like you're trying to convert a string to an integer...")
- they have perfect grammar
- they're often/usually wrong
- they tend to be tone-deaf to the question or are answering a different question than asked
- their formatting is often poor by virtue of copy-paste from ChatGPT and the author's lack of familiarity with markdown
- they're usually posted by new or new-ish users that have few reputation points (ostensibly, their goal is to farm reputation rather than help curate a quality resource)
- they're posted in large quantities (a dozen or more in an hour) that'd be virtually impossible for a human to churn out (especially a brand new user!)
- they're posted in random tags that show no clear connection from one to the next--most answerers are subject matter enthusiasts or experts and stick to a narrow range of tags
- they tend to contain hallucinations obvious to SMEs, like invoking methods that don't exist
- the code is plain wrong when executed
- the code style is "boring"/"vanilla" and tends to steer clear of idiomatic language features that a seasoned programmer would employ, and has few formatting quirks that a human might use
- the code is often heavily commented in a predictable and artificial manner
- the explanation after the code and the overall layout/flow of the post is often the same
- they tend to be dramatically different from the user's normal answers, which have typos.

As you'd expect, it's difficult to _prove_ that a particular answer was generated by an LLM (the fact that LLMs can't reliably detect themselves is part of the problem--they're inaccurate!). However, the possibility of occasional false positives (SE has provided no actual evidence of this being an issue) seems a necessary price to pay, and could be handled in a more balanced manner than prohibiting all LLM moderation. SO would be unusable if it became a stale cache of mostly-incorrect LLM spew, which is what SE's new moderator guidelines seem to be OK with.

If I want an LLM answer, I'll ask an LLM. I can then prompt engineer and iterate. If I want a human answer, I want to be able to ask on SO so I can be guaranteed I'm "speaking with a human".

Disclosure/context: I'm a power-user on strike.


The article talks about GPT detectors. They don't work amazingly well, though.


Management and business continue to treat volunteer moderators like their employees or contractors, which even in some of the better examples in the corporate world can be mildly abusive, as people need to eat and have families to feed.

You can't treat volunteers like that. You have to keep them happy. You have to treat them with respect. You're not paying them for their free labour.


I hate the unpaid moderator model, it generally attracts only the most terminally-online busybodies, and they seem completely detached from the core aims of SO, instead only interested in metadrama.

I don't like it on Reddit and I don't like it on SO. They should pay professional moderators just like Twitter and Facebook.


I think you've hit on something here.

I understand very well that it is active moderation which keeps SO from being a toilet of spam and junk. At the same time, in my limited experience SO moderators are petty tyrants, very jealously guarding their tiny domains.


> I understand very well that it is active moderation which keeps SO from being a toilet of spam and junk.

I often wonder about this. I can see how this is true for somewhere like Reddit where lazy memes could overrun any subreddit, but I'm less convinced when it comes to SO. The vast majority of SO moderator actions are on questions that non-mods have already downvoted and reported. These questions could/would be hidden by SO's algorithm anyway. Only new queue lurkers see unpopular questions.

I think that SO's moderation could be almost entirely automated and overseen by a small number of professional paid moderators.


> the core aims of SO

Which are?


A place that gives me answers to common coding questions.


If the place is filled to the brim with GPT-generated nonsense, how do you plan on finding those answers?


It's not "management and business". People will always tend to take as much as they can before someone fights back.


Only a certain type of person does that. And management and business leaders more often than average are that type.


> People will always tend to take as much as they can before someone fights back.

I don't think that's always the case. People can and do organise themselves in ways that prioritise collaboration over treating these sorts of things as a zero-sum game. Not all orgs have to function in this way.


This is far from universally true.

If this is all you have experienced, then I urge you to look beyond your current circle (or bubble) to find people who genuinely care about others and want to improve the world.


Almost nobody has had that as their only experience. However, it only takes relatively few people acting that way for things to turn sour -- sometimes a single person is enough. And in a large enough circle, chances are someone is like that.

Since most people tend to avoid confrontation, that gives the kinds of people who want to take as much as they can (or to have their way in some other way) a free pass until someone does in fact fight back.

Hence it's not surprising if, in the big picture, it seems like "people" (as in enough of them to make it a potential issue) do in fact seem to behave that way.


> They don’t really understand what they’ve just copied and presented as an answer to a question.

> Content posted without innate domain understanding, but written in a “smart” way, is dangerous to the integrity of the Stack Exchange network’s goal: To be a repository of high-quality question and answer content.

Tricking our bullshit detectors (e.g. cues present in the text or implied context) is the biggest problem I have with the usefulness of AI-generated content.

I wrote the Medieval Content Farm (https://tidings.potato.horse/about) as an excuse to talk about this (and cope), although people focus more on the fact that I present it as a joke.


I haven't seen an example of that being an issue on Stack Exchange yet.

I don't see the end goal of making a bot that answers questions on SE. It doesn't make money. Maybe to get points, but once you're past every point threshold there's no reason to keep doing it, and that happens fairly quickly. Accounts don't get sold to advertisers like on Reddit. You'd at most do it once, and for the very narrow niche of people who'd want to boost their SE points? Maybe you're shooting for the leaderboard, but if that's the case... you'd get noticed, wouldn't you?

I could see some kids doing it; if that's the big threat they're facing, then it doesn't warrant a blanket ban on LLMs. I care about 1) spam, 2) answer usefulness. Not how the answer was written.


It makes sense from a business perspective (they have been losing traffic to chatGPT), but unfortunately it may also mean the end of Stack Overflow as we know it. The whole value of SO was to be able to connect with subject experts in a reasonably easy and quick way.

Computer generated answers compiled from existing resources may work for simple questions, but not for things that require specific knowledge and experience.


I doubt SO is the only go-to platform nowadays for modern matters. In 2009-2018 it was the go-to place for all modern technologies.

But since then, 1) moderation got more severe, so you can't ask a question like "what packages are there for X?" (even though many such questions remain from the early 2010s), and 2) many questions that are similar to older ones but differ in fine details get closed quickly.

I see new tech go to their own forums based on Discourse.

So, since 2020, I'd still come to SO for answers already available, but for a place to ask new questions, I'd look elsewhere.


I have not found a reasonably good answer to a question in at least 5 years. Seems more like 10.

I still get a better hit rate from mailing lists and github issues.


That's not the only value. There's a lot more. Emphasizing "no noise" is a big one. Stack Overflow was created to solve the problem of having to dig through long forum threads to get answers, or getting blocked by paywalls. There's also value in searchability and showing up in search engines. Running LLMs like ChatGPT is not cheap in a lot of ways.


It's not a black and white issue. Most domain experts already use AI assistants for their daily work. There is absolutely nothing wrong with that. It can and has demonstrated to greatly enhance productivity.

The SO problem isn't AI, it's people submitting low value answers, regardless of the way they used to produce those.

It is, if anything, a failure of their reputation system, both in incentives and in repercussions.


They do? What do most domain experts use AI assistants for?


I used it extensively to debug a very thorny MySQL version upgrade. In my experience, it knows a lot about MySQL, and can reason about weird behavior very very effectively. A typical use is asking it something like, "Prior to my MySQL upgrade, my DB integration tests all passed, but now, they're non-deterministically failing with the following error. What could be causing this?" It then proposes hypotheses, which I either accept or push back on with new information. Surprisingly, this process led to it actually debugging a great number of my problems.


I use them every day for generating the basic framework of whatever task I am doing, mostly coding. "Write an HTTP request that does this for this xyz test url that takes x, y, or z as input, create a table with the outputs, sample json attached." etc.

It saves me hours working on mundane shit every week.


90% of academics I know use chatGPT to help write grant applications or articles. Not for anything related to their actual domain, but more for improving language and clarity.


Proofreading, commenting, exploration, investigation, inspiration ...

Just because you are an expert does not mean you tackle every mundane but necessary part of the work bare-fisted, or that you have memorized every edge case.


Programming is the one I see that comes up the most often. Unsurprisingly the experts are the best positioned to use it since they can identify hallucinations.


I frequently use it for mundane things. For instance, I had a set of Go structs I wanted converted to Rust. I told it to do the conversion using serde, and it spat out perfectly semantic Rust. Doing this manually would have taken an hour, given the volume. This quite frequently works excellently for me... complex, one-off parsing problems.


Writing code. Github Copilot is a big time saver.


I agree with the statement that AI-generated content is garbage. But having seen moderation on SO turn into gatekeeping over the last ~7 years, I'm starting to suspect this is a fight over who gets to be the gatekeeper. I don't trust that the mods would conduct an anti-AI policy the right way -- I suspect they could overdo it very easily, with no sane way for users to appeal. The current state of moderation leaves me no desire to contribute to SO -- neither via questions/answers, nor via using moderation tools (I can make edits and review answers or edits of others).

A similar situation with moderation took hold in Wikipedia back in the 2000s, when it became solely up to the mods whether a paragraph is "neutral" or "an opinion" and must be deleted (e.g. some pages have pieces saying "it's a common misconception that ___", but on other pages those got deleted with the edit comment "it's a POV, must not be in Wikipedia").


I'm confused. What exactly was going so wrong with the temporary AI ban that they felt a near-180-degree turn would be better? I did see the mention of "the rate of inaccuracy experienced by automated detectors aiming to identify AI- and specifically GPT-generated content", but that hardly seems so catastrophic as to suggest the opposite would be better.


Because they were losing traffic to chatGPT. Stack Overflow has been all about numbers and revenue for some time now.


But isn't that the advantage of having ChatGPT in the ecosystem? Who wants to moderate a 1000th variation of the same trivial question? Let ChatGPT handle the easy cases and leave the hard stuff for SO. Perhaps SO management does not know their place and they try to swallow the world?


Oh shoot, thanks for explaining that! When you put it that way, it definitely looks like a threat to the business!


I see the chain of reasoning, but it also seems quite logical that quality answers written by real humans should be a pretty big competitive advantage. Throwing away your biggest sales point just because the competition has something new is a business suicide.


Indeed it is business suicide. I posted this before on another discussion, but this seems to be a common trend [1]:

> Here is how platforms die: first, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves. Then, they die.

[1] https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys


You're almost definitely right, but I'm not sure how unbanning AI will stop that trend though?


I see where the strikers are coming from, but isn't this an intractable problem? There's no way to tell if content is AI generated.


There's definitely a recognizable default style to a lot of ChatGPT output. Sure, with some prompts you can get away from that, but it's ultimately still following a somewhat predictable pattern.

However, as a counterpoint, I now see people who spend a lot of time using ChatGPT actually end up writing in real-time, in-person, like ChatGPT's default vanilla output. Just like some American kids now say "mummy" because of watching so much Peppa Pig.


You can tell in some cases, especially those that contain typical LLM hallucinations. Many LLM-generated answers are either plain wrong, or have made-up function or argument names.


That’s definitely not the case with GPT4


> There's no way to tell if content is AI generated.

In general no. But I don't think anyone human writes in the typical style of a ChatGPT answer, so in practice there is a large class of cases where you can tell.


There would be if it were a strike, because the so-called "strikers" would operate the company as employees.

This is a boycott, not a strike.


No, boycott means refusing to use a service. This is about refusing to moderate, so it's a strike. Moderators are volunteer workers.


You're correct that distinguishing between AI-generated content and human-generated content can be a challenging problem. With the advancements in AI, particularly in natural language generation, it has become increasingly difficult to detect AI-generated text. This poses several challenges in various domains, including online platforms, journalism, and even legal contexts.

The ability to determine the origin of content is indeed a complex issue. However, it's important to note that researchers and developers are actively working on developing methods to identify AI-generated content. There are ongoing efforts in the field of AI ethics and responsible AI development to address this problem.

Some approaches to tackling this issue include developing AI algorithms that can generate AI-generated content, also known as "adversarial AI." These algorithms aim to create AI models that can detect and identify AI-generated text by analyzing patterns, linguistic cues, or statistical properties specific to AI-generated content.

Additionally, researchers are exploring the use of metadata, watermarking, or digital signatures to provide additional information about the origin or authenticity of content. These methods could potentially help in distinguishing between AI-generated and human-generated content.

While it may be a complex problem to solve entirely, the development of techniques and tools to identify AI-generated content is an ongoing area of research. The aim is to create a balance between the benefits of AI-generated content and the need for transparency and trust in online information.

Of course, all paragraphs above are ChatGPT-generated. I did not even read them before commenting. That's the bar to clear. Otherwise, see https://xkcd.com/810/


I don't think they can win, and I don't mean just the moderators here, SO in general can't win a fight against AI with bans of all kinds.

I see one possibility for them, embrace AI and generate a quick/automated first reply by AI marked as such (with a disclaimer) for every post. It should be subject to the same voting system.

The error of the moderators and SO here is to discard AI-generated answers because of the wrong (but right-sounding) answers it can generate... when AI answers are often also correct, and at times even outdo what a single human would have found/answered.

If you can harness the existing human knowledge and correct the "bad" ones (badly rated AI answers given less weight) to feed the models of the future, it seems like a win for everyone in the long run. A ban misses this opportunity and generates even more work for moderators, who will inevitably also ban some innocent users and valuable content.
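
For what it's worth, here is a minimal sketch of what "marked as such, subject to the same voting system" could look like as data. The ask_model callback is a hypothetical stand-in for whatever LLM backend the site chose; none of this is Stack Exchange's actual API:

    // A draft answer produced automatically on question submission.
    struct DraftAnswer {
        body: String,
        ai_generated: bool, // rendered as a prominent disclaimer in the UI
        score: i64,         // voted on exactly like human answers
    }

    fn ai_first_reply(question: &str, ask_model: impl Fn(&str) -> String) -> DraftAnswer {
        let body = format!(
            "*This answer was generated automatically and has not been verified.*\n\n{}",
            ask_model(question)
        );
        DraftAnswer { body, ai_generated: true, score: 0 }
    }

    fn main() {
        // Stubbed model call, just to make the sketch runnable.
        let draft = ai_first_reply("How do I reverse a Vec in Rust?", |_q| {
            "Call `.reverse()` on the Vec, or iterate with `.iter().rev()`.".to_string()
        });
        println!("{}", draft.body);
        assert!(draft.ai_generated && draft.score == 0);
    }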


> I don't think they can win, and I don't mean just the moderators here, SO in general can't win a fight against AI with bans of all kinds.
>
> I see one possibility for them, embrace AI and generate a quick/automated first reply by AI marked as such (with a disclaimer) for every post

Problem is, most contributors would probably stop contributing (answering questions) if that were the case. If there's an automatic answer that is correct 2/3 of the time, that would mean lots of time spent reviewing automatic answers and lots of time "wasted" (where a contribution isn't needed), which will probably discourage most of them.


This doesn't track. The OP will vote on the automated answer before the question is public. If it solves their problem, then this is a major reduction in low-quality question spam; if it doesn't, then the AI post is already hidden, so it doesn't waste mod time.


But is the OP best placed to know if the AI answer is correct? They're obviously not an expert, and even if something seems to work initially, that doesn't mean it's the correct way of doing it. Especially in the context of code, which is, I think, the majority of Stack Exchange's traffic, where you could easily have some code that seems to work but has some bug/edge case/insecure practice/etc.


> correct 2/3 of all times

You are an optimist


Indeed. For me the most infuriating part is the number of low-quality questions that could easily have been answered by AI before the question was submitted. They definitely should embrace AI and generate an answer. If the asker thinks the question isn't properly answered, they can explain why the AI-proposed solution is wrong and let humans answer it.

As for the correctness of the solution: that's why the voting system and checkmarks are there. Wrong solutions would ideally be downvoted and never marked correct; I don't really see how AI is making a big difference here. Moderators today aren't running the code in answers either.


The difference is in how much effort it takes to create a correct-looking/sounding answer without an LLM, versus how little effort it takes with e.g. ChatGPT. It's a force multiplier on the side of people creating bad answers. Worse, actually: "bad but convincing-sounding answers", which drive-by voters are especially prone to misevaluate.


> The error of moderators and SO here is to discard AI generated answers because some wrong (but sounding right) answers it can generate... when often AI answers are also correct and even at times out do what a single human would have found/answered.

But why should SO exist for this? If people wanted an LLM to answer their questions, they could just go and have that. There is no purpose in caching stale LLM answers on SO.


Honestly good on them. We've seen that the rise in ChatGPT spam has led to spamming of repositories with poor quality PRs [1] and this whole AI craze just feels like the SEO disaster hitting mach 10. The quality of content or answers doesn't actually matter, just that it's formatted in a way that seems authoritative. This stuff needed to be purged from the internet yesterday.

[1] https://mastodon.social/@danluu/110335983520055904


What happens when sites like SO become so polluted with AI generated text that the next generation of LLMs trained on the Internet is just AIs being trained on AIs?


Rapid degradation, like recording multiple generations of VHS tapes. The LLMs make the internet dumber, the LLMs get dumber by learning from it. Rinse and repeat.


Maybe, maybe not. The human brain trains on its own output without descending into chaos. It’s not hard to imagine a scheme where you use a model like GPT-4 to filter a dataset before training. Classification is easier than creation, so you’d expect performance to continue to improve regardless of how poisoned the unfiltered dataset becomes.

One of the more exciting scenarios would be if it turns out that performance can improve indefinitely with a generate -> filter -> train cycle. There are certainly parallels to how humans learn.
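
A minimal sketch of that generate -> filter -> train loop, with the model calls stubbed out as hypothetical closures (classification as the cheap step, generation as the expensive one):

    // One round of self-training with a quality filter in the middle.
    // `generate` and `quality_score` stand in for model calls that are out
    // of scope here; `train` would consume the curated corpus afterwards.
    fn curate_round(
        corpus: &mut Vec<String>,
        generate: impl Fn() -> String,
        quality_score: impl Fn(&str) -> f64,
        threshold: f64,
        n_candidates: usize,
    ) {
        let kept: Vec<String> = (0..n_candidates)
            .map(|_| generate())
            .filter(|sample| quality_score(sample.as_str()) >= threshold)
            .collect();
        corpus.extend(kept);
        // train(corpus) would run here in a real pipeline.
    }

    fn main() {
        let mut corpus: Vec<String> = Vec::new();
        curate_round(
            &mut corpus,
            || "a candidate training sample".to_string(), // stand-in generator
            |s| s.len() as f64,                           // stand-in quality judge
            5.0,
            10,
        );
        println!("kept {} samples this round", corpus.len());
    }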


> The LLMs make the internet dumber, the LLMs get dumber by learning from it.

I see this said a lot, but in reality it just depends on how the network is trained and how it's prompted. For example, you don't get dumber because you read children's books, you just get better at understanding what makes a good children's book. It's only if reading children's books comes at the cost of reading other content that you might be dumber as a result.

Similarly, an AI doesn't automatically get dumber because it encounters dumb content. It's only if you're training exclusively on dumb content that it doesn't know what quality looks like that you'd have problems.

Broad training sets (ideally pruned of as much junk as possible) and RLHF in theory should condition the network to reproduce quality content and not simply the lowest common denominator of what's found on the internet.

And assuming all that fails there's nothing stopping researchers from just using past datasets with improved architecture going forward. I mean you'd have to wonder why on Earth OpenAI would even release GPT-5 if it's worse than GPT-4...

There's just no scenario here in which what you're saying would actually play out in reality. One way or another companies will ensure the next iteration of their LLMs are better than their previous.


> For example, you don't get dumber because you read children's books, you just get better at understanding what makes a good children's book.

I don't think this is an accurate analogy. It's not books for children -- well-written material at a lower educational/cognitive level -- it's more like books by children -- which is necessarily lacking skill and background context (connection to reality). Think about how children constantly pass around misleading, invented, and incorrect stories among themselves -- and they don't do it maliciously, they just don't know any better. Legends like "Candyman"/"Bloody Mary" for example.

They need an outside influence, an adult or a book or website, to nudge them out of that knowledge rut.

(Of course the same thing can happen with a (closed) group of adults, too, but it's more of a "natural state" with children because they simply haven't had time to encounter as much knowledge.)


The internet often feels like this already, i.e. if I Google a question the top result will often be a Quora post.


Unrelated to this thread but how does Quora manage to be so bad yet so popular? It's a horrible interface and the answers never seem to be good. Often I'll click a top Google search result and it'll be a Quora "thread" where I can't even see an answer.


My theory is that it isn't actually popular; people realise it's garbage, but they spend a lot on SEO to rank well.


Makes me think culture itself was always already like this.


Using `-site:` will probably make you very happy if you don't already use it and want to get rid of low quality Q&A sites choking your SERPs.
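
For example, appending something like `-site:quora.com -site:pinterest.com` (domains purely illustrative) to a Google query removes those domains from the results.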


I'm willing to bet OpenAI knows how to detect OpenAI output, either via steganographic techniques or via keeping a database of all the text it's generated. Both, probably.

Which means future OpenAI models would be getting trained on the output of competitor models. Like Bard. Oof.


The intelligence comes from RLHF.


Given f(prompt, model) -> text, is there some h(model, text) -> [0,1) which tells you if the text was generated by the model? Crucially, could you publish h without publishing the model?

It’s a bit like public key cryptography — if generating text is using sign() to sign a prompt with your model then is there a publicly available verify() that verifies the output came from the model but which doesn’t leak the private model itself?

What sort of things exist like this (other than private, API models recording all text they’ve generated)?


Setting the interesting mathematics problem aside:

I think OpenAI sells a subscription to a ChatGPT detector. So you can pay for h. Does it work correctly? I don't know. Selling the poison and the antidote seems like a good business move though (I know, you asked "other than private API" - not sure they record the generated text, not sure they don't).

Now, ChatGPT-generated text (for current versions of ChatGPT) is more or less recognizable, so for moderation purposes I would guess you don't really need h; you can smell the bullshit. It has a specific way of being overconfident, and it feels like it's giving you a lesson in an impersonal way, without emotions. Something like that. I have the same kind of feeling when reading a WikiHow page; WikiHow has a very specific and recognizable style of explaining things.

I guess you can recognize patterns / specific behaviors on accounts posting ChatGPT texts too, which can help for particularly short texts.


Scott Aaronson has worked on this with OpenAI. Ctrl-F for "watermarking" in https://scottaaronson.blog/?p=6823. I don't know if they've actually deployed it in ChatGPT or not.
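
For the curious, here is a toy sketch of the statistical-watermarking idea Aaronson describes, with std's DefaultHasher standing in for a proper keyed PRF (a real scheme would use something cryptographic). The point is that verification needs the secret key, but not the model weights:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Keyed pseudorandom value in (0, 1) for a (context, token) pair.
    fn prf(key: u64, context: &[u32], token: u32) -> f64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        context.hash(&mut h);
        token.hash(&mut h);
        (h.finish() as f64 + 1.0) / (u64::MAX as f64 + 2.0)
    }

    // During generation the model would nudge sampling toward tokens whose
    // prf value is high. Detection averages -ln(1 - r) over the text:
    // ordinary text hovers around 1.0, watermarked text scores noticeably higher.
    fn detection_score(key: u64, tokens: &[u32], window: usize) -> f64 {
        let mut score = 0.0;
        for (i, &tok) in tokens.iter().enumerate() {
            let start = i.saturating_sub(window);
            score += -(1.0 - prf(key, &tokens[start..i], tok)).ln();
        }
        score / tokens.len().max(1) as f64
    }

    fn main() {
        let key = 0x5EC2E7u64;
        let tokens: Vec<u32> = (0..50).collect(); // placeholder token ids
        println!("score = {:.3}", detection_score(key, &tokens, 4));
    }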


Good. I hope it stays this way forever.

I’m honestly sick of being told I can’t say “Thank You” at the end of my post or other dumb crap these mods waste my time with.


While some of the SO practices can feel dumb, I wonder what tradeoff we are making here. Could it be that, for whatever reason (e.g. personality), we benefit from accepting these "dumb" choices because they bring unrelated benefits with them? For example, if moderators were forced to accept "frivolous" statements such as "thank you" (from the example you gave), might that cause them to stop moderating, and thereby allow in spam & other nasty bits?

It is worth bearing in mind that if we, "good people", complain about moderation, we only see the parts of it that touch our "good posting". There might be plenty of good that moderators are also doing, which only the bad guys see.


"I was slighted once by a completely opt in community that helps millions of people every day so it deserves to die"

Why is this everyone's default response to this kind of stuff?


Because it'd be super simple for a mod to say "oh, no problem, you can say thank you." Instead, it's like "NO WAY CAN YOU SAY THANK YOU DO YOU REALLY WANT TO WASTE PEOPLE'S TIME WITH... pleasantries? ::ban hammer noises::"


Between this and the reddit blackout, "volunteer" mods doing free work for a company are really overstating their importance. The good they do doesn't outweigh the downsides. It's why I like the Fediverse: I control what I can see on my own instance and don't have a rando telling me what I can and cannot do.


A rando can't deny your freedom to post, but they can deny you interaction with anyone beyond your own instance. Most instances have a pretty long blocklist (albeit sometimes for good reasons).


A single rando can deny your interaction on their instance, not on a bunch of other instances. Shared blocklists notwithstanding -- though I would hope that if you're following a shared blocklist you're vetting it to an extent.


On one hand I sympathise with the problem; on the other, Stack Overflow is one of those platforms where the close hammer was always wielded for seemingly random reasons.


In total I think I have gotten more useful answers from closed posts than from non-closed ones. Someone at SO has greatly underestimated the value of having a handful of almost identical questions, each with their own answers.

If anything the network might have lost traffic because moderators have been too efficient in closing duplicate questions before they got useful answers.

That said, AI garbage posts do have to be fought with fire.


Turns out the mods don't like it when told their actions are "too subjective". Irony overflow.


What's the point of posting AI generated babble on stack overflow without checking it? If I wanted that kind of potentially useful answer I could have just asked ChatGPT myself.

But if the user has the necessary expertise to make sure that what ChatGPT generated is actually correct before posting it, is there really an issue? It would save them some time and allow more questions to be answered.


Farming Internet points.


And? If the answers are getting upvotes then they’re by definition helpful. What’s the problem?


If the answers are getting upvotes then, in practice, they merely look reasonable to clueless newbies who copy-pasted them. Or they are "funny". This nonsense post received 66 upvotes and 3 downvotes before it was removed 11 days ago: <https://web.archive.org/web/20220410125443/https://stackover...>. Getting an upvote earns you 10 points, getting a downvote loses you 2 points. You need to have earned 15 points to cast an upvote; you need 125 points to cast a downvote, and each one costs you 1 point. Do the math (66 × 10 − 3 × 2 = 654 reputation for one joke answer). Upvotes aren't much more meaningful there than on Hacker News.


So your theory is that people go around upvoting posts based solely on them looking somewhat correct? I’ve never upvoted anything that didn’t help me solve a problem. I don’t imagine random upvotes are very common.

Also, to your point about humor, in my experience GPTs are very bad at it. If a post is funny it’s most likely not AI generated. Your expectation that all talking meat should maintain a consistently somber decorum while online is, needless to say, unrealistic.


I am not claiming GPT is any good at humor. I am claiming this is the sort of superficial quality that gets you upvotes on Stack Overflow. GPT is pretty good at superficial things.

> Your expectation that all talking meat should maintain a consistently somber decorum while online is, needless to say, unrealistic.

Everywhere on the Internet, probably not. In Stack Overflow answers, it wouldn't be half bad. It's what they are for. But I wouldn't even go that far: for example, another answer's joke that "That's a very complicated operator, so even ISO/IEC JTC1 (Joint Technical Committee 1) placed its description in two different parts of the C++ Standard." is fine in my book, as that answer is otherwise pretty informative. Unlike the one I linked to before, which is just a confusing mess of random-access humor (superfluous Xkcd: <https://xkcd.com/1210/>).


I’d say humor gets upvotes precisely because it’s not superficial. Humor arises from presenting ideas in a simple but still surprising new way. It takes insight into both the subject and the audience. Humor is practically the antithesis of dry AI content.


This is what the now-deleted answer said:

> For larger numbers, C++20 introduces some more advanced looping features. First to catch i we can build an inverse loop-de-loop and deflect it onto the std::ostream. However, the speed of i is implementation-defined, so we can use the new C++20 speed operator <<i<< to speed it up. We must also catch it by building wall, if we don't, i leaves the scope and de referencing it causes undefined behavior. To specify the separator, we can use:

Do you feel informed by this? Do you think a newbie would be? What kind of enlightening insight flows from this? This is about as funny as the output of a Markov chain: extremely hilarious… for about 15 minutes, after which it just becomes boring.


It was probably upvoted for the “inverse loop-de-loop”. That’s a great line.


So you admit it was for superficial reasons. Thank you for conceding my point.


I concede nothing. That clever play on words is comfortably outside the output distribution of a modern GPT. When the relationship between tokens can perplex a 175B parameter model, it’s no longer superficial.


I think there’s value in a service that indexes and ranks AI generated responses. If hypothetically stack overflow was 100% AI generated content, it would be acting like a cache for GPT-4. Running GPT-4 is far more expensive than a stack overflow database. Why not pre-generate answers to all common questions so they can be retrieved quickly and cheaply?

Future language models are going to provide better-quality answers. At some point the line between GPT and expert will get so blurry that enforcing an AI ban will be a fool's errand. Sites like SO will still have value as a cache for the most expensive models.
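
A back-of-the-envelope sketch of that cache idea, with ask_model as a placeholder for the expensive model call and a deliberately crude normalization step so near-identical phrasings hit the same entry:

    use std::collections::HashMap;

    // Lowercase and collapse whitespace so trivially different phrasings share a key.
    fn normalize(question: &str) -> String {
        question.to_lowercase().split_whitespace().collect::<Vec<_>>().join(" ")
    }

    struct AnswerCache {
        entries: HashMap<String, String>,
    }

    impl AnswerCache {
        // Serve from cheap storage; pay for the model only on a miss.
        fn get_or_generate(&mut self, question: &str, ask_model: impl Fn(&str) -> String) -> &str {
            self.entries
                .entry(normalize(question))
                .or_insert_with(|| ask_model(question))
        }
    }

    fn main() {
        let mut cache = AnswerCache { entries: HashMap::new() };
        let expensive_call = |q: &str| format!("(model-generated answer to: {q})");
        println!("{}", cache.get_or_generate("How do I reverse a Vec in Rust?", &expensive_call));
        // Second lookup with different case and spacing hits the cache instead of the model.
        println!("{}", cache.get_or_generate("how do I reverse a vec  in Rust?", &expensive_call));
    }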


Kudos, and I am completely on board with them. Forcing smart people to vet machine-generated content made in a second of the poster's time is the ultimate f%^k you to human dignity. And without mods it will become a cesspool in no time, so hopefully management wakes up ASAP.


Honestly, Stack Overflow is just a for-profit Wikipedia. Its content should be scraped and an open-source version should replace it.


It's so sad to see how a project that once set out NOT to be like the other popular Q&A sites of its time could still end up so horribly similar. How a group of good people, with the best intentions, could still end up with a site in a state like this. The road to hell is paved with good intentions, I guess.


> How a group of good people, with best intentions could still end up with a site in a state like this.

Almost any for-profit platform is doomed to become a dumpster in the end. After Joel Spolsky and Jeff Atwood, the original founders, sold it to Prosus 2 years ago, it has been in free fall.


The original founders checked out long before the acquisition, and it's only gotten worse since then. Just the way these things go.


Compared to Experts Exchange or Quora, Stack Exchange is still miles better. You never need to pay to see answers, or register, or have tons of annoying popups, etc.


Its data is licensed under Creative Commons, and last I looked into it, you could easily download the entire Stack Overflow dump. There was a torrent and it was about 50GB or something.


https://github.com/answerdev/answer#readme is Apache 2 licensed, the sibling comment pointed out that the existing S.O. data is open licensed, but your premise has the same problem every "I'm going to take my ball and go play in the other yard" does: the network effect is very, very real


Humans manually posting AI responses is dumb.

Stack Overflow should have a built-in AI responder, marked as such, that gives an instant unverified first response, which can then be checked and corrected by human moderators.


This is not what I want Stack Overflow for, and I think it would discourage higher quality responses from humans.

If you want a ChatGPT answer to a question, go ask ChatGPT (directly, or through one of the many more focused frontends people have built for it). But Stack Overflow should encourage answers from real humans.


I do wonder what would happen if they scrapped karma. The right answers will still get upvoted, but there's no longer any incentive for those seeking internet points.


This is interesting from the perspective of how ChatGPT affects social groups.

From the standpoint of business strategy Stack is caught in a bind.

Banning bot responses seems like a great idea. Bot responses water down the quality of content, and they risk becoming a library of bot responses.

Allowing bot responses also has benefits. The author of the post mentions the false positives of their detection algorithms. A false positive that leads to a ban or other mod action will piss off real users and hurt engagement. Bot responses may not be the same quality as human experience, but they may also be a good starting point to drive engagement.

Stack's best move may be to be more transparent in HOW they are making decisions and share their metrics for policy success with their users. Radical openness may be the only way for a radically open knowledge sharing platform to survive the bot-war.


Why is anybody moderating on Stack Overflow at all?

What is the incentive?


I'm a top 5% moderator on AskUbuntu and have answers on half a dozen SE sites.

I used to use LinuxQuestions, a forum site, but found that SE provided better answers for me (better format, less searching), so in return I answered questions there if I could. I also used to blog tricky problems I'd solved on my *nix boxen; SE was marginally easier than making a blog post and would get better responses/corrections.

That was the initial motivation: community participation, quid pro quo.

Keeping SE sites like U&L and AU useful is my on-going motivation, in part I enjoy helping others, in part I enjoy using my Linux knowledge acquired over many years as a user.

Yes, the company gets value, but all answers are "open source" and it serves the public good too.


To have a useful and functional community for a niche topic. The same reason someone would moderate a forum or sub-Reddit.

What's the incentive for dang to moderate HN? If it's garbage and uncurated, people probably won't use it because the filtering tools to parse the data are non-existent.

People come for an answer to their question, perhaps they answer the questions of others. Just like how any "community" works.

"Their treasure was knowledge."


Looking at the questions, they seem not very niche to me:

https://stackoverflow.com/questions/

Just typical questions about popular technologies.

Isn't dang paid to moderate HN?


They're niche relative to "where else are you going to ask them successfully?". If you go on Twitter, whilst it may have way more people, you're probably not going to reach the same demographics in as meaningful a way.

I would assume so, yes, but the point still stands: he needs to moderate it or people won't find value in it, thus he'll be out of a position.

What's the value in Wikipedia? The curated knowledge. You only have to go on the average Talk tab for an article on Wikipedia to realize how hard-won most content is, and no, the best outcome doesn't always happen, and plenty of genuine nonsense still makes its way into articles and stays there for years on end.


There are 180 other Stack Exchange sites, with subjects ranging from Latin through Woodworking to Biblical Hermeneutics. This is a General Strike.


dang is paid to moderate HN.



In addition, pg himself holds an account for emergency intervention.


> They want the internal AI policy given to moderators to be revealed to the community [teekert's comment]

It wouldn't be that hard for someone to leak it, would it? If all moderators across different sites can see it, it's not exactly a state secret.


When the complaint is the lack of trustworthiness and of sticking to avowed principles, being untrustworthy and abandoning one's own principles in response is not a wise course of action.


No, it just goes to show that trust goes both ways. Trust doesn’t mean that one side can dictate the rules and the other meekly submits. Those weren’t the moderators’ principles anyway, they were imposed on them by corporate.


The underhanded way SE has handled this is a problem, but deleting AI-generated answers is silly. They should instead ask SE Inc. to automatically provide one AI answer for all new questions. Nobody wants to answer the same trivial question a 1000th time. Let AI do that. SE should be for questions that AI is unable to answer.

CodeChef gives you an option of AI advice if your submission fails to build. It's very effective at solving beginner problems. SE should do the same.

As a positive side effect, it would make spamming with AI-generated answers ineffective and thus save mods time on answers as well.


Clever idea!


I don't think criticising AI on grounds of it not 'understanding' is a strong argument, since we have neither a definition for, nor a way to measure, what understanding is.


I agree that we can't provide a definition of "understanding" that would permit automated classification of answers into "shows understanding" and "shows no understanding".

But the best SE answers actually convey real understanding to the reader; they go beyond the brief provided by the question, and explain a subject with conciseness and lucidity. Nobody could mistake such an answer for ChatGPT output.

[I have a suspicion that such really good answers may be mostly several years old; I haven't ever thought to try and quantify that]


>as a last-resort effort to protect the Stack Exchange platform and users from a total loss in value.

The strike is not the final last-resort effort. Stack Exchange should not only open source all answers, but also open up the platform with ActivityPub. Then those moderators can create another frontend where they ban the AI users. Otherwise, the moderators will create their own platform, without Stack Exchange as the main hub.

It's wild that moderator conflicts happen on Stack Exchange and Reddit at the same time.


Stack Overflow/Exchange answers have been 'open source' (creative commons) since day one https://stackoverflow.com/help/licensing


Yeah it makes sense. When I ask something at StackOverflow, I don’t want AI hallucinated sludge as the reply.

Or if there is AI-hallucinated sludge, I want it clearly marked as such.


A general guideline for rule crafting that I have seen proven important in sport over and over again is:

Never disallow something you cannot enforce. Doing so forces people to choose between losing and being a liar. This is in part how cycling became such a mess: there was an era where you either did EPO and lied, or retired, because EPO was banned but could not yet be detected.


This is just the most recent step in Stack Overflow's decline 8-(

Ever since the posting system was turned into a social score, where people are mostly concerned with increasing their score rather than answering questions, Stack Overflow has failed its users.

Just another example in the very long list of for-profit platforms doing what's best for profit over what's best for the users...


> The problem with AI-generated content

It sounds like the author's main problem with AI answers is that they are purely textual, and the AI is unable to verify them. Does that mean the stance will change if the AI bot is able to run its code before submitting the answer? Aren't there already AI agents available that can do just that?


The goal of Stack Exchange is to create a repository of question-answer pairs of knowledge. The problem here is that generative AI isn't able to create these new pairs. Either it is regurgitating existing knowledge or it is making up a potential answer. Repeat knowledge in the repository is discouraged, which is why questions get marked as duplicates. Making up answers is problematic because it can be hard to verify and it lowers the quality of the repository. An AI being able to run code doesn't mean it is able to verify a solution.

Stack Exchange is not about answering people's questions. A different site could exist for that, and for a site like that it could make sense for current LLMs to help.


> The goal of stack exchange is to create a repository of question answer pairs of knowledge. The problem here is that generational AI isn't able to create these new pairs.

I'm not so sure about that. I think we are quite close to an AI being able to put two and two together to create something marginally new. Maybe not an LLM by itself, but with some high-level iterative/recursive process like this: https://arxiv.org/pdf/2305.10601.pdf.

What I'm saying is that a no-AI policy may have made sense in late 2022, but will not necessarily hold in late 2023.


This is not about ChatGPT posts, ultimately. It's about the new owners of Stack Overflow reducing expenses until they run it into the ground.


How viable is a Wikipedia-like approach for Stack Exchange (and Reddit, given the ongoing drama), where a non-profit takes over governance?


Proposed as a wiki project:

https://meta.wikimedia.org/wiki/Wikiask/Introduction

It never got anywhere. If it ever does, I would consider being a contributor.


Moderating based on "is it AI" never made a lot of sense, and SE is correct to pull the plug on it. It's a similar problem to the (possibly apocryphal) story of the national parks trying to design bear-proof trash cans—"There is a considerable overlap between the intelligence of the smartest bears and the dumbest tourists." Any anti-AI policy will inevitably catch a bunch of people who just aren't great at English, or are still in high school, or any number of other reasons why their prose might sound stilted. There is no function that can reliably separate the two overlapping probability curves, so trying to do so is pointless.

Moderating based on non-constructive comments makes much more sense—if someone posted 30 answers in one hour and none of them answer the question, moderate that. In practice nothing changes, you're still banning people who are abusing AI, but you're doing so based on a concrete quality metric rather than a flawed algorithm plus gut check.

Obligatory xkcd: https://xkcd.com/810/


> ChatGPT, for example, doesn’t understand the responses it gives you; it simply associates a given prompt with information it has access to and regurgitates plausible-sounding sentences

I don't like this tone, in the sense that suggesting what ChatGPT does is "simply" and "regurgitates" feels like a biased interpretation of what the tech is doing.

I think it is fair to say: ChatGPT is very good at creating convincing appearing content that is also incorrect. Validating content that appears high-quality in form but is actually low-quality in content is a major challenge for moderators. Banning suspicious accounts that moderators believe are spamming ChatGPT based responses is easier than individually validating each and every post from such accounts manually. As the number of posts from ChatGPT backed accounts multiplies this would become increasingly difficult and time consuming.

I understand their pain and I hope they find a solution. But my gut tells me that whatever solution they come up with will be forced to tolerate some amount of GPT generated content.


Moderation is likely something that ChatGPT would be very good at.


I hope the moderators depart from SO if they don't get what they want. If they are inclined, they could make their own website to compete with SO.


Stack would benefit from this experiment of moderators doing as little as possible. That is overall the ideal moderator to begin with.


I await the people complaining about how they were banned from SO for posting an answer the mods falsely believed was AI-generated.


Good, stay on strike forever; now I can finally post without being harassed by entitled moderators. SO is a toxic forum because of the moderators, not the OPs.


The snake eats itself.

This data will all be fed right back into GPT-X at some point, leaving an even worse version of AI to generate future data to ingest again, ad infinitum.


Just to be clear - are these moderators unpaid?

And if so, is this end-stage capitalism? Surely unpaid workers effectively unionizing because they want to do unpaid work on their own terms must qualify as end-stage capitalism...?


It's sad to see such hand-crafted communities organize so well, only to be ignored by the website they communicate over. And this happens with Stack Exchange a lot...

Sounds like an excellent time to make an alternative.


TL;DR: They want the internal AI policy given to moderators to be revealed to the community, and in general want a more open/equal relationship with SO management.

Maybe it's because I'm not a native English speaker but I can't really figure out if they are against or for AI answers?

They say ChatGPT is a parrot leading to poor quality, so I assume they are against, but Stack Overflow did indeed ban ChatGPT messages? Then they say AI detectors have many false positives, so I guess they are against strong filtering? So what are they for then? This piece needs a tl;dr or bullet list... So yeah, I asked our large language friend:

Stances from the text:

    A general moderation strike is being initiated.
    The strike is in protest of recent and upcoming changes to policy and the platform by Stack Exchange, Inc.
    Striking community members will refrain from moderating and curating content.
    Critical community-driven anti-spam and quality control infrastructure will be shut down.
    The new policy on AI-generated content is harmful and overrides community consensus.
    There has been a serious failure to communicate on the part of Stack Exchange, Inc.
    AI-generated content poses risks to the integrity of the platform and represents an honesty issue.
    Stack Exchange, Inc. has ignored the needs and consensus of the community and made decisions without consulting those most affected.
    The striking users want the AI policy change to be retracted or modified to address concerns and empower moderators.
    They want the internal AI policy given to moderators to be revealed to the community.
    Clear and open communication from Stack Exchange, Inc. regarding policy changes is demanded.
    Collaboration with the community instead of fighting it is expected.
    Stack Exchange, Inc. should be honest about the company's relationship with the community.
    A change in leadership philosophy toward the community is needed.
    Leadership should allocate resources based on community needs and involve the community in feature development.
    Neglecting and mistreating volunteers can lead to a decrease in goodwill and motivation.
    The concerns laid out in the open letter and the post should be addressed to end the strike.
Imho this is the key thing: They want the internal AI policy given to moderators to be revealed to the community, and in general want a more open/equal relationship with SO management.


All mods are bastards


This isn't a strike lol. It's a boycott.


Strike is the right word for workers refusing to work for particular reasons. Boycott pertains to users, not workers.


I hope Wikipedia suffers a similar fate as well, given that they failed to address the Poland Holocaust distortions.


I for one welcome our new robot overlords. I'll step up to help clear the queues.



