"Lots of people in CS are (almost surely) GPT-ing their peer reviews" (twitter.com/mishateplitskiy)
49 points by surprisetalk on March 18, 2024 | 45 comments


With the current abysmal cultural trends around "AI"-based cheating and shirking...

Maybe we're going to need a way for people to signal "If you send LLM-generated text to me under the pretext that you wrote it, I will (depending on context) urge that you be sanctioned by the academic/professional organization, refer you to an academic disciplinary committee for suspension or expulsion, have you put on a PIP or fired, or downrate your friendship level."


I'm sure a lot of people are already doing that.

I doubt this response will ever become dominant, in part because the tool is already really useful despite its limitations.

I'm not sure how it's going to shake out, given the tools are not likely to be static.


Disclaimer: No ChatGPT was used for this post.

What if people used a disclaimer to actively signal that ChatGPT was used for the English, at the top/bottom of emails or whatever, whenever it was used? Would it be as much of a problem for you then?


I think that the disclaimer should be the other way around. It should be assumed that the message comes from the person you are talking to.

Nonetheless I've been tempted to add an "AI-free" or "written by a human" logo on my website. There just isn't a place where it would make any sense.


As I see it, the three places would be your header or footer, or on your about page. I see it as the same as a disclaimer that a post is using affiliate links. So maybe also as optional text on a particularly long-form piece of text. E.g. this comment of mine isn't long enough to warrant a disclaimer (though I did not use ChatGPT), but for something more long-winded, maybe we're at a place where a disclaimer makes sense. Maybe just pick one and feel it out. The disclaimer/disclosure cuts both ways though, I think. Telling people AI was used in the generation of the post cheapens it for anti-AI purists, but not everyone thinks that way. I personally am okay reading AI-generated text, since I chat with ChatGPT on my own projects and for my own knowledge already, so I don't see why it would be so different encountering it in the wild, so to speak.

Okay, and now I've written enough that it turns out I feel like adding a disclaimer: this comment written by a human, with no help from an AI.


Then again, if I start saying no ChatGPT was used on a post in other contexts, then the implication is that I'm using ChatGPT when I don't give a disclaimer, which is not where I want to be.


https://notbyai.fyi/ don't know if that's the one you meant


I'm pretty sure that's the one. Thanks!


Sent from my aiphone.


Did you just write an LLM generated comment? I will refer you to the disciplinary committee that my cousin also happens to be on. /s


"The grad students that CS profs exploit and who are tasked with doing the review the prof claims s/he has done appear to be using GPT."

We know the exploitation is quite prevalent and we should keep saying so out loud every time there is an opportunity to do so.


Maybe, but this can be done really well. Probably the best class I had in grad school was a bunch of grad students reviewing papers. We would meet once a week, and the professor would go over the paper, we would discuss our reviews, and he would probe our thinking. Sure, he probably was able to use some of our thoughts, but he did his part in challenging us, and we learned a lot by doing.

I know some professors definitely can be exploitative, but learning how to do peer review well is a critical skill that can really only be developed by doing it - with good mentoring.


The existence of humans who do not suck is utterly irrelevant in calling out an industry where exploitative practises are prevalent. Some say they are the norm. Whatever your view of the proportion, reform is badly needed.

Good for you if you got lucky. Hurrah. You should be calling the exploitation out the loudest and most often on that basis, right? Otherwise it might erroneously appear that you're ok with your competition to secure an academic career being abused like this because it kinda works in your favor. And I'm sure you're not ok with it on that basis.


If you were curious how much "lots" is, here is a quote from the abstract:

> Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences [ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023] could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates.



Hell, I’m GPT’ing my self reviews.


People demanding self reviews are already meatspace bots conditioned into forcing regular people into servile paperwork generation only to be sent into the void.

I will gladly utilize AI to feed these corporate drones.


Using GPT for self-review might lead to biased or inaccurate assessments. It's essential to seek feedback from human peers or experts for a more reliable evaluation.


I think they meant self-assessments? Pass in your commit log and some bullet points about the projects and ship it.
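
Something like this, as a rough sketch (assuming the official openai Python package; the author filter, model name, highlights, and prompt are all illustrative):

    # Hypothetical sketch: draft a self-assessment from a commit log
    # plus a few bullet points. Assumes the openai package and an
    # OPENAI_API_KEY in the environment; model name is illustrative.
    import subprocess
    from openai import OpenAI

    # Pull your own commits from the last six months.
    log = subprocess.run(
        ["git", "log", "--author=me@example.com",
         "--since=6 months ago", "--oneline"],
        capture_output=True, text=True, check=True,
    ).stdout

    highlights = "- Led the build-system migration\n- Mentored two interns"

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Draft a self-assessment from this commit log:\n"
                       + log + "\nand these highlights:\n" + highlights,
        }],
    )
    print(response.choices[0].message.content)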


It's important to be cautious not to confuse comments generated by GPT with real ones. While GPT can provide helpful insights, it lacks the depth and nuance of human feedback. Always verify information from trusted sources to avoid misunderstandings or misinformation.


Nuance is one place where I think ChatGPT beats most of the humans I've interacted with. Only extremely nerdy people (which, in fairness, you'd hope would be a defining trait of "people reviewing computer science papers") seem to do as well or better.

Depth, sure. ChatGPT is neat, but much too lazy for a full review: it tends to give me a handful of suggestions and then stop. I presume this is because the training set is humans on the internet, and that in turn means "make 1-3 points in response to the previous commenter, then stop", not "mark this exhaustively according to a well-defined rubric, including for spelling and tone".

Now, that said, I've just re-read your two comments, and they absolutely pattern match to the kind of thing ChatGPT says. I'm curious, were either or both actually from ChatGPT, or has that style just become so prevalent that humans are mimicking it?


It's always funny to see a joke repeatedly go woosh over the supposedly intelligent HN userbase. Thank you for the laugh!


It isn’t being read as a joke because the expectation on this forum is that folks are posting seriously and engaging in good faith. There are enough other comment sections on the net that are full of unfunny jokes that add little value to the conversation.


It's being read as both.

See, the beauty in what we're now calling the joke is that it's both a joke for those who know and actual valid commentary that meets your expectations of this forum.


It’s not a joke. It’s commentary.


Are you the GPT here?


I don’t really see a problem with this - if someone does their due diligence and reads a paper thoroughly to provide feedback, and passes their draft through an LLM to make it easier to read or sound better, that’s fine; great even.

I guess it does make it easier for people to be lazy and get away with it. That being said, I’d be surprised if ChatGPT/Claude/Gemini actually provided _zero_ useful feedback.

Not exactly the same, but instead of my co-workers just pasting “LGTM” on everything, I’d probably prefer an LLM review.


It may start out that way, if we assume the best of people, but if it does a decent enough job, there will be less and less review as time goes on.

Most people can't even be bothered to read a function they copy off StackOverflow to understand what it does; it's not a stretch to believe they aren't going to bother reading a whole paper.

>Not exactly the same, but instead of my co-workers just pasting “LGTM” on everything, I’d probably prefer an LLM review.

You could do your own LLM review before sending it to your co-workers, if you'd like that kind of review. It may catch basic syntax stuff or suggest ways to write something differently, but it won't catch bigger issues of logic or "is this a good idea". I'll admit to a lot of LGTM reviews going around in my team, but last week I saw someone make something extremely dangerous and the review stopped it from rolling out to prod. I am almost 100% certain an LLM would have said LGTM, as it was written fine and did exactly what it said it would do... it was just a horrible idea to do that thing.
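
For what it's worth, a rough sketch of that kind of pre-review pass (assuming the official openai Python package; the model name and prompt are illustrative, and as noted above it catches surface-level issues, not bad ideas):

    # Hypothetical pre-review of your own branch before asking humans.
    # Assumes the openai package and an OPENAI_API_KEY in the
    # environment; model name is illustrative.
    import subprocess
    from openai import OpenAI

    # Diff the current branch against main.
    diff = subprocess.run(
        ["git", "diff", "main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    client = OpenAI()
    review = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Review this diff for bugs, style issues, and "
                       "anything risky:\n\n" + diff,
        }],
    )
    print(review.choices[0].message.content)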


It’s like Grammarly for people that don’t want to pay for Grammarly?


I'd argue LLMs can do a much better job, because they can also emulate styles in addition to cleaning up grammar: "that's too formal", "that's too casual", "rephrase this more concisely", etc. I've run into plenty of drafts and arXiv publications that are annoying to read because the sentence structure is unnatural or overly verbose.


I use GPT in all my writing now, and I'm really transparent about it. For $20 a month, I now have a pocket secretary, and can at any time whip out my phone and dictate a long, incoherent train of thought, thinking out loud, and get a structured retelling of my thoughts a few seconds later. Is there a single reason not to use it?


You aren't organising your own thoughts. You're missing out on important clarification of your thinking and very likely settling for second-rate on that basis. That's one reason not to use it, but you may be able to counter that somehow.


Not all of my thinking needs to be first-rate. One thing I've used GPT for recently was writing letters to some officials on a bureaucratic matter. I dictated my thoughts with a lot of babbling and curse words (the whole issue is frustrating) and got out a beautifully written page in a language I barely speak.

Writing detailed, respectful letters to idiot officials is not a skill that I would enjoy acquiring or practicing.


> Is there a single reason not to use it?

Same reason as all the other tools when they were new: we're still all figuring out the best practices, the limitations, and the strengths.

Also: If all you have is a hammer, every problem looks like a nail — Outside my speciality, ChatGPT is my hammer; inside my speciality, I can see all the LLM's flaws while also having good working knowledge of many other tools.


> and get a structured retelling of my thoughts a few seconds later?

How bad is the latency in your actual brain that this is a regular option for you?


What’s your pipeline? Do you dictate into ChatGPT, or do you take a voice memo and go through the API or something like that?


FWIW, the app, even without a pro subscription, has speech-to-text and TTS. On iOS, you can directly start the app in voice input mode from the shortcuts menu.
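
If you'd rather go the API route the parent asked about, a rough sketch (assuming the official openai Python package; the file name and model names are illustrative):

    # Hypothetical voice-memo pipeline: transcribe, then restructure.
    # Assumes the openai package and an OPENAI_API_KEY in the
    # environment; file name and model names are illustrative.
    from openai import OpenAI

    client = OpenAI()

    # 1. Transcribe the dictated memo.
    with open("memo.m4a", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio,
        )

    # 2. Turn the rambling transcript into structured notes.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rewrite this dictated train of thought "
                        "as clear, structured notes."},
            {"role": "user", "content": transcript.text},
        ],
    )
    print(response.choices[0].message.content)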


There are a lot of seething and butthurt replies to you for some reason. Imagine the above but in regards to any other piece of technology.

"I use calculators in all my calculations now, and I'm really transparent about it. For $20, I now have a pocket calculator, and can at any time whip it out and calculate a long, difficult problem - and get a correct calculation based on my inputs just a few seconds later. Is there a single reason not to use it?"

Are the concerns in the other replies consistent with their own outsourcing of willpower & mental effort to technology?


Stanford says: we have a GPT-4-based tool to help you draft a review - wonderful tool, it saves researchers tons of time.

https://hai.stanford.edu/news/researchers-use-gpt-4-generate...

Now it will be hard to tell if a given review has been done exclusively by this drafting tool or with some human input/oversight.


It makes one wonder if it would be in the public interest for there to be a legal requirement for generative AI providers to log what they have generated and make it searchable, along the lines of lawful access to telecommunications records.


This would be very useful for catching people using them to cheat, but would likely be problematic due to how many people are using them to generate things that should be private.

This would also surely create an LLM black market. One kid in school sets up his own LLM and then charges kids for access to write their papers with a system that won't log to the central DB.


As long as the reviewers have read and "signed off" on the GPT-generated drafts, this isn't fundamentally different from a president having a personal speechwriter. The reviewers are still responsible for ensuring the GPT-generated content accurately reflects their opinions of the paper, just as a president is vis-a-vis speechwriter-generated content.

Also, CS likely has a higher proportion of non-native English speakers compared to other fields. The increased frequency of certain tokens could be due to non-native speakers running their draft reviews through prompts like "Please edit this draft written by a non-native speaker, clean up infelicities and improve word choices where appropriate."


They really need to limit the study to reviewers who have English as a first language.

I presume people with English as a second language (a) lean on GPT more, and (b) don't deeply re-edit the output.


I tried, and they turned out awful. Just wrote them myself and they were much better.


This is an interesting bit of data because it -is- just data and makes no judgement based on the data.

It leaves the reader to decide if it's a good or bad thing.

On the one hand, scientists are not always good writers. Being good at science does not imply good communication skills. Clearly tools like spell-checkers and grammar-checkers have been in use for decades; is this not just the next iteration?

On the other hand, LLMs are just probability machines. So while the review might accurately reflect the author's opinion, it may come across as "bland". All the personality of the reviewer is stripped away.

It's like fast food. It's all the same. Which has upsides. But it's considered "low quality" in part because it's "designed by focus group". By contrast, Mama's Italian Kitchen reflects Mama's unique personality.

Perhaps they are both OK, in their own way.



