Note: sentences highlighted in yellow mean one or more models disagree.
The sentence that makes me think this might not be AI generated is
"Researchers can use this framework to answer complex questions, find gaps in current knowledge, suggest new designs for materials, and predict how materials might behave, and link concepts that had never been connected before."
The "and" before "predict how materials" was obviously unnecessary. It got caught by both gpt-4o and claude 3.5 sonnet, and when I questioned Llama 3.5 about it, it agreed as well.
As for it being AI generated, there seem to be too many imperfections, which makes me believe it may well have been written by a human.
I'm not sure this is a useful test. You can most certainly get an LLM to infinitely "correct" or "improve" its own output. But take the "The work uses graphs..." paragraph and plop it into an AI text detector like Quillbot. It's a long and non-generic snippet of text, and it will score 100% AI. This is not something that happens with human writing. Sometimes, you get false positives on short and generic text, sometimes you get ambiguous results... but in this case, the press release is AI.
I have no doubt the author of the press release used an LLM to help them, but I'm not convinced that this was fully generated by AI. Since you got me thinking about this more, I decided to run the sentence through my tool with a new prompt that asks the LLM to decide. Both Claude and Llama believe there is a 55% or greater chance it is AI generated, while GPT-4o and GPT-4o-mini put it at less than 55%.
I created another prompt that tries to analyze things more thoroughly, and the models all agree that it is most likely AI (60%+). The highest was gpt-4o-mini at 83%.
It's definitely run-on academic writing that didn't get enough editing. It's consistently bad in ways LLMs typically correct for.
Run your papers through an AI and have it identify simple corrections. It's like having an endlessly patient English Literature major at your beck and call.
https://app.gitsense.com/?doc=4715cf6d95689&other-models=Cla...