
If no one else does it soon, I'll probably do it myself: we're long overdue for the ad-block of LLM output. I want a browser plugin that nukes it at the DOM, and I don't care how many false positives it has.



You can't detect LLM output at any reasonable accuracy. You'd have both false positives and false negatives all over the place. If you solved that part on its own, it would be a SOTA result.


This is a dangerous falsehood. OpenAI's since-cancelled polygraph had a 9% false-positive rate and a 26% true-positive rate. If I can lose a quarter of the toxic bytes at the price of having to "enable JavaScript" on one site in ten? Count me in!

I want more false positives.

https://openai.com/index/new-ai-classifier-for-indicating-ai...
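For concreteness, a back-of-the-envelope on those published rates. The 50% LLM share of a hypothetical feed is a made-up number just to anchor the arithmetic:

    # Back-of-the-envelope on the reported rates: 26% of LLM text caught,
    # 9% of human text wrongly flagged. The 50% LLM share of a hypothetical
    # feed is an assumption for illustration only.
    tpr = 0.26
    fpr = 0.09
    llm_share = 0.50

    blocked_llm = llm_share * tpr            # LLM text removed from the feed
    blocked_human = (1 - llm_share) * fpr    # human text lost as collateral
    precision = blocked_llm / (blocked_llm + blocked_human)

    print(f"LLM text removed:    {blocked_llm:.1%} of the feed")    # 13.0%
    print(f"human text lost:     {blocked_human:.1%} of the feed")  # 4.5%
    print(f"precision of blocks: {precision:.1%}")                  # 74.3%

Under that assumption, roughly three quarters of what gets blocked really is LLM output, at the cost of losing about 4.5% of the human text.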


Then don't use any website - 100% false positives. But seriously, it was a 9% rate for the specific models of the time. It's a cat-and-mouse game, and any fine-tuning or new release will throw the detector off. Also, they don't say which 9% were misclassified, but I suspect it's the most important ones - the well-written articles. If I see a dumb tweet with a typo, it's unlikely to have come from an LLM (and if it did, who cares), but a well-written long-form article may have been lightly edited with an LLM and still get caught. The 9% is not evenly distributed.


It was a cat-and-mouse game before; spam always is. The inevitable reality that fighting spam is a slog of a war isn't a good argument for giving up.

I don’t know the current meta on LLM vs. LLM-detector, but if I had to pick one job or the other, I’d rather train a binary classifier to detect a giant randomized nucleus-sampling decoder thing than fool a binary classifier with said Markov-process thing.
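For the shape of that job, a minimal sketch: character n-gram TF-IDF into logistic regression, assuming you've already collected labeled human and LLM samples. The two-item lists below are placeholders, not a real corpus:

    # Sketch of the "train a binary classifier" side: char n-gram TF-IDF
    # + logistic regression. A baseline shape, not a SOTA detector.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder corpora -- in practice you'd collect thousands of labeled
    # samples per class; these two-item lists just make the code run.
    human_texts = ["dumb tweet w/ typo lol", "saw this irl yesterday, wild"]
    llm_texts = ["Certainly! Here is a concise overview of the topic.",
                 "In conclusion, it is important to note several key factors."]

    X = human_texts + llm_texts
    y = [0] * len(human_texts) + [1] * len(llm_texts)  # 1 = LLM

    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    clf.fit(X, y)

    # Probability that a new snippet is LLM output, per this toy model.
    print(clf.predict_proba(["Here is a comprehensive summary."])[:, 1])

Character n-grams are a common cheap stylometry baseline; a serious detector would need far more data and probably model-based features like perplexity.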

Please don’t advocate giving up on fighting spam; that affects us all.


> If no one else does it soon, I'll probably do it myself: we're long overdue for the ad-block of LLM output. I want a browser plugin that nukes it at the DOM, and I don't care how many false positives it has.

Well, if you genuinely don't care how many false positives it has, just block everything. But there's no even remotely reliable way to detect LLM output unless it's deliberately watermarked to facilitate detection, so you aren't going to get anything that's actually good at this.
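On the watermarking point: one published scheme (the "green list" approach of Kirchenbauer et al., 2023) pseudo-randomly partitions the vocabulary at each step, seeded by the previous token, and biases generation toward the green half; detection is then just a z-test on how many tokens landed green. A toy sketch, with whole words standing in for model tokens:

    # Toy sketch of green-list watermark *detection* (Kirchenbauer et al. 2023).
    # Real schemes operate on model token IDs; words stand in for tokens here.
    import hashlib
    import math

    GAMMA = 0.5  # fraction of the vocabulary that is "green" at each step

    def is_green(prev_token: str, token: str) -> bool:
        # Pseudo-random vocab partition, seeded by the previous token.
        h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
        return h[0] < GAMMA * 256

    def watermark_z_score(tokens: list[str]) -> float:
        # Count tokens that landed in their step's green list, then z-test
        # against the GAMMA fraction expected with no watermark.
        hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
        n = len(tokens) - 1
        return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)

    text = "some text to test goes here with enough tokens to matter".split()
    print(watermark_z_score(text))

Unwatermarked text scores near zero; output from a sampler that actually boosts the green list scores many sigma above. The catch is exactly the one above: it only works if the model provider opts in.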



