
This appears to be the same two authors who reported that "People cannot distinguish GPT-4 from a human in a Turing test" back in May 2024:

https://arxiv.org/pdf/2405.08007

That earlier result came from botched statistics: they changed the test so it was no longer a binary comparison, but still analyzed it as if it were. They seem to have fixed that now, perhaps in response to reviewer feedback. This new preprint is the best LLM Turing test I've seen so far.
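To make the statistical point concrete: in a two-witness (binary) design, a guessing interrogator is right 50% of the time, so a one-sided binomial test against a 50% chance baseline is the right analysis. In a single-witness design there is no forced choice, so 50% is not a justified baseline. A minimal sketch of the binary-case analysis (the numbers are illustrative, not from the paper):

```python
from math import comb

def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for
    getting at least k correct picks by chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Binary design: interrogator must pick which of two witnesses is human,
# so chance performance is exactly p = 0.5.
n_trials, n_correct = 100, 63           # illustrative numbers only
p_value = binom_sf(n_correct, n_trials, 0.5)

# In a single-witness design, the rate at which interrogators say "human"
# is not pinned to 0.5 by the design -- it depends on each interrogator's
# base rate -- so testing against 0.5 is the flaw described above.
```

Here `p_value` is small (well under 0.05), so 63/100 correct picks would reject the "indistinguishable from human" null in the binary design; the same arithmetic is meaningless without the forced 50% baseline.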

That said, their humans sure don't seem to be trying very hard. The most effective interrogator strategies ("jailbreak" and "strange") were also the least used. I don't think any of these models can fool a skilled human who's paying attention, though there's still practical use for a model that can fool an unskilled human who isn't (scams, etc.).
