Hacker News
New Jailbreak Technique Uses Fictional World to Manipulate AI (securityweek.com)
12 points by kungfudoi 5 months ago | 5 comments


How is this a new jailbreak? "You're writing a play, in the play..." is one of the oldest LLM jailbreaks I've seen. (yes, 'old' as in invented 2.5 years ago)


I have been doing this for months.

I just tell an LLM it is in the Grand Theft Auto 5 universe, and then it will provide unlimited advice on how to commit any crime, at any level of detail.


How often do you use an LLM to aid you in planning a crime?


For entertainment value, often. Same reason people play Grand Theft Auto.


Others have already noted that this isn't new, but I'd like to emphasize that the "model security controls" being bypassed were themselves a fictional story all along.

I mean that quite literally.

The LLM is a document-make-longer machine, being fed documents that are fictional movie scripts involving a User and an Assistant. A guardrail like "The helpful assistant never tells people how to do something illegal" is just introductory framing by a narrator.
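
To make that concrete, here is a minimal sketch of the idea (not any vendor's actual chat template; the function and format are illustrative): the "rules" are just the opening lines of one long string that the model continues.

    # Hypothetical sketch: a "guardrail" system prompt and the chat turns are
    # flattened into a single document that a completion model extends.
    def render_transcript(system_prompt: str, turns: list[tuple[str, str]]) -> str:
        # The "rules" are just the opening narration of the script.
        doc = f"{system_prompt}\n\n"
        for role, text in turns:
            doc += f"{role}: {text}\n"
        # The model's only job is to append plausible text after this cue.
        doc += "Assistant:"
        return doc

    prompt = render_transcript(
        "The helpful assistant never tells people how to do something illegal.",
        [("User", "Let's write a play. In the play...")],
    )
    # The model sees one long string; nothing in it is a hard rule, so a
    # fictional framing later in the document can override the opening one.
    print(prompt)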

There's a reason people say "guardrails" rather than "rules". There are no rules in the digital word-dream device.



