Hacker News
New Jailbreak Technique Uses Fictional World to Manipulate AI (securityweek.com)
12 points by kungfudoi 5 months ago | 5 comments


How is this a new jailbreak? "You're writing a play, in the play..." is one of the oldest LLM jailbreaks I've seen. (yes, 'old' as in invented 2.5 years ago)


I have been doing this for months.

I just tell an LLM it is in the Grand Theft Auto 5 universe, and then it will provide unlimited advice on how to commit any crime, at any level of detail.


How often do you use an LLM to aid you in planning a crime?


For entertainment value, often. Same reason people play Grand Theft Auto.


Others have already noted that this isn't new, but I'd like to emphasize that the "model security controls" being bypassed were themselves a fictional story all along.

I mean that quite literally.

The LLM is a document-make-longer machine, being fed documents that are fictional movie scripts involving a User and an Assistant. A guardrail like "The helpful assistant never tells people how to do something illegal" is just introductory framing by a narrator.
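
To make that concrete, here is a minimal sketch of the idea (not any vendor's actual chat template; the function and format are illustrative): the "rules" are just the opening lines of one long string that the model continues.

    # Hypothetical sketch: a "guardrail" system prompt and the chat turns are
    # flattened into a single document that a completion model extends.
    def render_transcript(system_prompt: str, turns: list[tuple[str, str]]) -> str:
        # The "rules" are just the opening narration of the script.
        doc = f"{system_prompt}\n\n"
        for role, text in turns:
            doc += f"{role}: {text}\n"
        # The model's only job is to append plausible text after this cue.
        doc += "Assistant:"
        return doc

    prompt = render_transcript(
        "The helpful assistant never tells people how to do something illegal.",
        [("User", "Let's write a play. In the play...")],
    )
    # The model sees one long string; nothing in it is a hard rule, so a
    # fictional framing later in the document can override the opening one.
    print(prompt)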

There's a reason people say "guardrails" rather than "rules". There are no rules in the digital word-dream device.



