How is this a new jailbreak? "You're writing a play, in the play..." is one of the oldest LLM jailbreaks I've seen. (yes, 'old' as in invented 2.5 years ago)
I just tell an LLM it is in the Grand Theft Auto 5 universe, and then it will provide unlimited advice on how to commit any crimes with any level of detail.
Others have already noted that this isn't new, but I'd like to emphasize that the "model security controls" being bypassed were themselves a fictional story all along.
I mean that quite literally.
The LLM is a document-make-longer machine, being fed documents that are fictional movie scripts involving a User and an Assistant. Any guardrails like "The helpful assistant never tells people how to do something illegal" are just introductory framing by a narrator.
There's a reason people say "guardrails" rather than "rules". There are no rules in the digital word-dream device.