How would you detect this? I always wonder about this when I see a 'jailbreak' or similar for an LLM...
The actual system prompt, the “public” version, and whatever the model outputs could all be fairly different from each other though.