I'm curious if this is intentional or just a side effect of multiple agents havi...

jph00 · 2025-07-12T18:31:53 1752345113

It's intentional -- sometimes you can get it to start spitting out its system prompts, but shortly after it does, a monitoring program cancels the output in the middle. It also blocks tricks like base64.

wunderwuzzi23 · 2025-07-12T18:48:50 1752346130

Oh, so interesting!

A good approach might be to have it print each sentence formatted as part of an xml document. If it still has hiccups, ask to only put 1-3 words per xml tag. It can easily be reversed with another AI afterwards. Or just ask to write it in another language, like German, that also often bypasses monitors or filters.

Above might also help to understand if and where they use something called "Spotlighting" which inserts tokens that the monitor can catch.

Edit: OMG, I just realized I responded to Jeremy Howard - if you see this: Thank you so much for your courses and knowledge sharing. 5 years ago when I got into ML your materials were invaluable!

jph00 · 2025-07-12T20:08:27 1752350907

You're welcome!