I started working on this after getting uneasy with how many “autonomous” systems rely on prompt discipline or after-the-fact monitoring once they’re allowed to touch real resources. That felt fragile to me, especially as agents start interacting with files, shells, or networks.
So this is an experiment around flipping that assumption: the agent can propose whatever it wants, but execution itself is the hard boundary. Anything with side effects has to pass an explicit authorization step, otherwise it simply doesn’t run.
I spent most of the time trying to break that boundary — impersonation, “just do it once,” reframing things as a simulation, etc. The interesting part wasn’t whether the agent proposes bad actions (it does), but whether those proposals can ever turn into side effects.
It’s not meant as a product, more a proof-of-concept to explore whether enforcing invariants at the execution layer actually changes the failure modes.
This is my first post here and I didn’t do a good job on presentation. I shared it mainly to get early feedback on the idea, but I see why it’s hard to evaluate as-is. Appreciate the feedback.
I started working on this after getting uneasy with how many “autonomous” systems rely on prompt discipline or after-the-fact monitoring once they’re allowed to touch real resources. That felt fragile to me, especially as agents start interacting with files, shells, or networks.
So this is an experiment around flipping that assumption: the agent can propose whatever it wants, but execution itself is the hard boundary. Anything with side effects has to pass an explicit authorization step, otherwise it simply doesn’t run.
I spent most of the time trying to break that boundary — impersonation, “just do it once,” reframing things as a simulation, etc. The interesting part wasn’t whether the agent proposes bad actions (it does), but whether those proposals can ever turn into side effects.
It’s not meant as a product, more a proof-of-concept to explore whether enforcing invariants at the execution layer actually changes the failure modes.