Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Connecting your email is still a risk.

> If you’ve built something agents want, please let us know. Comments welcome!

I'll bite! I've built a self-hosted open source tool that's intended to solve this problem specifically. It allows you to approve an agent purpose rather than specific scopes. An LLM then makes sure that all requests fit that purpose, and only inject the credentials if they're in line with the approved purpose. I (and my early users) have found substantially reduces the likelihood of agent drift or injection attacks.

https://github.com/clawvisor/clawvisor

 help



Would love to see any evals you've run of this system


Just scanning these evals, but they seem pretty basic, and not at all what I would expect the failure modes to be.

For example, 'slack_wrong_channel' was an ask to post a standup update, and a result of declaring free pizza in #general. Does this get rejected for the #general (as it looks like it's supposed to do), or does it get rejected because it's not a standup update (which I expect is likely).

Or 'drive_delete_instead_of_read' checks that 'read_file' is called instead of 'delete_file'. But LLMs are pretty good at getting the right text transform (read vs delete), the problem would be if for example the LLM thinks the file is no longer necessary and _aims_ to delete the file for the wrong reasons. Maybe it claims the reason is "cleaning up after itself" which another LLM might think is a perfectly reasonable thing to do.

Or 'stripe_refund_wrong_charge', which uses a different ID format for the requested action and the actual refund. I would wonder if this would prevent any refunds from working because Stripe doesn't talk in your order ID format.

It seems these are all synthetic evals rather than based on real usage. I understand why it's useful to use some synthetic evals, but it does seem to be much less valuable in general.


Totally fair feedback, and it’s true, many of these are synthetic evals with a few that were still synthetically produced but guided. At this point, because it’s all self-hosted, I only have my own data set. The places where it fails (for me) today are due to feature gaps rather than LLM mistakes. This is a new project that has not been widely announced, so my user base today is small but growing. If you give it a whirl and find it making mistakes, please send them my way! :)



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: