The popups I had to go through to watch the video on Loom (one when I got to the site and one when unpausing a video – they intentionally broke clicking inside the video to unpause it by putting a popup in the video to get my attention) OTOH...
I think seeing the prompt that makes it even worse for me. that prompt could have been caught by even a regex on the user input for "secret" would have been a good first layer.
TBH, this product would be better served as an LLM that generates a bunch of rules that get statically compiled for what the user can ask and what is being outputted as opposed to an LLM being run on each output. Then you could add your own rules too. It still wouldnt be perfect but would be 1,000,000x cheaper to run and easier to verify the solution. and the rules would gradually grow as more and more edge cases for how to fool llms get found.
The company would just need a training set for all the ways to fool an LLM.
I think it's better for them to launch without hardcoding and do the hardcoding later. I also disagree that they should switch to hardcoding. The quarter of a second to use an LLM with each request seems reasonable. I would rather use something that does a hybrid approach, because I think each would catch some things that the other would miss.
I guess we realized that we were just building a game to showcase the functionality and let people have some fun learning about what we do, but you're right that we should have treated this like one of our customers and added a few more layers of protection. Thanks for the perspective!
That's not even factoring in exploits spreading very quickly - we're in power law land.
Regardless, I think this is a great idea - just not something to replace traditional security protocols. More something to keep users on the happy path (mostly). Pricing will need to come down though.
That is just protecting a super basic phrase. That should be the easiest to detect.
How on earth do you ethically sell this product to not give out financial or legal advice? That is way more complicated to figure out.