Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I keep seeing this kind of comment with regards to LLM applications. Why is it so? Isn't input sanitization or sandboxing a thing?


You can't fully sanitize the LLM input from extra instructions. Or at least you can't prove you've done it. (For today's systems) You can try very hard and the special markers for the start/end of the system/user prompt help a lot... Yet, we still get leaked prompts of popular models available every few weeks, so that issue is never fully solved.


its probably off topic but I still get the feeling that trying to prevent undesireable llm app behavior still stinks of "enumerating and blocking badness". at least with procedural programming you have a shot at enumerating just the good stuff and have a concrete set of escapes you need to do with your outputs, this just doesn't seem to exist with many of these llms.


How would you sanitize the input or sandbox it?


Not really. An LLM is still just a big black box token predictor. Finetunes make them remarkably capable at following instructions but it's far from watertight, and it's real hard to sanitise some arbitrary input text when LLMs will understand multiple languages, mixes thereof, and encodings like base64 and rot13.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: