Hacker News new | past | comments | ask | show | jobs | submit login

The question in the grandparent was "Can you install this library?". Not a command "install this library".

If you ask an assistant "does the nearest grocery store sell ice cream?", you do not expect the response to be ice cream delivered to you.






Most LLM users don’t want models to have that level of literalism.

My manager would be very upset if they asked me “Can you get this done by Thursday?” and I responded with “Sure thing” - but took no further action, being satisfied that I’d literally fulfilled their request.


Sure, that particular prompt is ambiguous. Feel free to imagine it to be more of an informational question, even one asking for just yes/no.

However, when people are talking about the "critical flaw" in LLMs, of which this "tool shadowing" attack is an example of, they're talking about how the LLMs cannot differentiate between text that is supposed to give them instructions and text that is supposed to be just for reference.

Concretely, today, ask an LLM "when was Elvis born", something in your MCP stack might be poisoning the LLM content window and causing another MCP tool to leak your SSH keys. I don't think you can argue that the user intended for that.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: