Hacker Newsnew | past | comments | ask | show | jobs | submit | 2oMg3YWV26eKIs's commentslogin

The security risks with this sound scary. Let's say you give it access to your email and calendar. Now it knows all of your deepest secrets. The linked article acknowledges that prompt injection is a risk for the agent:

> Prompt injections are attempts by third parties to manipulate its behavior through malicious instructions that ChatGPT agent may encounter on the web while completing a task. For example, a malicious prompt hidden in a webpage, such as in invisible elements or metadata, could trick the agent into taking unintended actions, like sharing private data from a connector with the attacker, or taking a harmful action on a site the user has logged into.

A malicious website could trick the agent into divulging your deepest secrets!

I am curious about one thing -- the article mentions the agent will ask for permission before doing consequential actions:

> Explicit user confirmation: ChatGPT is trained to explicitly ask for your permission before taking actions with real-world consequences, like making a purchase.

How does the agent know a task is consequential? Could it mistakenly make a purchase without first asking for permission? I assume it's AI all the way down, so I assume mistakes like this are possible.


There is almost guaranteed going to be an attack along the lines of prompt-injecting a calendar invite. Those things are millions of lines long already, with tones of auto-generated text that nobody reads. Embed your injection in the middle of boring text describing the meeting prerequisites and it's as good as written in a transparent font. Then enjoy exfiltrating your victim's entire calendar and who knows what else.


In the system I'm building the main agent doesn't have access to tools and must call scoped down subagents who have one or two tools at most and always in the same category (so no mixed fetch and calendar tools). They must also return structured data to the main agent.

I think that kind of isolation is necessary even though it's a bit more costly. However since the subagents have simple tasks I can use super cheap models.


What isolation is there? If a compromised sub agent returns data that gets inserted into the main agents context (structured or not) then the end result is the same as if the main agent was directly interacting with the compromising resource is it not?


Exactly. You can't both give the model access AND enforce security. You CAN convince yourself you've done it though. You see it all the time, including in this thread.


Perhaps a reference to the data can be inserted in prompt. thee key or filename


And the way Google calendar works right now, it automatically shows invites on your calendar, even if they are spam. That does not bode well for prompt injection.


Many of us have been partitioning our “computing” life into public and private segments, for example for social media, job search, or blogging. Maybe it’s time for another segment somewhere in the middle?

Something like lower risk private data, which could contain things like redacted calendar entries, de-identified, anonymized, or obfuscated email, or even low-risk thoughts, journals, and research.

I am Worried; I barely use ChatGPT for anything that could come back to hurt me later, like medical or psychological questions. I hear that lots of folks are finding utility here but I’m reticent.


>I barely use ChatGPT for anything that could come back to hurt me later, like medical or psychological questions

I use ollama with local LLMs for anything that could be considered sensitive, the generation is slower but results are generally quite reasonable. I've had decent success with gemma3 for general queries.


Anthropic found the simulated blackmail rate of GPT-4.1 in a test scenario was 0.8

https://www.anthropic.com/research/agentic-misalignment

"Agentic misalignment makes it possible for models to act similarly to an insider threat, behaving like a previously-trusted coworker or employee who suddenly begins to operate at odds with a company’s objectives."


Create a burner account for email/calendar, that solves most of those problems. Nobody will care if the AI leaks that you have a dentist appointment on Tuesday.


But isn't the whole supposed value-add here that it gets access to your real data? If you don't want it to get at your calendar, you could presumably just not grant it access in the first place – no need for a fake one. But if you want it to automatically "book me a haircut with the same person as last time in an afternoon time slot when I'm free later this month" then it needs access to your real calendar and if attacked it can leak or wreck your real calendar too. It's hard to see how you can ever have one without the other.


I agree with the scariness etc. Just one possibly comforting point.

I assume (hope?) they use more traditional classifiers for determining importance (in addition to the model's judgment). Those are much more reliable than LLMs & they're much cheaper to run so I assume they run many of them


Almost anyone can add something to people's calendars as well (of course people don't accept random invites but they can appear).

If this kind of agent becomes wide spread hackers would be silly not to send out phishing email invites that simply contain the prompts they want to inject.


The asking for permission thing is irrelevant. People are using this tool to get the friction in their life to near zero, I bet my job that everyone will just turn on auto accept and go for a walk with their dog.


I can't imagine voluntarily giving access to my data and also being "scared". Maybe a tad concerned, but not "scared".


You should treat all email as public


<3

Thanks for your continuing work on memcached! I'd be very curious how garnet's benchmarks compare with memcached.


I just tried donating at https://signal.org/donate/

It seems that with uBlock origin enabled in Firefox, I was unable to fill out either of the 2 donation forms on the page. It wouldn't let me fill in my Name in the first form, nor would it let me enter a custom amount in the 2nd form.

Disabling uBlock origin seems to resolve.


Mullvad used to have a "how to" guide for torrenting on VPN. But now it 404s: https://mullvad.net/en/help/bittorrent/

According to wayback machine, they deleted the page sometime mid 2021. Here's an archived version of the page: https://web.archive.org/web/20210513051214/https://mullvad.n...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: