Hacker News
GCP CloudSQL Vulnerability Leads to Internal Container Access and Data Exposure (dig.security)
200 points by ivmoreau on May 26, 2023 | 51 comments



I'm pretty impressed with the GCP response: both that they identified the behavior and that they took the first step in reaching out.


I'm going to guess that reading files like /etc/shadow is a 'tripwire' that triggers a review by an engineer.

With seccomp-bpf it's pretty simple to set up system-wide tripwires on certain files, syscalls, or network operations. Even if the attacker gains root, your tripwire will probably alert you before they can disable it.
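A minimal sketch of what such a tripwire could look like, in Python, assuming an x86_64 host. It only assembles the raw classic-BPF program that would be handed to seccomp; it does not install it (that needs prctl(PR_SET_NO_NEW_PRIVS) followed by seccomp(SECCOMP_SET_MODE_FILTER), not shown). The helper names `insn` and `tripwire_filter` are hypothetical. Two caveats: a seccomp filter applies to the process that installs it and its descendants, not to the whole system, and BPF cannot dereference pointer arguments, so it can match a syscall number such as openat (257 on x86_64) but not a path like /etc/shadow. Path-level watches usually come from audit rules or an LSM instead.

```python
import struct

# Classic BPF opcode pieces (values from <linux/bpf_common.h>).
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_W, BPF_ABS, BPF_JEQ, BPF_K = 0x00, 0x20, 0x10, 0x00

# seccomp / audit constants (from <linux/seccomp.h> and <linux/audit.h>).
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_LOG = 0x7FFC0000  # let the call proceed but log it (Linux >= 4.14)
AUDIT_ARCH_X86_64 = 0xC000003E

# Byte offsets of fields inside struct seccomp_data.
OFF_NR, OFF_ARCH = 0, 4


def insn(code, jt, jf, k):
    """Pack one struct sock_filter {u16 code; u8 jt; u8 jf; u32 k} (hypothetical helper)."""
    return struct.pack("=HBBI", code, jt, jf, k)


def tripwire_filter(watched_syscalls):
    """Hypothetical helper: LOG the given x86_64 syscall numbers, ALLOW everything else."""
    n = len(watched_syscalls)
    prog = [
        insn(BPF_LD | BPF_W | BPF_ABS, 0, 0, OFF_ARCH),
        # Expected arch: skip the next instruction. Any other arch falls into
        # ALLOW (illustrative only; a hardened filter would usually kill).
        insn(BPF_JMP | BPF_JEQ | BPF_K, 1, 0, AUDIT_ARCH_X86_64),
        insn(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW),
        insn(BPF_LD | BPF_W | BPF_ABS, 0, 0, OFF_NR),
    ]
    for i, nr in enumerate(watched_syscalls):
        # On a match, jump over the remaining checks and the ALLOW below,
        # landing on the final LOG instruction.
        prog.append(insn(BPF_JMP | BPF_JEQ | BPF_K, n - i, 0, nr))
    prog.append(insn(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append(insn(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_LOG))
    return b"".join(prog)


blob = tripwire_filter([257])  # 257 = openat on x86_64
```

Here `blob` holds seven 8-byte instructions (56 bytes). Calls to openat end at SECCOMP_RET_LOG, which lets the call run but records it; everything else ends at SECCOMP_RET_ALLOW. Logging rather than killing is what makes this a tripwire rather than a block.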


The other way to see it is that it took them 8 days to notice a full compromise of the hosting OS and open access to Google’s internal docker image repository URL.


I'm going to guess that this VM was considered the customer's VM as far as security goes, i.e. you couldn't access any other customer's data.

Likewise, GCP Dataflow quite trivially allows you to escape onto the worker machines and take the (huge) binaries that implement it. They have some really nice detailed status pages!


I was part of the GCP Cloud Dataflow team a few years ago. The status page is actually the standard for all Google internal services (/statusz). I still miss them a lot.

In Dataflow's case, the container is not treated as the security boundary. There are several important things to note:

- Dataflow's VMs are in customer projects, so there's no risk of cross-tenant access.

- When launching Dataflow jobs, the launcher identity is checked for the iam.serviceAccountUser IAM role, which means that the identity should be able to launch a VM with the same service account just fine. So Dataflow does not escalate permissions beyond what GCE VMs already allow.

- Just as with any VM someone launches, whether anyone else can log onto those VMs is controlled separately.

- Containers are used in Dataflow only for convenient image delivery, not as a security barrier. The VM is the barrier.


Containers should never be treated as a security boundary.


Yes. They don't want you to be able to poke around but the real security boundary is the VM, not the database server.


Back when there was a critical Azure bug that enabled an Azure user to gain access to top-level keys (i.e. the keys to the entire kingdom), a Google engineer commented on an HN thread that Google specifically didn't consider container boundaries secure, so everything is always tied to a VM specific to a customer. The issue with Azure was that a container escape allowed a user to take over the entire Azure subsystem.


It's not a mistake unique to Azure: Alibaba recently had a vulnerability make the rounds in the news where container escapes led to cross-tenant access.

There are two types of cloud providers: the ones who take security seriously and the ones who learn security the hard, public way.

I'm a bit surprised that Azure would get lumped in with the other cut-rate providers but that's becoming more and more obvious with the vulnerabilities of the past few years.


Not sure if this is still true RE: Azure. AFAIK they use Hyper-V (hypervisor) containers which offer kernel isolation like other lightweight-VM-container runtimes.


Hyper-V has been around for a while, did Azure just not get the memo until recently?


I work for Google but mostly interface with GCP the same way everyone else does.

The VMs are somewhat hidden in the UI IIRC, but otherwise you can enumerate them via the API, SSH to them, and debug/profile (which I was doing to get cross-language profiling on Dataflow pipelines with py-spy and JVM perf output).

It's just a worker vm in your project.


The hosting OS is all but certain to be virtualized. It's no different from customers creating a GCE VM in the first place.


It took 8 days to proactively reach out. It may very well have been identified earlier and then taken some time to be passed off to Google's vulnerability reward program and get any necessary approvals.


They reached out to start getting info from the team, as nothing indicates that Google knew at that time where the vulnerability was.


So this blog post is missing any information about what the actual vulnerabilities were. What was the "gap"? What was the misconfiguration? Also missing is whether access to the host VM exposes meaningful secrets. Does this actually risk customers' sensitive data?


It’s marketing for their other products. A pretty annoying read.


Yeah this was terrible.

First, we did a privilege escalation.

How? They don't say.

Next, we did another privilege escalation.

And how?? They don't say.

what's the point of this


Also no details about what severity the vulnerability was assessed as. For all we know they got a $10 Play Store voucher because the security boundary is the VM, and SQL customers are already paying for the VM and the rest is convenience so they are considered to be hacking themselves here. Reading this was a waste of time.


There's a big fat NDA attached to the reward.


Maybe security researchers would be well advised to establish a kind of name-and-shame culture for this "NDA with benefits" thing, which mainly serves to protect corporate interests.


They skipped all the interesting parts.


Last time I checked, their hosted databases run in dedicated VMs, which is where the real security boundary is.

Getting access to the host OS won't give you much other than some internal binaries and config.


I was part of the Cloud SQL team when putting databases in VMs was designed (previously the MySQL process was run in a sandbox).

I am sure Cloud SQL is far more advanced since then (9 years ago), but security in depth was something we thought about a lot. Running in a VM for each database rather than a multi-tenant system was for security more than anything else. We could have multi-tenanted just as easily implementation-wise.


Not a huge fan of Google, but I have always admired how they prioritise security.

This would never fly at Amazon because it would cost them a few cents to have another VM. Microsoft would probably not even notice the issue.


> This would never fly at Amazon because it would cost them a few cents to have another VM.

That is categorically false. Not only does Amazon's RDS do that (I can't find where they say that; it might have been at re:Invent one year), but for other services like Fargate they used to waste way more resources due to instance single tenancy, until they adopted Firecracker: https://d1.awsstatic.com/events/reinvent/2019/CON423-R1_REPE...


Of course, I might have been wrong.

But isn't this for dedicated containers and not VMs?


The point is that their container offering recognizes, correctly, that containers aren't a secure isolation boundary. So unless there are internal-only EC2 instance sizes (which seems unlikely, but I could be wrong), they used to waste significant portions of an instance's compute in the name of security, since the instance _is_ a secure boundary.

More broadly, based on the literature I've seen, I'd agree that GCP takes security seriously, but so does AWS and I haven't seen any good evidence to say one would be "better" than the other.

I would expect both to come up with a robust security model and as part of their defense in depth I'd expect both to enforce single tenancy at a hypervisor level any time they're running anything untrusted or which can be materially/declaratively influenced by customers (e.g. code, SQL, etc)


All AWS RDS databases run on a dedicated VM.


There is probably a good reason why they didn't elaborate on this:

"Our research began when we identified a gap in GCP’s security layer that was created for SQL Server."

It would have been interesting to see how they identified that security gap.


It reads like "draw two circles... then draw the rest of the owl."


This article is lacking the actual interesting bit, which is how was the escalation achieved? Just reads like bragging instead of being informative.


I don’t know why, but I was disappointed they didn’t disclose how much the reward was.


Per their published table, not more than $13,337

https://bughunters.google.com/about/rules/6625378258649088/g...


Hopefully not very much... They were 'caught' by Google's security team.

Who knows - if Google hadn't detected the intrusion, this attack might be on the black market by now.


Probably not. There's no coherent market for serverside vulnerabilities of any sort.


"With access to the operating system, we managed to find some internal Google URLs related to the docker image repository. We could also access the internal repo which later was fixed and the access from non internal IPs was blocked."

Fascinating how sloppy some people are when they set up infrastructure even though this may be down to bad defaults.


The vulnerability sounds like it's inherent to SQL Server, and that cloud providers haven't been successful in blocking the underlying problem due to its proprietary nature.

Presenting it as a Cloud SQL problem is disingenuous.


No? From the article:

> we identified a gap in GCP’s security layer that was created for SQL Server. This vulnerability enabled us to escalate our initial privilege and add our user to the DbRootRole role, a GCP admin role.

So Google took proprietary software not designed for this use-case and built their own security layer on top of it and ended up with bugs.

Of course that's an issue with the service. Presenting it as anything other than an issue in Cloud SQL seems disingenuous.


Remember that MS SQL Server isn't Google code... Any vulnerabilities it may contain they might be powerless to fix.

Considering that, Google probably has an extensive monitoring system running in the VM, looking for things happening that shouldn't happen... And they have probably also built a filtering infrastructure between the users and the SQL server so that if any vulnerability is found, they can at least filter attempts to exploit it while a fix is being made.


According to the blog post, the vulnerability is not within SQL Server itself, the vulnerability is in the security layer that Google built on top of SQL Server in order to offer it as a managed service on GCP.


Isn’t the blur effect too light on the screenshots? It may be possible to recover the /etc/shadow file.


And what would that accomplish? Knowing the contents of /etc/shadow of a random (virtual) machine that belonged to someone else that you could not access, one that most likely already ceased to exist.


It’s still bad practice to blur information that is supposed to be hidden.


Oh boy, someone's not going to have a fun long weekend.


As the article says, the vulnerability was fixed in April and the people who discovered it have already been rewarded under Google's Vulnerability Reward Program. Google also proactively detected the problem before being notified by the researchers.


It's already been resolved by Google and is not exploitable, so yes hopefully sysadmins using SQL Server on CloudSQL will indeed have an actually fun long weekend.


It's responsibly disclosed after the hole is patched.


The term of art is "coordinated" disclosure. All sorts of disclosures, with or without vendor consent, can be "responsible", so we try not to use that term, which was coined as a device to give vendors power over researchers.


As a customer, I'm glad that both the vendor and the researcher are acting responsibly


But I got my pitchfork out and everything!



