ChaosDB Explained: Azure's Cosmos DB Vulnerability Walkthrough (wiz.io)
116 points by timmclean on Nov 21, 2021 | 34 comments



When we designed the security model for Google Cloud Build (I do not work there anymore), we decided that containers were not valid security barriers. So all partitioning was done at the VM and network level, with the network configured outside the VM.

It wasn't hard to convince anyone that this was the right way to handle things.


Why are they not?


Not the OP, but AWS made the same determination. The tl;dr is that the attack surface of containerization leads to an unacceptable risk of privilege escalation.


That explains what, but not why


Containers were never actually designed to be sandboxes. Inside one, you have access to many system calls and a comparatively huge attack surface in the kernel and userland, all written in C, with a long history of local root exploits due to C-based bugs.


Because if you can get root in a container, you have root outside the container. While escaping a container isn’t exactly easy or always possible, it is a huge risk.


That is completely insane. Getting root on one container = complete access to the entire system with administrator level access? What kind of security operation are they running there exactly? Local root exploits aren't exactly unheard of, so you'd think the infrastructure would be designed to tolerate that sort of thing, not simply hand out private keys to management APIs to all and sundry.


What kind of development operation is the question I would ask. Security mostly involves convincing developers to do the right thing with a lot of resistance. Not sure I would assume the security team is behind this, rather than some "risk acceptance" forced on them to launch the feature on time.


"Tom, this feature is due Friday at 4pm, but we want to review it, so we need it done by lunch. Don't forget to make it secure."

"Sure, it's as secure as the time I was given to test security allowed: none and none."

"Thanks Tom, great work. See ya at 12."

This is more the reality than resistance.


Why doesn't Tom include this in his estimates to begin with? Did Tom forget?


A close reading of the DevSecOps infinity lifecycle shows lots of security+development touchpoints that would yield better software security than this.

The trade-off you mention is a management decision, not really a development or security decision.


I had several absolutely awful experiences with CosmosDB even before this breach. Its design and engineering are the worst I've encountered on Azure or anywhere else that I remember.

This vulnerability, and especially its handling by Microsoft, was the final nail in the coffin for us, and we've put in the effort to migrate away.


Could you elaborate on the design?


Sure. Here are a few examples. They're all based on my experience with the MongoDB API for CosmosDB. Your mileage with other APIs may vary.

1. CosmosDB has a hardcoded 60-second timeout for queries. That means queries that take longer than that are literally impossible to run without breaking them into smaller chunks. This is worse than it sounds because CosmosDB lacks some of the basic optimizations that exist in other databases. For example, finding all distinct values of an indexed field required a full scan, which wasn't doable in 60 seconds. Another example is deleting all documents with a specific value in an indexed field - again, not doable in 60 seconds. When deleting or updating multiple documents, we'd write short snippets of code that queried for the IDs of all documents that needed to be updated, and then updated or deleted them by ID one by one (a sketch of this pattern follows the list).

2. Scaling up and then back down can cause irreversible performance changes, since there's a direct link between the number of provisioned RUs and the number of "physical partitions" created by the database. A new physical partition is created for every 10K RUs or 50GB of data. CosmosDB knows how to create new physical partitions when scaling up, but doesn't know how to merge them when scaling down.

Say you have 10 logical partitions on 5 physical partitions, and you are paying for 50K RUs. Each physical partition holds exactly 2 logical partitions and is allocated 10K RUs. Now you temporarily scale the database up to 100K RUs for some reason, so you end up with 10 physical partitions, one logical partition on each. When you scale back to 50K RUs, you'll still have 10 physical partitions, each with 5K RUs. So now each of your logical partitions has exactly 5K RUs, while before it had 10K RUs shared with one other logical partition. (The arithmetic is sketched after this list.)

3. The allocation of logical partitions to physical partitions is static, hash-based and there's no control over it. This means that having hot logical partitions is a performance problem. Hot logical partitions might end up on the same physical partition and be starved for resources while other physical partitions are over-provisioned. Of course, you can allocate data to partitions completely randomly and hope for the best, but there's a performance penalty for querying multiple logical partitions. Plus, updates/deletes are limited to a single logical partition, so you'll be losing the ability to batch update/delete related documents.

4. Index construction is asynchronous and very slow because it uses some pool of "free" RUs that scales with the RUs allocated to your collection. It used to take us over 12 hours to build simple indexes on a ~30GB collection. Also, if you issue multiple index modification commands, they will be queued even if they cancel each other out. So issuing a "create index" command, realizing you've made a mistake, then issuing a "drop index" followed by another "create index" is a 24-hour adventure. Over the next 12 hours the original index will be created, then immediately dropped, and created again. To top it off, there's no visibility into which indexes are being used, and the commands for checking the progress of index construction were broken and never worked for us.
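
A couple of illustrative sketches, in case they help. First, the chunked delete-by-ID workaround from point 1, as a minimal Python/pymongo sketch against the Mongo API. The connection string, database/collection names, the "status" filter and the chunk size are all made up for illustration; this is the pattern, not our actual code:

  from pymongo import MongoClient

  # Hypothetical connection string and names, for illustration only.
  client = MongoClient("<cosmosdb-mongo-connection-string>")
  coll = client["mydb"]["orders"]

  # Fetch only the _ids of matching documents in small pages, then delete
  # by _id, so no single request comes near the 60-second limit.
  CHUNK = 500
  while True:
      ids = [d["_id"] for d in coll.find({"status": "expired"}, {"_id": 1}).limit(CHUNK)]
      if not ids:
          break
      coll.delete_many({"_id": {"$in": ids}})

Second, the partition arithmetic from point 2, using only the numbers above (one physical partition per 10K RUs or 50GB); treat this as a back-of-the-envelope illustration, not an official formula:

  import math

  RU_PER_PHYSICAL_PARTITION = 10_000

  def physical_partitions(total_rus):
      return math.ceil(total_rus / RU_PER_PHYSICAL_PARTITION)

  # Scale-up creates physical partitions; scale-down never merges them.
  before_spike = physical_partitions(50_000)                      # 5
  after_spike = max(before_spike, physical_partitions(100_000))   # 10
  rus_per_partition = 50_000 / after_spike                        # 5,000 RUs each
  print(before_spike, after_spike, rus_per_partition)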


> RUs

I've also run into a situation with obtuse/unexpected/excessive pricing. We were spiking a Mongo -> Cosmos migration and with our Cosmos configuration we were getting charged separately for each (empty) collection our app created.

Each collection was billed separately for provisioned throughput even though they were unused, and since the app created ~50 collections the cost added up pretty quickly before I noticed. Note: there is a way to turn this off, AFAIR, but the description of the setting made me decide to select it for a prod-like database.

In reality it's partly my fault. I should have kept a close eye on the costs until I was sure my mental model was correct. I also knew CosmosDB was just offering a Mongo API and wasn't actually hosted Mongo, so I should have been more vigilant about the differences in implementation. And of course I should have RTFM — although, even when I noticed the problem, it took me a long time and a support ticket to find the explanation in the docs.
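
For what it's worth, the per-collection charges come from dedicated (per-collection) provisioned throughput; as far as I understand, the alternative is database-level shared throughput, which the Mongo API exposes through CosmosDB's extension commands. A rough pymongo sketch, with the database/collection names, shard key, and RU value made up, and the customAction syntax from memory, so double-check the docs before relying on it:

  from pymongo import MongoClient

  client = MongoClient("<cosmosdb-mongo-connection-string>")
  db = client["mydb"]

  # Provision RUs once at the database level (shared across its collections)
  # instead of separately for every collection the app creates.
  db.command({"customAction": "CreateDatabase", "offerThroughput": 1000})

  # Collections created without their own offerThroughput then share the
  # database-level throughput rather than each being billed on its own.
  db.command({"customAction": "CreateCollection", "collection": "events",
              "shardKey": "tenantId"})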


> I should have RTFM — although, even when I noticed the problem, it took me a long time and a support ticket to find the explanation in the docs

I didn't bring it up since it's not a design issue, but the CosmosDB documentation has been a constant source of pain for us too. Important details were often not mentioned, mentioned in unexpected places, or downright wrong.


Thank you for the detailed write up. I did a cursory investigation of CosmosDB but didn’t know it was this bad.


It is much worse than this; that was just the start of a very long list of awful problems. (My team tried to live with Cosmos for a couple of years too; luckily we got approval to port to another DB, and now we are not looking back.)

The worst thing is that Microsoft reps would always recommend the database, Microsoft support (who came onsite to talk with us about Cosmos) had a hard time acknowledging what should be obvious problems, and so on... so it took a while for us to catch on that it really is as awful as it is.


Glad to hear we're not the only ones.

There were times when I felt like I was part of some elaborate prank or social psychology experiment to see how developers would react to an obviously broken product. Trivial use cases were, and probably still are, utterly broken. As if no one before us had ever tried to use them.

Still, I'd love to hear about your experience with cosmos and the issues you guys discovered.


From what I remember of my time at AWS, DynamoDB also suffers from #2. Not sure if that's changed since.


  > August 17 2021 - MSRC awarded $40,000 bounty for the report.
I don't know much about the bug bounty industry. Is this the typical payout for what seems to be a pretty severe vulnerability?


The part that's controversial about the MS bounties is that they stopped covering the majority of on-premises products.

For example, the person who reported the two major Microsoft Exchange vulnerability chains received no payout at all.

Ref: https://i.blackhat.com/USA21/Wednesday-Handouts/us-21-ProxyL...


MS is pretty bad about vulnerability bounties. I reported a privilege escalation on WSL1 and received no response, yet it was patched within the following months. I was a bit aggravated.


Truly horrible, despite tooting their horn about it an awful lot.


Yes. A competent researcher who decides to investigate their products can reasonably be expected to find a new, unique vulnerability of this severity given a few weeks to maybe a month of effort. At a standard consulting rate that amounts to a few thousand to low tens of thousands of dollars, so a $40,000 payout is fair compensation for the effort and difficulty of finding a bug that totally invalidates the security of a major service from a top, world-class cloud provider.


Not a great comparison. A researcher does not work for a company; a researcher is basically a mercenary who will go where the money is. If a company pays too little relative to its peers, or fails to pay out too often due to fine print in the T&Cs, researchers will go elsewhere. That's a serious loss for Microsoft, given the kinds of vulnerabilities, and their impact, that a researcher can find when focused on your platform.


I believe the last sentence in the parent comment was (top notch) sarcasm.


Yes, quite large. Next HN will say how many billions they could afford to pay because of all the potential damage. If that's hard to make sense of: the security guard at the bank doesn't get paid a percentage of the money if they stop a robbery, and there are many other examples of payouts being way less than the potential damage caused.


That's a bad faith comparison, and I don't really see the relation between the two.


The funny thing is that the founder of Wiz formerly headed Microsoft Israel, and many, many ex-Microsoft people are at Wiz. I wonder if their knowledge of Microsoft internals helped them find this vulnerability.


It might have, but ex-Microsofties are a large set. Googling tells me Microsoft employs 181k people; assuming 3% leave each year gives us over 5k a year. Of course, not all of them are software engineers.


There's a fascinating number of places where, if things had been implemented correctly, this attack could have been prevented.

Given that much of the attack involves things that are not exclusive to CosmosDB (the firewall, internal services, certificates), it's likely that other services may be at risk as well.

Generally, because so many flaws are involved, this can't be easy to fix.


Wow.


You know some MS product manager thought they were "winning" when they included Jupyter notebooks.



