FYI this was unveiled at the 2025 Google Cloud Next conference, and they're apparently also announcing a gRPC client for Rapid Storage, which appears to be a very thin wrapper over Colossus itself, as this is just zonal storage.
I kind of thought you meant ZNS / https://zonedstorage.io/ at first, or its more recent, better, awesomer counterpart Host Directed Placement (HDP). I wish someone would please please advertise support for HDP, sounds like such a free win, tackling so many write amplification issues for so little extra complexity: just say which stream you want to write to, and writes to that stream will go onto the same superblock. Duh, simple, great.
They charge $20/TB/month for basic cloud storage. You can build storage servers for $20/TB flat. If you add 10% for local parity, 15% free space, 5% in spare drives, and $2000/rack/month overhead, then triple everything for redundancy purposes, then over a 3-year period the price of using your own hard drives is $115/TB and Google's price is $720. Over 5 years it's $145 versus $1200. And that's before they charge you massive bandwidth fees.
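For anyone who wants to poke at that arithmetic, here is a rough sketch of it. The cloud side is just $20 times the number of months. The self-hosted side needs a rack capacity, which the comment doesn't state, so `usable_tb_per_rack=5000` is purely a guess, and adding the three percentages together (rather than compounding them) is also an assumption. With those guesses the result lands near the quoted $115 and $145, not exactly on them.

```python
def cloud_cost_per_tb(price_per_tb_month: float, months: int) -> float:
    """Total cloud spend per TB over the period."""
    return price_per_tb_month * months


def self_hosted_cost_per_tb(
    months: int,
    build_per_tb: float = 20.0,             # from the comment: $20/TB flat
    parity: float = 0.10,                   # from the comment: local parity
    free_space: float = 0.15,               # from the comment: free space
    spares: float = 0.05,                   # from the comment: spare drives
    rack_overhead_per_month: float = 2000.0,  # from the comment
    usable_tb_per_rack: float = 5000.0,     # ASSUMPTION: not given in the comment
    replicas: int = 3,                      # "triple everything"
) -> float:
    """Cost per usable TB, with every copy paying hardware plus rack overhead."""
    # ASSUMPTION: the percentages are stacked additively (1 + 0.10 + 0.15 + 0.05).
    hardware = build_per_tb * (1 + parity + free_space + spares)
    overhead = rack_overhead_per_month * months / usable_tb_per_rack
    return replicas * (hardware + overhead)


if __name__ == "__main__":
    for years in (3, 5):
        months = 12 * years
        cloud = cloud_cost_per_tb(20.0, months)
        own = self_hosted_cost_per_tb(months)
        print(f"{years} years: cloud ${cloud:.0f}/TB, self-hosted ${own:.2f}/TB")
```

Changing `usable_tb_per_rack` moves the self-hosted number quite a bit, so treat the output as a way to see which inputs matter rather than as a quote.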
I like your comparison with self-built storage, but comparing $20/TB/month with other CLOUD offerings, we see:
* Hetzner Storage Box starts from $4/month for 1TB, and then goes down to $2.4/TB/month if you rent a 10TB box.
* Mega starts from €10/month for 2TB, and goes down to €2/TB/month if you get a 16TB plan
* Backblaze costs (starts from?) $6/TB/month
I was looking for cheap cloud storage recently, so I have a list of these numbers :)
Moreover, these are not even the cheapest ones. The cheapest one I found had prices starting from $6.5 for 5TB, going down to $0.64/TB/month for plans starting with 25TB (called uloz, but I haven't tested them yet).
Also, looking at lowendbox you can find a VPS in Canada with 2TB storage for $5/month and run whatever you want there.
How does all that compare to $20/TB/month?!
Please feel free to correct me if I'm comparing apples to oranges, though. But I can't believe all of these offers are scams or so-called "promotional" offers which cost companies more than you pay for them.
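To put the numbers above on one scale, here is a small sketch that normalizes each quoted offer to a per-TB monthly price and, for the USD ones, shows how many times cheaper it is than $20/TB/month. The prices are copied from the comment as quoted. The tuple layout and function names are made up for illustration, and the euro prices are deliberately left unconverted since no exchange rate is given.

```python
REFERENCE_USD_PER_TB = 20.0  # the $20/TB/month figure being compared against

# (name, currency, monthly price, size in TB). Offers already quoted per TB use size 1.
OFFERS = [
    ("Hetzner Storage Box (1 TB)", "USD", 4.0, 1.0),
    ("Hetzner Storage Box (10 TB, per TB)", "USD", 2.4, 1.0),
    ("Mega (2 TB)", "EUR", 10.0, 2.0),
    ("Mega (16 TB, per TB)", "EUR", 2.0, 1.0),
    ("Backblaze B2", "USD", 6.0, 1.0),
    ("uloz (5 TB)", "USD", 6.5, 5.0),
    ("uloz (25 TB+, per TB)", "USD", 0.64, 1.0),
    ("LowEndBox VPS (2 TB)", "USD", 5.0, 2.0),
]


def per_tb(price: float, size_tb: float) -> float:
    """Monthly price per TB in the offer's own currency."""
    return price / size_tb


def summarize(offers):
    """Return (name, currency, price per TB, ratio vs. reference or None)."""
    rows = []
    for name, currency, price, size_tb in offers:
        unit = per_tb(price, size_tb)
        # Only compare like with like: skip the ratio for non-USD offers.
        ratio = REFERENCE_USD_PER_TB / unit if currency == "USD" else None
        rows.append((name, currency, unit, ratio))
    return rows


if __name__ == "__main__":
    for name, currency, unit, ratio in summarize(OFFERS):
        note = f"{ratio:.1f}x cheaper than $20" if ratio is not None else "not converted"
        print(f"{name:40s} {unit:6.2f} {currency}/TB/month  ({note})")
```

This only compares sticker prices. It says nothing about durability, egress fees, or API features, which is where the apples-to-oranges question comes in.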
I'm still annoyed they increased the price for B2. Maybe "free" bandwidth gets people to use it more? But as far as their costs go, between the time they launched at $5 and the time they upped it to $6, hard drives (and servers full of hard drives) cost half as much per TB, with 1/4 as many servers needed for the same number of TB.
I get the impression that business has always been about being the best schmoozer more than about having the best product.
BTW at Hetzner you can rent servers with very large (hundreds of TB) non-redundant storage for an effective price of about $1.50/TB/month. If you want to build a cloud storage product, that seems like a good starting point - of course, once you take into account redundancy, spare capacity, and paying yourself, the prices you charge to your customers will end up closer to the price of Backblaze at a minimum.
>I get the impression that business has always been about being the best schmoozer more than about having the best product
and thus, market efficiency feels like a myth. This feels most true when it comes to cloud services: the big providers are way overpriced in several common cases.
That's covered by the build and overhead numbers. But if you want more on the build side, an extra $10k of labor per rack of 9 servers only increases the cost per TB by about $4.
Not to mention that I can:
- Create a bucket and store 1MB in it without any overhead
- Create 50 buckets with strong perimeters around them such that someone deleting the entire account doesn’t bring down the other 49
- Create a bucket and fill it with terabytes of data within seconds and don’t need to wait for hardware to be racked and stacked
- Create a bucket, fill it with 2TB of data, and delete it tomorrow
Cloud is more than bare metal, but plenty of folks discount the cost benefits of elasticity.
I suspect the problem is that we're engineers in domains that have very different needs.
For example, I agree that elasticity is great. But at the same time, to me, it sounds like bad engineering. Why do you need to store terabytes of data and then delete it? Couldn't it be processed continuously, streamed, compressed, reduced to changes only, and so on? A lot of engineering today is incredibly wasteful. Maybe your data source doesn't care, and just provides you with terabyte CSV files, and you have no choice, but for engineers that care about efficiency, it reeks.
It might make a lot of sense in a highly corporate context where everything is hard, nobody cares, and the cost of inefficiency is just passed on to the customer (i.e. often government and taxpayers). But the real problem here is that customers aren't demanding more efficiency.
Audits alone give you a lot of reasons to keep data, even if it gets downsampled one way or another.
And plenty of use cases have natural growth. I do not throw away my pictures for example.
Data also grows with the number of users. More users, more 'live' data.
We have such a huge advantage with digital, we need to stop thinking it's wasteful. Everything we do digitally (pictures, finance data, etc.) is so much more energy and space efficient than what we had 20 years ago, so we should not delete data just because we feel it's wasteful.
Leverage erasure coding for durability and avoid both the tripling and the local parity. You'll get better durability than 3x while taking up significantly less than 2x the space. Backblaze open sourced their library and talk about it here, https://www.backblaze.com/blog/Reed-Solomon. They use a 17:20 ratio that'll get them 3-drive failure resistance for just a roughly 1.18x stretch (i.e. a 100 MB file gets that resilience while taking up about 118 MB of space).
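The overhead comparison is easy to check by hand. Below is a minimal sketch of the arithmetic only, not of Reed-Solomon itself. It assumes the 17:20 ratio means 17 data shards plus 3 parity shards, where any 17 of the 20 are enough to rebuild the file, and it treats 3x replication as the degenerate case of 1 data shard out of 3. The function names are made up for illustration.

```python
def stretch(data_shards: int, total_shards: int) -> float:
    """Storage multiplier: bytes stored per byte of original data."""
    return total_shards / data_shards


def tolerated_failures(data_shards: int, total_shards: int) -> int:
    """Shards that can be lost while the data stays recoverable.

    Holds for codes where any `data_shards` of the `total_shards`
    suffice to reconstruct (true of Reed-Solomon).
    """
    return total_shards - data_shards


def stored_size_mb(file_mb: float, data_shards: int, total_shards: int) -> float:
    """Space a file occupies once encoded."""
    return file_mb * stretch(data_shards, total_shards)


if __name__ == "__main__":
    schemes = {
        "17 data + 3 parity": (17, 20),
        "3x replication": (1, 3),
    }
    for name, (k, n) in schemes.items():
        print(
            f"{name}: {stretch(k, n):.3f}x stretch, "
            f"survives {tolerated_failures(k, n)} lost shards, "
            f"100 MB -> {stored_size_mb(100, k, n):.1f} MB"
        )
```

The point of the comparison is that the 17-of-20 layout survives one more lost shard than triple replication while storing far less. The trade-off, not modeled here, is that reads and rebuilds have to touch many drives instead of one.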
"Zonal" relates to the concept of "availability zones" which are the next-smallest unit below a (physical) "region."
Most instances of a cloud ___ created in a region are allocated and exist at the zonal level (i.e. a specific zone of a region).
A physical "region" usually consists of three or more availability zones, and each zone is physically separated from other zones, limiting the potential for foreseeable disaster events to affect multiple zones simultaneously. Zones are close enough networking-wise to have high throughput and low latency interconnection, but not as fast as same-rack, same-cluster communications.
Systems requiring high availability (or replication) generally attain this by placing instances (or replicas) in multiple availability zones.
Systems with even higher availability requirements may use multi-region replication, which comes at greater cost.
> Struggling to find a definition, but seemingly zonal just means there's a massive instance per cluster.
There are a number of zones in a region. Region usually means city. Zone can mean data center. Rarely, it just means some sort of isolation (separate power / network).
In Google Cloud parlance, "regional" usually means "transparently master-master replicated across the availability zones within a region", while "zonal" means "not replicated, it just is where it is."
Slight nit: "zonal" doesn't necessarily mean "not replicated", it means that the replicas could all be within the same zone. That means they can share more points of failure. (I don't know if there's an official definition of zonal.)
I mean, sure, it can easily provide quick text summaries of this sort of thing, but I only consume ML summaries in the forms of podcast discussions between two simulated pundits, as God intended.
That link doesn't work for me, so here's the relevant bit:
Rapid Storage: A new Cloud Storage zonal bucket that enables you to colocate your primary storage with your TPUs or GPUs for optimal utilization. It provides up to 20x faster random-read data loading than a Cloud Storage regional bucket.
(Normally we wouldn't allow a post like this which cherry-picks one bit of a larger article, but judging by the community response it's clear that you've put your finger on something important, so thanks! We're always game to suspend the rules when doing so is interesting.)
Apologies! First time making a post on Hacker News, and I thought this was really exciting news. FWIW, I talked to the presenter after this was revealed during the NEXT conference today, and he seemed to imply that zonal storage is quite close to what Google has with Colossus.