Very cool! This makes Google the only major cloud that has low-latency single-zone object storage, standard regional object storage, and transparently-replicated dual-region object storage - all with the same API.
For infra systems, this is great: code against the GCS API, and let the user choose the cost/latency/durability tradeoffs that make sense for their use case.
Sure, but AFAIK S3’s multi-region capabilities are quite far behind GCS’s.
S3 offers some multi-region replication facilities, but as far as I’ve seen they all come at the cost of inconsistent reads - which greatly complicates application code. GCS dual-region buckets offer strongly consistent metadata reads across multiple regions, transparently fetch data from the source region where necessary, and offer clear SLAs for replication. I don’t think the S3 offerings are comparable. But maybe I’m wrong - I’d love more competition here!
I claimed that Google is the only major cloud provider with all three of:
- single-zone object storage buckets
- regional object storage buckets
- transparently replicated, dual region object storage buckets
I agree that AWS has two of the three. AFAIK AWS does not have multi-region buckets - the closest they have is canned replication between single-region buckets.
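To make the "same API" point concrete, here's a rough sketch with the standard Python client (the bucket names are made up, and I haven't tried this against the new zonal/Rapid Storage buckets) - for regional vs. dual-region, the only thing that changes is the location you pass at creation time:

    # pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()

    # Regional bucket: replicated across zones within one region.
    regional = client.create_bucket("example-regional-bucket", location="us-central1")

    # Dual-region bucket: one bucket, one namespace, spanning two regions
    # (NAM4 is the predefined Iowa + South Carolina pair).
    dual = client.create_bucket("example-dual-region-bucket", location="NAM4")

    # Reads and writes look identical regardless of the bucket's location type.
    blob = dual.blob("checkpoints/step-1000.bin")
    blob.upload_from_string(b"payload")
    data = blob.download_as_bytes()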
Isn't S3 Express not the same API? You have to use a "directory bucket" which isn't an object store anymore, as it has actual directories.
To be honest I'm not actually sure how different the API is. I've never used it. I just frequently trip over the existence of parallel APIs for directory buckets (when I'm doing something niche, mostly; I think GetObject/PutObject are the same.)
The cross-region replication I’ve seen for S3 (including the link you’ve provided) is fundamentally different from a dual-region GCS bucket. AWS is providing a way to automatically copy objects between distinct buckets, while GCS is providing a single bucket that spans multiple regions.
It’s much, much easier to code against a dual-region GCS bucket because the bucket namespace and object metadata are strongly consistent across regions.
The semantics they are offering are very different from S3. In Colossus a writer can make a durable 1-byte append and other observers are able to reason about the commit point. S3 does not offer this property.
FYI this was unveiled at Google Cloud Next 2025, and they're apparently also unveiling a gRPC client for Rapid Storage, which appears to be a very thin wrapper over Colossus itself, since this is just zonal storage.
I kind of thought you meant ZNS / https://zonedstorage.io/ at first, or its more recent, better, awesomer counterpart Host Directed Placement (HDP). I wish someone would please please advertise support for HDP; it sounds like such a free win, tackling so many write amplification issues for so little extra complexity: just say which stream you want to write to, and writes to that stream will go onto the same superblock. Duh, simple, great.
They charge $20/TB/month for basic cloud storage. You can build storage servers for $20/TB flat. If you add 10% for local parity, 15% free space, 5% in spare drives, and $2000/rack/month overhead, then triple everything for redundancy purposes, then over a 3 year period the price of using your own hard drives is $115/TB and google's price is $720. Over 5 years it's $145 versus $1200. And that's before they charge you massive bandwidth fees.
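For anyone who wants to sanity-check those numbers, here's roughly how they fall out (the usable capacity per rack is my own assumption, picked so the totals land near the quoted figures):

    # Back-of-the-envelope check. The $/TB build cost and overhead are from the
    # comment above; ~2 PB of usable capacity per rack is an assumption.
    build_per_tb = 20.0 * 1.10 * 1.15 * 1.05   # +10% parity, +15% free space, +5% spares
    build_per_usable_tb = build_per_tb * 3      # triple everything for redundancy
    rack_overhead_per_month = 2000.0
    usable_tb_per_rack = 2000.0                 # assumption

    for years in (3, 5):
        months = years * 12
        overhead_per_tb = rack_overhead_per_month * months / usable_tb_per_rack
        diy = build_per_usable_tb + overhead_per_tb
        gcs = 20.0 * months                     # $20/TB/month
        print(f"{years}y: DIY ~${diy:.0f}/TB vs. cloud ~${gcs:.0f}/TB")
    # 3y: DIY ~$116/TB vs. cloud ~$720/TB
    # 5y: DIY ~$140/TB vs. cloud ~$1200/TB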
I like your comparison with self-built storage, but comparing $20/TB/month with other CLOUD offerings, we see:
* hetzner storage box starts from $4/month for 1TB, and then goes down to $2.4/TB/month if you rent a 10TB box.
* mega starts from €10/month for 2TB, and goes down to €2/TB/month if you get a 16TB plan
* backblaze costs (starts from?) $6/TB/month
I was looking for cheap cloud storage recently, so I have a list of these numbers :)
Moreover, these are not even the cheapest ones. The cheapest one I found had prices starting from $6.5 for 5TB, going down to $0.64/TB/month for plans starting at 25TB (called uloz, but I haven't tested them yet).
Also, looking at lowendbox you can find a VPS in Canada with 2TB storage for $5/month and run whatever you want there.
How does all that compare to $20/TB/month?!
Please feel free to correct me if I'm comparing apples to oranges, though. But I can't believe all of these offers are scams or so-called "promotional" offers which cost the companies more than you pay for them.
I'm still annoyed they increased the price for B2. Maybe "free" bandwidth gets people to use it more? But as far as their costs go, between the time they launched at $5 and the time they upped it to $6, hard drives (and servers full of hard drives) cost half as much per TB, with 1/4 as many servers needed for the same number of TB.
I get the impression that business has always been about being the best schmoozer more than about having the best product.
BTW at Hetzner you can rent servers with very large (hundreds of TB) non-redundant storage for an effective price of about $1.50/TB/month. If you want to build a cloud storage product, that seems like a good starting point - of course, once you take into account redundancy, spare capacity, and paying yourself, the prices you charge your customers will end up closer to the price of Backblaze at a minimum.
>I get the impression that business has always been about being the best schmoozer more than about having the best product
and thus, market efficiency feels like a myth. This feels most true when it comes to cloud services. They're way overpriced in multiple common cases at the big providers.
That's covered by the build and overhead numbers. But if you want more on the build side, an extra $10k of labor per rack of 9 servers only increases the cost per TB by about $4.
Not to mention that I can:
- Create a bucket and store 1MB in it without any overhead
- Create 50 buckets with strong perimeters around them such that someone deleting the entire account doesn’t bring down the other 49
- Create a bucket and fill it with terabytes of data within seconds, without waiting for hardware to be racked and stacked
- Create a bucket, fill it with 2TB of data, and delete it tomorrow
Cloud is more than bare metal, but plenty of folks discount the cost benefits of elasticity.
I suspect the problem is that we're engineers in domains that have very different needs.
For example, I agree that elasticity is great. But at the same time, to me, it sounds like bad engineering. Why do you need to store terabytes of data and then delete it - couldn't it be processed continuously, streamed, compressed, restricted to changes only, and so on? A lot of engineering today is incredibly wasteful. Maybe your data source doesn't care and just provides you with terabyte CSV files, and you have no choice, but for engineers who care about efficiency, it reeks.
It might make a lot of sense in a highly corporate context where everything is hard, nobody cares, and the cost of inefficiency is just passed on to the customer (i.e. often government and tax payers). But the real problem here is that customers aren't demanding more efficiency.
The audit requirement alone gives you a lot of reasons to keep data, even if it gets downsampled one way or the other.
And plenty of use cases have natural growth. I do not throw away my pictures for example.
Data also grows with the number of users. More users, more 'live' data.
We have such a huge advantage with digital; we need to stop thinking it's wasteful. Everything we do digitally (pictures, finance data, etc.) is so much more energy- and space-efficient than what we had 20 years ago. We should not delete data just because we feel it's wasteful.
Leverage erasure coding for durability and avoid both the tripling and the local parity. You'll get better durability than 3x replication while taking up significantly less than 2x the space. Backblaze open sourced their library and talk about it here: https://www.backblaze.com/blog/Reed-Solomon. They use 17 data shards plus 3 parity shards (20 total), which tolerates 3 drive failures for just a ~1.18x stretch (i.e. a 100 MB file gets that resilience while taking up about 118 MB of space).
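A quick sketch of that overhead math (the 17+3 split is from the Backblaze post; everything else is just arithmetic):

    # Storage overhead of a k+m erasure-coded scheme vs. straight 3x replication.
    def stretch(data_shards: int, parity_shards: int) -> float:
        return (data_shards + parity_shards) / data_shards

    k, m = 17, 3  # Backblaze's published scheme: 17 data + 3 parity shards
    print(f"{k}+{m} erasure coding: tolerates {m} lost shards, stretch = {stretch(k, m):.2f}x")
    # -> tolerates 3 lost shards, stretch = 1.18x (100 MB stored as ~118 MB)
    print("3x replication: tolerates 2 lost copies, stretch = 3.00x")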
"Zonal" relates to the concept of "availability zones" which are the next-smallest unit below a (physical) "region."
Most instances of a cloud ___ created in a region are allocated and exist at the zonal level (i.e. a specific zone of a region).
A physical "region" usually consists of three or more availability zones, and each zone is physically separated from other zones, limiting the potential for foreseeable disaster events from affecting multiple zones simultaneously. Zones are close enough networking-wise to have high throughput and low latency interconnection, but not as fast as same-rack, same-cluster communications.
Systems requiring high availability (or replication) generally attain this by placing instances (or replicas) in multiple availability zones.
Systems requiring high availability generally start with multi-zone replication, and systems with even higher availability requirements may use multi-region replication, which comes at greater cost.
In Google Cloud parlance, "regional" usually means "transparently master-master replicated across the availability zones within a region", while "zonal" means "not replicated, it just is where it is."
Slight nit: "zonal" doesn't necessarily mean "not replicated", it means that the replicas could all be within the same zone. That means they can share more points of failure. (I don't know if there's an official definition of zonal.)
> Struggling to find a definition, but seemingly zonal just means there's a massive instance per cluster.
There are a number of zones in a region. Region usually means city. Zone can mean data center. Rarely just means some sort of isolation (separate power / network).
I mean, sure, it can easily provide quick text summaries of this sort of thing, but I only consume ML summaries in the forms of podcast discussions between two simulated pundits, as God intended.
This could actually speed up some of my scientific computing (in some cases, data localization/delocalization is an important part of overall instance run-time). I will be interested to try it.
Glad to see the zonal object store take off. Such massive bandwidth will redefine data analytics, where 99% of all queries will be able to run on a single node faster than what distributed compute can offer.
This link makes so much more sense than the previous link did.
SSDs with high random I/O speeds are a significant contributor to the advantage. I think 20M writes per second are likely distributed over a network of drives to make that kind of speed possible.
Is S3 Express One Zone performance greatly improved over standard S3, like GCP Rapid Storage? My understanding is that S3 Express One Zone is just more cost effective.
> 20x faster random-read data loading than a Cloud Storage regional bucket.
Update: Just read this article[1], which clarifies S3 Express One Zone. Yes, performance is greatly improved, but storage actually costs 8x more than a standard S3 bucket. The name S3 Express One Zone is terrible and a bit misleading about the pricing difference.
I understand your belief that One Zone implies less expensive, but I’m staunchly in favor of them having it in the name so people know that their data is in a single AZ. The storage class succinctly summarizes faster with lower availability.
Fair, how about instead of S3 Express they call it S3 Max (One Zone). It doesn’t take a rocket scientist to come up with good product names, just copy Apple. Though I suppose what happens when engineers are left up to the marketing. :-)
Yep, I love Apple, follow them closely, own a Mac Studio with an M3 Ultra and a MacBook Pro with an M4 Max, and it's still confusing. :)
I mean, surely a Mac Studio with an M4 Max must be the best, right? It's an entire CPU generation ahead and it's maximum! Of course, it's not... the M3 Ultra is the best.
That link doesn't work for me, so here's the relevant bit:
Rapid Storage: A new Cloud Storage zonal bucket that enables you to colocate your primary storage with your TPUs or GPUs for optimal utilization. It provides up to 20x faster random-read data loading than a Cloud Storage regional bucket.
(Normally we wouldn't allow a post like this which cherry-picks one bit of a larger article, but judging by the community response it's clear that you've put your finger on something important, so thanks! We're always game to suspend the rules when doing so is interesting.)
Apologies! First time making a post on Hacker News, and I thought this was really exciting news. FWIW, I talked to the presenter after this was revealed during the NEXT conference today, and he implied that zonal storage is quite close to what Google has internally with Colossus.
Anywhere Cache and Rapid Storage share some infrastructure inside of GCS and both are good solutions for improving GCS performance, but Anywhere Cache is an SSD cache in front of the normal buckets while Rapid Storage is a new type of bucket.
Anywhere Cache shines in front of a multi-regional bucket. Once the data is cached, there are no egress charges and there's much better latency. This is great for someone who is looking for spot compute capacity to run computations anywhere in the multi-region. It will also improve performance in front of regional buckets, but as a cache, you'll see the difference between hits and misses.
Rapid Storage will have all of your data local and fast, including writes. It also adds the ability to have fast durable appends, which is something you can't get from the standard buckets.
Everyone needs to learn to use a single, unique, unambiguous URL for new product announcements like this.
Google aren't the only company that consistently mess this up, but given how they built a 1.95 trillion company on top of crawling URLs on the web they really should have an internal culture that values giving things unique URLs!
[I had to learn this lesson myself: I used to blog "weeknotes" every week or two where I'd bundle all of my project announcements together and it sucked not being able to link to them as individual posts]
Google's not really at fault here: the OP submitted a link to an article called "Introducing Ironwood TPUs and new innovations in AI Hypercomputer" that happens to mention Rapid Storage way down the page.
They're not saying that it's AI. They're saying it's for customers who do AI. Training means lots and lots of reads from a big data store, and if you're reading from, like, big Parquet files, that probably means lots of random reads. This is for that. Speedier data access, presumably at the cost of durability and availability, which is probably a great trade-off for people doing ML training jobs.
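To illustrate (a toy sketch - the bucket, path, and column names are invented): pulling a couple of columns out of big Parquet files with something like pyarrow turns into many small ranged reads against the object store rather than one streaming read, which is exactly the access pattern a low-latency zonal bucket helps with.

    # Assumes pyarrow and gcsfs are installed; only the byte ranges for the
    # requested columns in each row group get fetched from the bucket.
    import gcsfs
    import pyarrow.dataset as ds

    fs = gcsfs.GCSFileSystem()
    dataset = ds.dataset("my-training-bucket/features/", format="parquet", filesystem=fs)
    table = dataset.to_table(columns=["label", "embedding"])
    print(table.num_rows)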
>if you're reading from, like, big Parquet files, that probably means lots of random reads
and it also usually means that you shouldn't use S3 in the first place for workloads like this, because they are usually very inefficient compared to a distributed FS. Unless you have some prefetch/cache layer, you will get both bad timings and higher costs.
But a distributed FS is far more expensive than cloud blob storage would be, and I can't imagine most workloads would need the features of a POSIX filesystem.
I don't fault them for this at all. AI isn't possible without the full infra stack, which clearly includes storage (and compute, and networking, and data pipelining, and and and...). There's an entire ecosystem of ISVs that only do one of these things, very well (Pure Storage, for example, or Lambda or CoreWeave, or Confluent (Kafka + Flink with LLM integration)). While it might be more precisely accurate to state "AI enabling" tech, I'll give them a pass.
I think the joke here is that somehow management refused to sell Colossus (which is such an obvious nice product just like BigQuery) before and it takes "AI" to convince them.
> which is such an obvious nice product just like BigQuery
I always assumed (from outside Google) that the problem was that Colossus had to make a "no malicious actors" assumption in its design in order to make the performance/scaling guarantees it does; and that therefore just exposing it directly to the public would make it possible for someone to DoS-attack the Colossus cluster.
My logic was that there's actually nothing forcing [the public GCP service of] BigTable to require that a full copy of the dataset be kept hot across the nodes, with pre-reserved storage space — rather than mostly decoupling origin storage from compute† — unless it was to prevent some DoS vector.
As for exactly what that DoS vector is... maybe GC/compaction policy-engine logic? (AFAICT, Colossus has pluggable "send compute to data" GC, which internal-BigTable and GCS both use. But external-BigTable forces the GC to be offloaded to the client [i.e. to the BigTable compute nodes the user has allocated] so that the user can't just load down the system with so many complex GC policies that the DC-scale Colossus cluster itself starts to fall behind its GC time budget.)
---
† Where by "decouple storage from compute", I mean:
• Each compute node gets a fixed-sized DAS diskset, like GCE local NVMe SSDs;
• each disk in that diskset gets partitioned up at some fixed ratio, into two virtual disksets;
• one virtual diskset gets RAID6'ed or ZFS'ed together, and is used as storage for non-Colossus-synced tablet-LDB nursery level SSTs;
• the other virtual diskset gets RAID0'ed or LVM-JBOD-ed together and is used as a bounded-size LFU read-through cache of the Colossus-synced tablets — just like BigQuery compute nodes presumably have.
(AFAIK the LDB nursery levels already get force-compacted into "full" [128MiB] Colossus-synced tablets after some quite-short finality interval, so it's not like this increases data loss likelihood by much. And BigTable doesn't guarantee durability for non-replicated keys anyway.)
You gotta feed the GPUs & TPUs with enough data to avoid them sitting idle. Which starts to become incredibly challenging with latest gen GPU/TPU chips
Obviously not, since you could not deliver it. It seems that you maybe don't realize what CFS is in this context, and are thinking of something else that you could just "set up"?
What jeffbee is talking about is Google's proprietary Colossus File System, and all its transitive dependencies.
I meant it sarcastically, but for "serious money" you can have any software system you can dream of. You have to dream of it, though - that's one of the hard parts.
It looks like every other clustered file system. What's special about Google's Colossus?
There are some semantic differences compared to POSIX filesystems. A couple big ones:
- You can only append to an object, and each object can only have one writer at a time. This is useful for distributed systems - you could have one process adding records to the end of a log, and readers pulling new records from the end.
- It's also possible to "finalize" an object, meaning that it can't be appended to any more.
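A toy sketch of those two semantics (this is just a local stand-in, not the actual API):

    # Illustrates single-writer append + finalize + an observable commit point.
    class AppendOnlyObject:
        def __init__(self):
            self._data = bytearray()
            self._finalized = False

        def append(self, record: bytes) -> int:
            """Single writer appends; returns the new committed length (the commit point)."""
            if self._finalized:
                raise RuntimeError("object is finalized")
            self._data.extend(record)
            return len(self._data)

        def finalize(self) -> None:
            """No further appends allowed; readers know the object is complete."""
            self._finalized = True

        def read_from(self, offset: int) -> bytes:
            """Readers can safely consume anything below a commit point they observed."""
            return bytes(self._data[offset:])

    log = AppendOnlyObject()
    commit_point = log.append(b"record-1\n")   # durable append; commit point = 9
    print(log.read_from(0), commit_point)
    log.finalize()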
Other systems don't offer the performance that Colossus offers, is why. POSIX has all kinds of silly features that aren't really necessary for every use case. Throwing away things like atomic writes by multiple writers allows the whole system to just go faster.
It sounds like you have to find a design that meets your performance target and usage patterns - just like anything else. It also sounds like Google's CFS is a grass-is-greener situation: you heard Google had something that solved the problem you have, so you want it. But the reason it sounds good, compared to the other designs, is that you haven't had to actually use it and run into its quirks yet.
Reading the press release about the "Hypercomputer" and I can't tell what part of this is real and what part is marketing.
They say it comes in two configurations, 256 chips or 9,216 chips. They also say that the maximal configuration of 9,216 chips delivers 24x the compute power of the world's largest supercomputer (which they say is El Capitan). They say that this comes to 42.6 exaFLOPS.
This implies that the 9,216 chip configuration doesn't actually exist in any form in reality, or else it would now be the world's largest supercomputer (by flops) by a huge margin.
Am I massively misunderstanding what the claims being made are about the TPU and the 42.6 exaFLOPs? I feel like this would be much bigger news if this was fully legit.
Edit: The flops being benchmarked are not the same as regular supercomputer flops.
Supercomputers are measured based on 64-bit floating point operations. Here they (inaptly) compared it to their 8-bit floating point operations (which are only useful for AI workloads).
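For scale, the rough numbers (El Capitan's ~1.74 exaFLOPS FP64 is its public Top500 HPL figure; the TPU numbers are from Google's announcement):

    ironwood_pod_fp8 = 42.6e18        # FLOP/s, 8-bit, 9,216-chip pod
    el_capitan_fp64 = 1.74e18         # FLOP/s, 64-bit HPL

    print(ironwood_pod_fp8 / el_capitan_fp64)  # ~24x, but different precisions
    print(ironwood_pod_fp8 / 9216 / 1e12)      # ~4,600 TFLOP/s of FP8 per chip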
Gotcha. That makes a lot more sense. I was led to believe by the wording of the comparison that they were the same operations. Appreciate the explanation.
Also, the set of supported/accelerated operations in the fastest path is different depending on whether you use 8-, 16-, or 32-bit floats, thus the common use of "TOPS" as a benchmark number recently.
Terrifyingly complicated and buzzword packed. I really don't know what to make of any of this or what it does, and I work with AI applications in my day job.
I'm guessing the $300 of Google Cloud credit offered in this webpage wouldn't go very far using any of this stuff?
Like with any other new Google product, better wait a few years to see if it sticks before investing in its usage. In most cases, you'd be better off searching for an alternative from the start.
If you want object storage faster than S3 Express One Zone or GCP Rapid Storage, without the zonal limitation, check out ACS: https://acceleratedcloudstorage.com
You can bring data in and out of the GPU quickly and improve utilization.