We run 26 services in production on DigitalOcean. Every single VPS in our setup uses the block storage feature as its persistence layer (system logs, app logs, databases, etc).
Thanks to this architecture we can rebuild machines at will. The _function_ and _state_ are nicely separated.
The downside is that we are now fucked.
You should re-architect without network block storage as a requirement imho. Ever since that big AWS EBS outage in 2012 or whatever, I avoid it like the plague.
Databases like low-latency local storage (the newer NVMe instances on AWS are very good), and logs and whatnot are aggregated with other systems (Fluentd, Logstash, etc). I actually do not miss EBS much at all -- if I have a problem with a VM, it's disposable and redundant, and I've designed out most of the SPOFs.
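For reference, a minimal sketch of that "ship logs off the box" pattern, using Python's stdlib syslog handler; the aggregator host and port are placeholders, and rsyslog/Fluentd/Logstash typically have syslog inputs that can receive this:

```python
import logging
import logging.handlers

# Send application logs straight to a remote aggregator over syslog, so the
# local disk never becomes the system of record. Host and port are placeholders.
handler = logging.handlers.SysLogHandler(address=("logs.internal.example", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

log = logging.getLogger("myapp")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("order processed")  # shipped over the network (UDP by default), not written locally
```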
I wouldn't throw the baby out with the bathwater. Across Google, we rely almost exclusively on networked storage (Colossus) unless we need extremely high-performance local flash. Being able to separate compute from storage is a huge part of our ability both to scale out and to do live migration for GCE (where Persistent Disk, our equivalent of EBS, is built on Colossus).
Persistent Disk has never had an outage like the EBS one, but I attribute that to the Colossus and underlying teams having run this at Google for a really long time. Fwiw, the AWS folks have also massively improved EBS over the years. You can still worry, but I prefer to think about overall MTBF rather than consider networked storage in particular as the plague :).
If you haven't noticed, the MTU in Google Cloud is < 1500, vs AWS where you can get jumbo frames (9k). I have no reason to believe persistent disks are different.
Enabling live migrations? You mean the choice between having an instance terminated with zero notice or migrated with a 60-second warning if you subscribe to the right API? Oh yeah, and this is happening constantly.
Live migrations (and by extension VM attrition), persistent disks, and network performance are my least favorite aspects of Google Cloud today.
That being said, Google Cloud does have a lot of advantages over EC2. These just aren't among them.
First, I'm sorry you've had a bad time. We track our VM MTBF closely, so hearing "this is happening constantly" is really worrying. Feel free to reach out to me (email in profile), or Support so we can dig into your experience. If something is wrong, we should diagnose and fix it.
Can you say why you prefer explicitly separated egress caps? We let our networking egress be shared between all sources of traffic on purpose, because it lets you go full throttle rather than hardcap on "flavor". That is, why restrict someone that doesn't write to disk much by "stealing" several Gbps for the PD/EBS they aren't going to use?
Finally, it's true that our MTU is too damn low. But that also isn't particularly material for PD: when the guest issues a write, we handle it all behind the scenes (it's not like your guest sees the write get fragmented into packets).
Thank you for the fresh perspective and the thoughtful reply. I also appreciate the offer to reach out. Rest assured we are actively engaged during the events and have found support incredibly responsive. At a high level none of these are reasons for us to stop investing in our google cloud stack, nor have they caused any major outages. Think of them as quality of life comments.
Live migrations
To clarify "this is happening constantly": I meant we see live migrations happen frequently throughout the day. Going back over the last 24 hours, I see "hundreds" of migrations. We do have days where we will see 3-4x that number. The majority of these were successful, and our logging, probes, and graphs show nothing exciting.
On the positive side, we have noticed a marked improvement in migration times, probe failures, and instance fatalities over the last 6 months. Where before we would regularly see live migrations take upwards of 15 minutes or longer, they are now at or under 2 minutes (with only a handful of exceptions barely worth mentioning).
I do appreciate the facility of live migration and the proactive approach Google takes to host maintenance. The 60-second notification window is just too damn short for some of our services to properly drain themselves. So instead, we hold on to our butts and hope for the best on those boxes.
If there were one improvement to live migration, it would be the option of a 15-minute (or even 30-minute... am I being greedy?) notice.
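(For reference, a rough sketch of consuming that notice via the instance metadata server. The maintenance-event path and return values here are from memory and may differ from the current API; drain() is a placeholder for service-specific shutdown logic.)

```python
import urllib.request

# Long-poll the GCE metadata server for a maintenance event and start draining
# as soon as the value changes away from NONE. The ~60 second notice starts
# roughly when that happens.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/maintenance-event?wait_for_change=true")

def wait_for_maintenance_event() -> str:
    req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:  # blocks until the value changes
        return resp.read().decode().strip()

def drain() -> None:
    # Placeholder: stop accepting new work, finish in-flight requests,
    # deregister from load balancing -- whatever fits in the notice window.
    pass

while True:
    event = wait_for_maintenance_event()
    if event != "NONE":  # e.g. MIGRATE_ON_HOST_MAINTENANCE
        drain()
```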
Networking egress caps shared between instance and persistent disk
The edge case that hurts here is a high-bandwidth service that also writes a lot of data to a PD disk, combined with the Comcast-style burst bandwidth throttling that happens on the instances (this is pure speculation and may have improved since the last time we investigated; the observation at the time was that the throttling is a bit too efficient and hits the instance disproportionately). We have since migrated to either local SSD or tmpfs for these types of hosts in GCE. Their sister services are still running fine in EC2 on EBS-backed instances.
MTU
Yay! 1500 would be better, 9k would be great. This hurts when connectivity starts to see an increase in packet loss and the corresponding increase in packet retransmission (and latency). Not to mention the overhead the extra packets incur (those 20-60+ bytes just in headers add up quickly).
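Rough arithmetic on that overhead, assuming a flat 40-byte IPv4+TCP header per packet (options push this toward the 60+ bytes mentioned above) and ignoring Ethernet framing and retransmits:

```python
HEADER_BYTES = 40  # plain IPv4 (20) + TCP (20) headers; options add more

def per_gib_overhead(mtu: int, payload: int = 1 << 30) -> tuple[int, float]:
    mss = mtu - HEADER_BYTES                 # payload bytes carried per packet
    packets = -(-payload // mss)             # ceiling division
    return packets, 100 * packets * HEADER_BYTES / payload

for mtu in (1460, 1500, 9000):
    packets, pct = per_gib_overhead(mtu)
    print(f"MTU {mtu}: ~{packets:,} packets/GiB, ~{pct:.2f}% header overhead")
```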
TLDR wishlist
Longer notification window before live migration actually starts
PD-optimized instances
1500/9k MTU
One would think, right? We are not running preemptible instances.
Yes I have verified all the scheduling flags for our instances.
Preemptible: false
OnMaintenance: MIGRATE (the only other option is TERMINATE, which for any sizable shop happens far too often to be usable, and you lose the 60-second warning)
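(For anyone else double-checking, the same flags can be read from inside the instance via the metadata server. A sketch only; the scheduling paths here are from memory and may vary between metadata API versions.)

```python
import urllib.request

# Read the scheduling flags off the instance's own metadata server.
BASE = "http://metadata.google.internal/computeMetadata/v1/instance/scheduling/"

def scheduling(key: str) -> str:
    req = urllib.request.Request(BASE + key, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode().strip()

print("preemptible:        ", scheduling("preemptible"))          # expect FALSE
print("on-host-maintenance:", scheduling("on-host-maintenance"))  # expect MIGRATE
```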
Key word being "usually". Even if it's not a complete failure, we have seen all sorts of fun, from packet loss and latency spikes to segfaults.
I have no doubt things will improve, but right now PD, networking, and live migrations are pain points for our group in GCE.
Full disclosure: I'm evaluating GCP pretty heavily -- and I can support what you're saying from conversations with multiple existing customers. Live migrations cause both latency/CPU slowdowns (brownouts) and actual (short) interruptions (blackouts). This isn't even a secret; it's fairly well documented:
To each their own. I don't like getting paged at 3am, so I choose to remove complexity and external dependencies where I can. And not operating at Google-scale, this is one of those things I can live without.
Coming from an enterprise background for a couple of years: network block storage screws you one way or another. It's a single point of failure that, no matter how reliable it is, will always go down at some point or have failures that affect your application layer.
With Ceph you can distribute data across hosts, switches, racks, rows, fabrics, rooms, datacenters. If you design correctly, you can have a very resilient storage system.
Source: I come from enterprise storage and have been running Ceph in production (successfully) for ~5 years.
Ceph has some weird failure modes, though. Your radius of failure absolutely includes the entire cluster, regardless of how resilient it's been made.
Sure, you're insulated from hardware failures, but not from bugs in the underlying system. Data loss is rare, but having a cluster go unavailable does happen.
> radius of failure absolutely includes the entire cluster
Completely agree, same goes for any other storage system, network-based or otherwise.
I'm simply disagreeing with the blanket statement that all network block storage is unreliable; that is simply not true.
Don't do silly things with Ceph, don't just have a single network fabric, don't buy cheap switches that drop packets on the floor at medium load, don't test things in production.
Can't overstate the need for a good network, especially with the traffic amplification of replication.
I was running 2x10GE and had plans to scale up the cluster network to 2x or 3x10GE.
My biggest issue was the MTTR of a failed cluster. If one is RBD mirroring to a different cluster, recovery time may be hours or days.
That said, I had one cluster run continuously for a couple years with basically zero administration. It had great performance and good reliability other than that one week-long service impacting incident as a result of a bug.
Sounds like you've been in environments that have done network block storage incorrectly. If you do multi-path, redundant/replicated network storage with proper configurations in place, you shouldn't have a single point of failure. Ceph oddities aside.
Everything has issues given a long enough time span though. Each layer has an accounted level of risk, and it definitely is a balancing act.
We are actually very happy with database performance over block storage.
As for the architecture, in DigitalOcean the external block storage is the only way to separate function from state. We need this to be able to rebuild VPS-es regularly. This is to avoid configuration drift and to prove/enforce reproducibility.
I guess I've solved that in other ways -- configuration drift is really a config management and orchestration problem. And recycling or OS upgrades can happen if your story around replication is sound -- I only typically do that every two years anyway following the Ubuntu LTS releases.
Add a cache layer for read-only content that doesn't invalidate in case of a failure like this, allowing your system to go into read-only mode.
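A rough sketch of that kind of serve-stale cache, with fetch_from_store(), the TTL, and the in-process dict standing in for whatever store and cache layer is actually in use:

```python
import time

TTL = 300  # seconds before an entry is considered stale (still usable during a failure)

_cache: dict[str, tuple[float, bytes]] = {}

def fetch_from_store(key: str) -> bytes:
    """Placeholder: read from the block-storage-backed database/service."""
    raise NotImplementedError

def read(key: str) -> bytes:
    now = time.time()
    cached = _cache.get(key)
    if cached and now - cached[0] < TTL:
        return cached[1]                  # fresh hit
    try:
        value = fetch_from_store(key)
        _cache[key] = (now, value)
        return value
    except Exception:
        if cached:
            return cached[1]              # store is down: serve stale, i.e. read-only mode
        raise                             # nothing cached; surface the failure
```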
Local copies of puppet/ansible can be very useful as well.
For logs and metrics, I'm not aware of an out-of-the-box solution that could replay from the last successfully transmitted line; this is something rsyslog, graphite, etc. could certainly benefit from. (Please let me know if you are aware of these kinds of buffers.)
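A rough sketch of the kind of checkpoint-and-replay buffer being described here: a forwarder that records the last acknowledged byte offset and resumes from it after an outage. The file paths are placeholders and send() stands in for whatever transport is in use; log rotation handling is omitted.

```python
LOG_FILE = "/var/log/app.log"                      # placeholder paths
OFFSET_FILE = "/var/lib/shipper/app.log.offset"

def send(line: str) -> None:
    """Placeholder: forward one line to the aggregator; raise if not acknowledged."""
    raise NotImplementedError

def ship_once() -> None:
    # Resume from the last acknowledged byte offset, so lines buffered locally
    # during an aggregator outage get replayed instead of silently dropped.
    try:
        with open(OFFSET_FILE) as f:
            offset = int(f.read() or 0)
    except FileNotFoundError:
        offset = 0

    with open(LOG_FILE, "rb") as log:
        log.seek(offset)
        while True:
            raw = log.readline()
            if not raw or not raw.endswith(b"\n"):
                break                               # EOF or partial line; retry next run
            send(raw.decode(errors="replace"))
            offset = log.tell()
            with open(OFFSET_FILE, "w") as f:       # checkpoint after each acknowledged line
                f.write(str(offset))
```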
However, distributed network block storage is usually something unavoidable after a certain size; local filers are really expensive.
As somebody who's been looking really hard at a project/side business that'd use Spaces (DO's object storage system), this makes me super, super nervous. To say nothing of block storage--yikes.
Can anyone speak to quality/reliability of other object storage providers that have S3-compatible (including presigned URL) APIs? S3's pricing is absolutely ridiculous by comparison, but they have the reliability argument on their side...
>S3's pricing is absolutely ridiculous by comparison, but they have the reliability argument on their side...
Well, unless your S3 buckets are in us-east-1. For some reason Amazon keeps having issues with S3 in that region.
Since the storage costs appear to be the same between Spaces and S3 ($0.02/GB/month) and neither charges for inbound transfer, I'm assuming your problem is with the outbound transfer pricing (S3 charges 9x what DO charges) and/or the per-request pricing. GCP's Regional Cloud Storage has the same storage pricing, even higher outbound transfer pricing, and the same request pricing. I haven't looked at any other providers, but if you want reliability, you're going to have to pay for it.
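To make that concrete, a quick back-of-the-envelope using the figures cited in this thread ($0.02/GB-month storage on both, ~$0.09/GB egress on S3 vs roughly a ninth of that on Spaces); the workload numbers are hypothetical and request charges and any included transfer are ignored:

```python
# Rough monthly cost: storage plus outbound transfer only.
def monthly_cost(stored_gb: float, egress_gb: float, egress_rate: float,
                 storage_rate: float = 0.02) -> float:
    return stored_gb * storage_rate + egress_gb * egress_rate

stored_gb, egress_gb = 500, 5_000      # hypothetical workload: 500 GB stored, 5 TB out
print(f"S3:     ${monthly_cost(stored_gb, egress_gb, 0.09):,.2f}")   # ~$460
print(f"Spaces: ${monthly_cost(stored_gb, egress_gb, 0.01):,.2f}")   # ~$60
```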
Their us-east-1 region is notorious for having the least reliability. Unless there's a good reason not to, I always recommend people default to us-west-2.
I can't speak about S3, but we're using GCS (Google Cloud Storage) heavily and haven't experienced any problems. From time to time we see a slowdown, but we've never seen an outage.
I also looked at B2 [0] once or twice. The price is great, but the traffic cost (egress from GCE) renders it unusable for us.
There are actually a few options in egress land for us if cost is your primary concern.
If you're doing serving over HTTP(S), you should probably be using Cloud CDN with your bucket [1] or put one of our partners like Cloudflare or Fastly in front via CDN Interconnect [2]. Both of these get you closer to $.04-$.08/GB depending on src/dest.
If not, and you don't care that we have a global backbone, you can get a more AWS-like network with our Standard Tier [3] (curiously with pricing squirreled away at [4], I'll file a bug). The packets will hop off our network in a hot potato / asap fashion, so you're not riding our backbone as much.
I know about Standard Tier, though it had slipped my mind--thanks for pointing it out.
It's still way too expensive. And Cloud CDN isn't an appropriate tool for my use case. I really do just need a bunch of egress from a single location that isn't insanely expensive. $0.085/GB is in that insanely-expensive tier, for me.
Don't know yet! But I expect to see between 100x and 1000x on a per-megabyte basis. Not evenly distributed across objects (objects between 40MB and 150MB), but as a rough estimate.
I'd be happy to talk further via email; it's not secret, just not public.
Your bandwidth pricing is a joke. Yes, you've got a nice network, and yes, you pay premiums to get transit from providers that are "hard to work with". And yes, you have dark fiber between your locations, which costs a lot of money. But even considering those facts, you are still charging your customers at least 10x what your bandwidth should cost.
How have you even calculated those prices?
"Let's look at AWS and make it even more expensive"?
The bandwidth charges are hefty if you're moving a lot of bits, but I wouldn't use anything other than S3 or GCS probably -- the other guys just don't have a track record of reliability yet.
But you can build a poor man's CDN -- Varnish caches on DO/Linode/whatever, where you get multiple terabytes of bandwidth for a small VM. So you use the best object storage provider, but move most of the bits cheaply using Varnish + Route 53 geo-DNS.
I mean, DigitalOcean itself has been "jokeish". I had a VM go down for 12 hours, and it took 6 hours to even get them to confirm they had an issue on that machine. It was stupid.
There is the Sheepdog distributed file system, which recently hit version 1.0. Sheepdog is similar to Ceph but seems simpler to operate.
For distributed object storage: I have also used MooseFS and LizardFS, and both run very steadily on production workloads. Steady as in: set it up and then no ops issues.
Also on the short list is BeeGFS, created by Fraunhofer; it is a seriously fast distributed file system.
I can't speak for the quality of other object storage providers, but being in the storage business I can say that if someone is running Ceph, find another provider.
If you are relying on a single object storage provider and cannot survive downtime, data loss, or simply being very slow at times, you will never find a good one. Expect things to fail. Distributed systems are not trivial, and a random object storage provider rarely has enough expertise to run Ceph or any other open source solution at scale with no issues.
Or you should purchase commercial support for said storage...
Salesforce runs several large Ceph clusters, and they have a dedicated team to run it. If you can't invest in the employees, you should invest in commercial support.
Salesforce also commits a lot of updates and patches back to the Ceph community
The issue with Ceph isn't that it's somehow deficient. It's amazing. The problem is that it's difficult to engineer correctly and hard to troubleshoot.
There is no other software, open source or otherwise, that works quite as well as Ceph for providing durability and scale.
ScaleIO gets high marks for block storage performance compared to Ceph. It's not quite as durable and lacks some other features, but people seem to like it.
A lot of companies using Ceph at scale are facing huge issues (OVH, etc.), so he is not wrong. Why take the risk of going with a solution that is known to cause issues?
I've talked to a lot of large-ish commercial Ceph customers and they seem to spend a lot of time building kludge-arounds for support. And tend to live terrified that the whole clumsy edifice will come crashing down at the cost of their jobs.
Also, Ceph is block, object, and file. Block is OK up to a point, object is dubious, and file is utterly untrustworthy. At least at any kind of real scale -- 3 servers in a rack aren't "scale".
Why must someone who isn't a Ceph fan (and I fail to see why storage systems are a "fan" activity) live in the evil pockets of EMC? I know people who've smoked for years and don't have any sign of lung cancer either.
OVH isn't exactly a shining example of a quality engineering organization. Simple web searches show how they have misused things and caused large outages.
Ceph is very reliable and durable. We've actually gone out of our way to try and corrupt data, but we failed every time. It always repaired the data correctly and brought things back into a good working state.
CERN and Yahoo run very large Ceph clusters at scale, too.
You can use Ceph together with OpenStack. They used Ceph for their cloud services but had huge problems. If I am not mistaken, they have completely thrown out Ceph by now.
Any idea what the underlying issues with Ceph were?
My story is a bit dated, but we went from Gluster to Ceph to MooseFS at one startup. Gluster had odd performance problems (slow metadata operations -- scatter/gather RPCs and whatnot, I would guess), and it was hard to know from the logs what was going on.
Ceph was very very early at this point, but part of it ran as a kernel module and the first time it oops'd, I deleted that with fire. MooseFS ran all in userspace, had good tools for observability into the state of the cluster, and the source code was simple and clean. It didn't have a good story around multi-master at that time, but I think that is improved now.
Ceph is extraordinarily complicated to run correctly. The docs aren't great and commercial support is pretty mediocre.
It's an amazing piece of software, but takes a great deal of engineering to get right. Most folks won't invest that much engineering into their storage.
This is why providers like EMC and NetApp can extract 10x the cost of the raw storage from enterprises.
The Red Hat Ceph docs are great and open to everyone for free.
The Red Hat commercial support has been pretty good for us. We presented them with 2 bugs, and they addressed both. One took a few weeks, but the other only took a few hours to get a hotfix started.
EMC storage is absolute trash post-Dell merger. A pure, 100% dumpster fire. Their customers know their systems better than they do. It's pathetic.
No clue what the underlying issue was but when reading:
"We have about 200 harddisk in this cluster... 1 of the disks was broken and we removed it. For some reasons, Ceph stopped to working : 17 objectfs are missed. It should not."
This isn't related to block storage at all, but I was a big fan of DO until I hit a weird issue where they wanted me to prepay via PayPal to spin up more than 50 droplets at a time. I work in an organization that is spinning up many nodes at once for a short time and then destroying them soon after, for various but totally legitimate reasons. One look at our account history can demonstrate that this is almost exclusively how we use their services, so it's not like this was a weird request. And we've never missed a payment or paid late or otherwise ever given Digital Ocean any reason to think we wouldn't be good for the charges at the end of the month (especially considering we were already spending sometimes in the thousands of dollars every month). This was so off-putting. I stopped using Digital Ocean that day.
We do have a business account, if I'm reading the console correctly. I was wrong about the limit of 50: it is a limit of 100 droplets, but this is an artificial limitation that they wouldn't budge on. It's clear from the (lengthy) account history that this is normal behavior for us, and I work for a company whose name everyone knows, so it's not like we're some no-name scammers. Regardless, asking customers to buy vouchers for what really is only moderate use of a service is really off-putting. It was so off-putting, actually, that more than halfway through writing it I scrapped a driver for some pretty popular software that would have enabled the use of DigitalOcean as a backend.
Minio is cool. Unfortunately, performance is anyone's guess.
It has erasure coding as well. You could deploy on bare VMs with local storage in any cloud provider and have no dependency on network block storage.
With k8s 1.10 you get persistent local storage as well, so you could probably build a fairly highly available system. Pro tip: do it in GCP, as they have nice local SSDs you can attach to any instance. They're 375GB, 25k IOPS, and $.08/GB, way cheaper than AWS I2 instances.
So I have no problem with network storage. I have a cost problem (the project I'm working on is not intended to make a bunch of money and I'm trying to keep costs low so I can keep prices extremely low). What you're describing would functionally be even more expensive than just using S3 directly, if it were to be done in AWS or in GCS.
Right now the leading (uncomfortable) solution is probably DigitalOcean Spaces and a little bit of prayer.
But...I don't care about the technology. I care about the object storage available to me without caring about the technology. So what's this do for me?
Thanks for the suggestion, but a dollar per gigabyte per month is ridiculous unless you're downloading everything in your store eleven times a month. Even S3 only costs $0.02/GB storage and $0.09/GB egress ($0.02 + 11 x $0.09 ≈ $1.01/GB at that download rate).
I would suggest you take a look at Wasabi for object storage. I'm just a customer, but have been using them for close to a year for off-site backup storage and it's been great.
Reposting from my comment yesterday[0]: how are the speeds from your location (and where is that)? It has been ridiculously slow from Northern Europe when I've tried it, like not even 1 MB/s down.
> Wasabi’s hot cloud storage service is not designed to be used to serve up (for example) web pages at a rate where the downloaded data far exceeds the stored data or any other use case where a small amount of data is served up a large amount of times
I have heard of a number of Ceph nightmares like this. A few years back, Logos Bible Software, which has a Ceph-based content platform (a huge library of e-books with a massive amount of metadata), was down for a week because of a cascading Ceph cluster failure.
It really doesn't speak well of the Ceph architecture. It is highly performant, but at what cost? Failures on this scale can ruin a business.
Well, you can always partition a large cluster into many small clusters and prevent cascading failures and other issues from affecting everyone, or from taking too long to recover from. This is a very basic reliability technique everyone should know.
They've just started sending out SLA credit notices:
-----
Hello,
On 2018-04-01 at 7:08 UTC, one of several storage clusters in our FRA1 region suffered a cascading failure. As a result, multiple redundant hosts in the storage cluster suffered an Out Of Memory (OOM) condition and crashed nearly simultaneously.
We have identified that you, or your team account, were impacted by this incident and will grant an SLA credit equal to 30% of your entire Block Storage spend for April, not just usage in FRA1. This credit will appear on your account at the end of April, and will be reflected on your April 2018 invoice.
We apologize for the incident and recognize the impact this outage had on your work and business. You can read the full detail of our public post-mortem here: http://status.digitalocean.com/incidents/8sk3mbgp6jgl