
A company hosting an online service seems to think it deserves a medal for discovering that S3 buckets from a cloud provider are crap and cost a fortune.

The headline makes you think they're running custom FPGAs like Gmail does, not just running on bare metal... As for drive failures: welcome to storage at scale. Build your solution so that replacing 10 disks at a time is a routine weekly task, not a critical incident at 2am when a single disk dies...

Storing and accessing tonnes of <4kB files is hard, but other providers are doing this on their own metal with Ceph at PB scale.
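For context, this is roughly what small-object access looks like through Ceph's Python librados binding; the pool name and object key here are made up for the sketch:

    import rados  # python3-rados, ships with Ceph

    # Connect using the local cluster config; 'mail-blobs' is a
    # hypothetical pool name for this sketch.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('mail-blobs')
        try:
            # Tiny objects like this are exactly the <4kB case:
            # per-object overhead dominates, so metadata cost and
            # latency matter far more than raw bandwidth.
            ioctx.write_full('msg:12345', b'...raw rfc822 bytes...')
            data = ioctx.read('msg:12345')
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

The hard part at PB scale isn't this API, it's that billions of tiny objects stress the metadata and recovery paths rather than throughput.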

I love ZFS; it's great for per-disk redundancy. But Ceph is really the only game in town for inter-rack/inter-DC resilience, which I'd hope my email provider has.
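To make that concrete: in Ceph the failure domain lives in the CRUSH rule, so a pool can be told to spread its replicas across racks instead of disks. A minimal sketch with the standard CLI (the rule and pool names are made up):

    # Create a replicated rule whose failure domain is the rack,
    # then point a (hypothetical) pool at it:
    ceph osd crush rule create-replicated rack-spread default rack
    ceph osd pool set mail-blobs crush_rule rack-spread

With that rule, losing an entire rack costs you at most one copy of each object, which is the inter-rack resilience ZFS on a single box can't give you.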



Ceph is most certainly not the only game in town. It's good and stuff, but it's just tech. We're using protocol-level replication for each of our data stores.


No, let's be honest: Ceph is the only solution for data management at this scale (sub-PB to a few PB) that is independent of application or workload. The market share, the fact that IBM is internally moving people off other projects onto it, and the massive backing all show this.

Yes, you can get some or all of these features, like failure domains, via other routes and products, but none of them bring everything together in one place the way Ceph does.

There's a reason people call it the "Linux of storage". The only alternatives are to manage this at a higher level in your stack (reinventing the wheel) or to buy PB-scale solutions from a corporate vendor, which is like choosing Oracle and Microsoft over Linux.

Protocol replication means you've reimplemented something storage-related elsewhere in your stack. That's not wrong, but better solutions and alternatives exist now.


I mean, I'm happy to have this argument. Ceph is content-agnostic, and that's fantastic most of the time. Cyrus replication is data-aware: it's not just replicating the bytes, it's doing integrity checking and data-model consistency handling.

Most of all, it's doing split-brain recovery, which we wouldn't need if we wanted CP rather than AP, but that wasn't the original design.
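For the shape of the idea (a hypothetical sketch, not Cyrus's actual algorithm): a data-aware merge after a split can union the messages from both sides and push modseq past both, so every client is forced to re-sync:

    # Hypothetical sketch of data-aware split-brain merging; NOT
    # Cyrus's actual algorithm. Each replica is modelled as a dict
    # with 'uidvalidity', 'modseq' and a uid->message mapping.
    def merge_replicas(a: dict, b: dict) -> dict:
        merged = {**a['messages'], **b['messages']}  # union by uid
        return {
            # taking the max is one policy; bumping uidvalidity would
            # instead force clients to drop their caches entirely
            'uidvalidity': max(a['uidvalidity'], b['uidvalidity']),
            # move modseq past both sides so the merge is visible
            'modseq': max(a['modseq'], b['modseq']) + 1,
            'messages': merged,
        }

A content-agnostic store can't do this, because it doesn't know what a "message" or a "modseq" is.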

If I were redoing this from scratch, I'd maybe use Ceph or something similar and update Cyrus to work well with it, but that would be a big change from the current design.

Anyway, I'm happy to stipulate that Ceph is great tech, without going and telling other people that it's the only choice.


Do you honestly think Ceph isn't doing data consistency handling? I'll pay for your ticket to Cephalocon if you'll speak to that effect(!)

Split-brain stuff only happens when you split a single-threaded task and then put it back together. The MDS in Ceph has this problem, but that's so far into the weeds as to be off topic here.

Again, you're implementing something storage-related outside of the storage layer, on top of whatever storage sits underneath. Fine if you want to do it that way, but talk about _that_, not about hecking ZFS being mah saviour. (BTW I daily-drive it and love it too, but an email provider _relying_ on it should raise eyebrows)...


I do believe we are talking past each other here. Of course Ceph does data consistency, but it sure doesn't assert that a modseq is monotonically increasing or that a mailbox/uidvalidity/uid triple never changes digest, because it's not data-model-aware.
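To illustrate the kind of invariant that needs data-model awareness (hypothetical helper, not actual Cyrus code):

    # Hypothetical checks a data-aware replicator can make and a
    # content-agnostic store cannot; NOT actual Cyrus code. 'old' and
    # 'new' are successive states of one mailbox (same uidvalidity),
    # each with a modseq and a uid->digest mapping.
    def check_invariants(old: dict, new: dict) -> None:
        # a modseq must only ever move forwards
        assert new['modseq'] > old['modseq'], 'modseq went backwards'
        # a mailbox/uidvalidity/uid triple must keep its digest: the
        # same message identity can never silently change content
        for uid, digest in old['digests'].items():
            if uid in new['digests']:
                assert new['digests'][uid] == digest, \
                    f'digest changed for uid {uid}'

Ceph will happily replicate a state that violates both of these, byte-perfectly.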

sigh



