> Although Blb is not being actively developed as a production system, its authors plan to continue improving the system in their spare time as an educational project.
Looks broadly similar to the bottom layer of the system I work on, except that blb only claims that it "should" scale to very high levels whereas the one I work on has already been running at many-petabyte scale for years.
Not too surprisingly, a lot of things that seem "impossible" at smaller scale become everyday occurrences for a large enough system. For example, you will get kinds of inconsistency that necessitate various forms of active GC or scrubbing. You will have hot spots, which you need to deal with explicitly instead of relying on statistical distribution guarantees. You will have to migrate whole racks' worth of data at once as equipment (not just disks and hosts but also network switches and power infra) gets upgraded. And of course you'll have to monitor the hell out of it so you can fix these problems as they occur, instead of letting them multiply until your system is irretrievably broken.

Don't take claims of super-duper scalability (from blb or Minio) on faith. Look for these "extras" as evidence that the system actually has been run at scale. BTW, they don't seem to be there in the blb source.
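To make the scrubbing point concrete, here's the general shape of a background scrubber; a minimal Go sketch with hypothetical types, not anything from the blb codebase:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"time"
)

// chunk is a stand-in for an on-disk replica: the data plus the checksum
// recorded when the chunk was originally written.
type chunk struct {
	id       string
	data     []byte
	checksum uint32
}

// scrub walks every chunk, recomputes its checksum, and reports mismatches
// so a repair process can re-replicate from a healthy copy. The per-chunk
// delay rate-limits the scrubber so it doesn't compete with foreground I/O.
func scrub(chunks []chunk, perChunkDelay time.Duration) []string {
	var corrupted []string
	for _, c := range chunks {
		if crc32.ChecksumIEEE(c.data) != c.checksum {
			corrupted = append(corrupted, c.id)
		}
		time.Sleep(perChunkDelay)
	}
	return corrupted
}

func main() {
	good := chunk{id: "a", data: []byte("hello"), checksum: crc32.ChecksumIEEE([]byte("hello"))}
	bad := chunk{id: "b", data: []byte("hellp"), checksum: crc32.ChecksumIEEE([]byte("hello"))}
	fmt.Println("needs repair:", scrub([]chunk{good, bad}, 0))
}
```

The loop itself is trivial; the hard parts are everything around it: scheduling repairs across replicas, bounding disk bandwidth, and surviving the scrubber's own restarts. Those are exactly the "extras" you'd expect to see in a system that has actually been run at scale.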
What does seem to be there is a reference buried in the deployment docs to a "master" component, not mentioned in the architectural overview. It seems to be responsible for assigning partitions of the blob space to curators and directing clients to the right one. Seems like a bottleneck, but OK, let's take a look anyway.
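For context, here's roughly what that routing role amounts to; a minimal sketch assuming the blob-ID space is hash-partitioned, with all names hypothetical rather than taken from blb:

```go
package main

import "fmt"

const numPartitions = 16

// master maps each partition of the blob-ID space to the curator that owns
// it. Clients ask the master once, then talk to the curator directly.
type master struct {
	owner [numPartitions]string // partition index -> curator address
}

func (m *master) curatorFor(blobID uint64) string {
	return m.owner[blobID%numPartitions]
}

func main() {
	m := &master{}
	for i := range m.owner {
		m.owner[i] = fmt.Sprintf("curator-%d", i%4) // 4 curators, 4 partitions each
	}
	fmt.Println(m.curatorFor(12345)) // -> curator-1
}
```

In that shape the master is only on the client's path for the initial lookup, which blunts the bottleneck as long as clients cache the mapping; the interesting costs are reassignment and failover. With that picture in mind, here's the comment that caught my eye in the master code: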
```go
// Since we don't persist the address and last heartbeat time, when a master
// failover happens, the new leader cannot service requests until it hears
// heartbeat from the curators. See PL-1102.
```

(from master.go lines 35-37)
Hm. That seems like a pretty big disruption, even if it's rare. Also, what kinds of heartbeats are these? I think it's generally a bad idea for systems like this to implement their own liveness checking. That's a hard problem; there are tried-and-true, specialist-written systems for doing it, and systems that roll their own are almost certainly drifting away from their core competency.

This comment is getting long enough, so I won't do a full analysis of the blb heartbeat system, but I invite others to look at it with an eye toward how much load it imposes on masters in a large cluster, how reliable failure detection is, and what things should be done (but aren't) when heartbeats fail. A sketch of the pattern the quoted comment implies is below.
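As a starting point for that analysis: a minimal sketch (identifiers are mine, not blb's) of a leader that learns curator liveness only from heartbeats, which is why a freshly elected one starts with an empty table and can't route anything:

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatTable records when each curator was last heard from. A new
// leader starts with an empty table, so it must refuse lookups until
// enough heartbeats arrive; this is the disruption master.go describes.
type heartbeatTable struct {
	lastSeen map[string]time.Time
}

func (t *heartbeatTable) onHeartbeat(curator string, now time.Time) {
	if t.lastSeen == nil {
		t.lastSeen = make(map[string]time.Time)
	}
	t.lastSeen[curator] = now
}

// alive reports curators heard from within the timeout. Note the tradeoff:
// too short a timeout gives false failures under load, too long leaves
// dead curators in the routing table.
func (t *heartbeatTable) alive(now time.Time, timeout time.Duration) []string {
	var up []string
	for c, seen := range t.lastSeen {
		if now.Sub(seen) <= timeout {
			up = append(up, c)
		}
	}
	return up
}

func main() {
	var t heartbeatTable // what a new leader starts with: nothing
	fmt.Println("known curators right after failover:", t.alive(time.Now(), 5*time.Second))
	t.onHeartbeat("curator-0", time.Now())
	fmt.Println("after one heartbeat:", t.alive(time.Now(), 5*time.Second))
}
```

Back-of-the-envelope: with N curators heartbeating every T seconds, the master handles N/T heartbeats per second, and a failed-over leader serves nothing for up to T plus election time. Trivial at a hundred nodes; worth measuring well before ten thousand.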
It looks like a pretty good start to a distributed blob store. The basic architectural principles are sound, the code looks pretty clean and well commented, etc. OTOH, seems a bit light on tests, and the lower-level implementation details suggest that in its current form it might not handle even a hundred-node cluster all that well. Caveat emptor.
It's a system within Facebook called Warm Storage. There have been some public presentations about it, so I'm comfortable mentioning the name, but unfortunately I can't provide many other details about architecture or scale. I'll just warn people that the public information on it is way out of date. Most of it seems to be from 2014, and what it describes is really a separate system from what we have now.
It looks like it isn't as user-friendly as Minio, and it doesn't provide an S3-compatible API, but it looks to be more scalable and less opinionated (Minio, for example, forces the authors' replication preferences on you by hard-coding them).
None of their code repos seem to have been updated for years. Good riddance, too. Object storage is a fine thing, but their implementation was laughably bad.
So wait, this is abandoned?