Before anyone gets excited, know that Minio only just started supporting clusters of multiple machines, and that it is severely limited: "As with Minio in stand-alone mode, distributed Minio has a per tenant limit of minimum 4 and maximum 16 drives"[0].
They built a toy object store while blogging aggressively.
OpenStack Swift and Red Hat Ceph are still the only real open-source object store players, AFAIK.
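To make the quoted limit concrete, here is a minimal Python sketch that checks whether a planned drive list fits in a single distributed Minio deployment. Only the 4 and 16 bounds come from the docs quoted above [0]. The function name and the drive URLs are hypothetical.

```python
# Bounds quoted from the distributed Minio docs at the time [0].
MIN_DRIVES = 4
MAX_DRIVES = 16


def check_drive_count(drives: list[str]) -> None:
    """Raise ValueError if one distributed Minio deployment would fall outside 4..16 drives."""
    n = len(drives)
    if not MIN_DRIVES <= n <= MAX_DRIVES:
        raise ValueError(f"distributed Minio needs {MIN_DRIVES}-{MAX_DRIVES} drives, got {n}")


# Hypothetical drive URLs: four nodes with one export each, which is accepted.
check_drive_count([f"http://node{i}/export" for i in range(1, 5)])
```

Anything beyond 16 drives means running a second, independent deployment, which is the limitation being criticized here.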
Minio is designed to work in conjunction with external orchestrators like Kubernetes and Mesos, whereas GlusterFS, Ceph and Swift have built-in orchestration. So when you measure Minio's scalability, you should really be measuring Kubernetes.
A multi-tenant cloud-native architecture requires each tenant to be intentionally kept small (limited by its failure domain). That way, when you scale from a few hundred tenants to millions, your complexity doesn't grow proportionately: the millionth instance of Minio is as simple as the first. If you want a global namespace, simply put nginx-like proxy load balancers in front. Unlike Ceph and Swift, orchestration tools are well understood at scale.
Building monolithic distributed systems has a number of challenges. They do not scale beyond a certain point. A single failure may take down the entire site. Troubleshooting is hard. There is no CI/CD. The entire system has to be upgraded at once, with planned downtime. A security breach exposes the whole system.
*Medium.com is giving me fits and starts when I try to post, so I'm posting here. I'm guessing CDN issues through the GFW of China.
So glad to come across this piece; it is as if you were looking over my shoulder at my to-do list. I started messing with Minio on Docker on a Raspberry Pi a few days ago, testing a setup to implement in my newish business in Shanghai (version 2.0). Still early days, but I like what I see.
You continue to produce practical, insightful, germane pieces. Keep up the high quality work. Really appreciate it. Bought your Flocker book last month and plan to start it during Chinese New Year.
I am glad you are finding the posts useful. It is always good to know that someone has a similar to-do list as it confirms that you are at least on the same page as other people :)
Does anyone have large (PB-scale) deployments of Minio on premises? We have a need to store hundreds of terabytes of data, and we don't need a filesystem per se, so I was wondering how robust it is.
With 16 x 8TB drives (128TB raw), we are talking about 64TB of usable space after erasure coding. Minio is designed to serve the needs of a single tenant, and 64TB is usually much bigger than a single tenant's needs. However, you often see hundreds of thousands of such tenants in a cloud-like environment.
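The 64TB figure corresponds to an even split of the 16 drives between data and parity shards, so half of the 128TB raw capacity goes to parity. A minimal Python sketch of that arithmetic; the function name and the optional `parity` override are hypothetical illustrations, not a Minio setting.

```python
from typing import Optional


def usable_tb(drives: int, drive_tb: float, parity: Optional[int] = None) -> float:
    """Usable capacity in TB when `parity` of the `drives` shards hold parity data."""
    if parity is None:
        parity = drives // 2  # even data/parity split, as the 64TB figure implies
    if not 0 <= parity < drives:
        raise ValueError("parity must be between 0 and drives - 1")
    # Only the data shards contribute usable space.
    return (drives - parity) * drive_tb


print(usable_tb(16, 8))  # 16 x 8TB = 128TB raw, half of it parity
```

With the even split, the set survives losing up to half of its drives, which is the trade-off behind giving up half the raw capacity.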
Making a single large PB-sized volume, where the disks and nodes are managed as a black box by the filesystem, is quite scary to us at Minio. Any crash would blow up all the tenants at once. With thousands of individual Minio tenants, we know that adding the millionth Minio instance is no more complex than adding the first.
Provisioning with k8s [1], Mesos [2] or other external orchestration tools is better than building resource management into Minio itself. As far as applications are concerned, objects are just URLs: whether some data sits on Amazon S3 and some on Minio makes no difference to the application.
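To illustrate the "objects are just URLs" point, here is a minimal Python sketch that builds path-style object URLs, where only the base endpoint changes between backends. The endpoints, bucket and key are hypothetical, and these are plain unsigned URLs: real access to private buckets still needs request signing or presigned URLs.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style object URL of the form <endpoint>/<bucket>/<key>."""
    # quote() keeps "/" by default, so key "directories" survive, but escapes spaces etc.
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key)}"


# Hypothetical endpoints: the application code is identical for both backends.
print(object_url("https://s3.amazonaws.com", "photos", "2017/cat 1.jpg"))
print(object_url("http://minio.example.internal:9000", "photos", "2017/cat 1.jpg"))
```

Because the endpoint is just configuration, moving a bucket from one backend to another only changes that base URL.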
We are also planning to introduce a namespace layer on top that combines these individual tenants into a single namespace. This would allow you to transparently add new clusters as you scale without losing the ability to have a single namespace.
We hang out on Slack [3]; please reach out if you have further questions.
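The planned namespace layer isn't described in detail here, so the following is only a hypothetical Python sketch of the idea: a table mapping each bucket to the Minio cluster that holds it, so clusters can be added while clients keep one namespace. The class, its methods, the endpoints and the placement policy (new buckets go to the newest cluster) are all assumptions.

```python
class Namespace:
    """Hypothetical namespace layer mapping bucket names to Minio cluster endpoints."""

    def __init__(self) -> None:
        self.clusters: list[str] = []
        self.buckets: dict[str, str] = {}

    def add_cluster(self, endpoint: str) -> None:
        # A new cluster joins without moving buckets already placed elsewhere.
        self.clusters.append(endpoint)

    def create_bucket(self, bucket: str) -> str:
        if bucket in self.buckets:
            raise ValueError(f"bucket {bucket!r} already exists")
        if not self.clusters:
            raise RuntimeError("no clusters registered")
        # One simple placement policy: new buckets land on the newest cluster.
        self.buckets[bucket] = self.clusters[-1]
        return self.buckets[bucket]

    def resolve(self, bucket: str) -> str:
        # Raises KeyError for unknown buckets.
        return self.buckets[bucket]


ns = Namespace()
ns.add_cluster("http://minio-a:9000")
ns.create_bucket("logs")
ns.add_cluster("http://minio-b:9000")  # scale out without touching "logs"
ns.create_bucket("photos")
print(ns.resolve("logs"), ns.resolve("photos"))
```

A proxy in front of the clusters would consult such a table to forward each request, so clients see one namespace while each cluster stays small.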
Minio shards data across 16 drives. If you have more drives, you would run a separate instance for each set of 16. I would generally recommend against denser storage servers: when one goes down, all of its drives go offline with it.
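A minimal Python sketch of the "one instance per set of 16 drives" layout. The function name and drive URLs are hypothetical, and requiring the drive count to divide evenly is a simplifying assumption.

```python
def drive_sets(drives: list[str], set_size: int = 16) -> list[list[str]]:
    """Split a drive list into consecutive sets of `set_size`, one Minio instance per set."""
    # Simplifying assumption: refuse leftovers rather than build an undersized set.
    if len(drives) % set_size:
        raise ValueError(f"{len(drives)} drives do not divide evenly into sets of {set_size}")
    return [drives[i:i + set_size] for i in range(0, len(drives), set_size)]


# Hypothetical: 4 servers x 8 drives = 32 drives, i.e. two independent instances.
drives = [f"http://node{n}/disk{d}" for n in range(1, 5) for d in range(1, 9)]
for i, group in enumerate(drive_sets(drives), start=1):
    print(f"instance {i}: {len(group)} drives, {group[0]} .. {group[-1]}")
```

Spreading each set over several modest servers, rather than one dense box, keeps a single server failure from taking a whole set offline.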
[0] http://docs.minio.io/docs/distributed-minio-quickstart-guide