
Can I ask an off-topic, in no way RPi-related question?

For larger Ceph clusters, how many disks/SSDs/NVMe drives are usually attached to a single node?

We are in the middle of transitioning away from a handful of big (3x60-disk, 1.5 PB total) JBOD Gluster/ZFS arrays, and I’m trying to figure out how to migrate to a Ceph cluster of equivalent size. It’s hard to figure out exactly what the right size/configuration should be. And I’ve been using ZFS for so long (10+ years) that the thought of not having self-healing zpools is a bit scary.




For production, we have two basic builds: one for block storage, which is all-flash, and one for object storage, which is spinning disks plus a small NVMe for metadata/BlueStore DB/WAL.
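
As a rough aside on sizing that NVMe DB device, here is a minimal sketch in Python. The 16 TB HDDs, the 7.68 TB NVMe, and the ~2% block.db rule of thumb are assumptions for illustration, not numbers from our build, so tune against your own metadata load:

    # Split one NVMe card into block.db slices for the HDD OSDs in a node.
    def db_slice_gib(hdd_tb: float, db_fraction: float = 0.02) -> float:
        """block.db size (GiB) as a fraction of the data device's capacity."""
        return hdd_tb * 1000**4 / 1024**3 * db_fraction

    hdds_per_node = 12                      # 12x LFF bays
    hdd_tb = 16                             # hypothetical 16 TB spinners
    nvme_gib = 7680 * 1000**3 / 1024**3     # hypothetical 7.68 TB NVMe card

    per_osd = db_slice_gib(hdd_tb)
    print(f"block.db per OSD: {per_osd:.0f} GiB, "
          f"total: {per_osd * hdds_per_node:.0f} of {nvme_gib:.0f} GiB NVMe")

Whatever fraction you pick, leave room on the NVMe for the WAL and for growth; an undersized block.db spills metadata back onto the spinning disks.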

The best way to run Ceph is to build as small a server as you can get away with economically and scale that horizontally to 10s or 100s of servers, instead of trying to build a few very large vertical boxes. I have run Ceph on some 4U 72-drive SuperMicro boxes, but it was not fun trying to manage hundreds of thousands of threads on a single Linux server (not to mention NUMA issues with multiple sockets). An ideal server would be one node to one disk, but that's usually not very economical.

If you don't have access to custom ODM-type gear, Open19, or other such exotics, what's been working for me has been regular single-socket 1U servers, both for block and for object.

For block, this is a very normal 1U box with 10x SFF SAS or NVMe drives, a single CPU, and a dual-port 25GbE NIC.

For spinning disk, again a 1U box, but with a deeper chassis you can fit 12x LFF drives and still have room for a PCIe add-in NVMe card, plus a dual-port 25GbE NIC. You can get these from SuperMicro, Quanta, or HP.

Your 3x60-disk setup sounds like it might fit in 12U (assuming 3x 4U servers). With our 1U servers, I believe the equivalent can be done with 15x 1U boxes: 1.5 PiB usable needs roughly 180x 16 TB disks with EC 8+3, and you'd need more with 3x replication.
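
Back-of-the-envelope for that claim, as a minimal Python sketch; the 16 TB disk size is an assumption, and "usable" here is before the full-ratio headroom you'd keep in practice:

    def usable_pib(disks: int, disk_tb: float, k: int, m: int) -> float:
        """Usable PiB for an EC k+m pool (for 3x replication use k=1, m=2)."""
        raw_bytes = disks * disk_tb * 1000**4
        return raw_bytes * k / (k + m) / 1024**5

    disks = 15 * 12                          # 15x 1U nodes, 12x LFF drives each
    print(usable_pib(disks, 16, 8, 3))       # EC 8+3 -> ~1.86 PiB
    print(usable_pib(disks, 16, 1, 2))       # 3x rep -> ~0.85 PiB

By this math, 180x 16 TB spindles at EC 8+3 give roughly 1.86 PiB usable, so 1.5 PiB of data fills the cluster to about 80%; hitting the same usable space with 3x replication would take roughly 320 spindles.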

Of course, if you're trying to find absolute minimum requirements that you can get away with, we'd have to know a lot more details about your workload and existing environment.

EDITING to add:

Our current production disk sizes are either 7.68 or 15.36 TB for SAS/NVMe SSDs at 1 DWPD or less, and 8 TB for spinning disks. I want to move to 16 TB drives, but haven't done so yet for various tech and non-tech reasons.


The async messenger has gotten the thread counts way down. It may be a little easier now.




