M.2 (NVMe) is so fast and cheap it's absurd. I blinked (a few years away from following hardware), and when I went shopping for a new disk I was blown away by the price-to-performance.
I was equally shocked to discover the crap-show that is the M.2 controller space. I felt sure Supermicro would have chassis you could load up with those suckers, hot-swappable and changing the face of storage. Instead I found proprietary (Intel/AMD) RAID drivers coupled to specific CPU models, and expansion cards that could take a measly 4 drives (no hot-swap).
The last 15 years of computing can be described as "extract maximum revenue," rather than what people expected of computers during the '80s, '90s, and early 2000s.
Anyway, I've been considering building some PCIe/M.2 boards for a couple of markets, because the price difference between a "consumer grade" board and the server/embedded board with nearly the same features (minus a couple of things important to some markets) is about $25 in parts, but the markup is about $500. It's even worse in some of the built-for-purpose devices, where wrapping a bit of sheet metal around said machine adds another $2k+.
The M.2 vs. U.2 market is such a shitshow at the moment. And if one of the major players decides to create a form factor that happens to be compatible with M.2? Huge industry outcry, because it might remove everyone's fat margins on "enterprise" gear.
The hot-swap situation is unfortunate, and likely a result of server/consumer space differentiation among all kinds of vendors (OS, CPU, motherboard, and even the NVMe manufacturers all have to play along). But the limit of 4 is at least a real technical limitation: each of those add-in cards uses a PCIe x16 slot (either real or "DIMM.2"), and each drive needs an x4 from it. You could use a mux to add more, but the drives are already getting to the point of being able to saturate their links. PCIe 4.0 and 5.0 will give a lot of headroom for more drives on a system.
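The x16/x4 arithmetic above can be sketched as a quick back-of-the-envelope. This assumes simple bifurcation (no PCIe switch/mux on the card); the per-lane bandwidth figures are approximate usable throughput after encoding overhead:

```python
# Back-of-the-envelope PCIe lane math for M.2 add-in cards.
# Assumes plain bifurcation: each NVMe drive gets a dedicated
# x4 link carved out of the card's x16 slot.

SLOT_LANES = 16       # typical add-in card slot width
LANES_PER_DRIVE = 4   # standard M.2 NVMe link width

drives_per_card = SLOT_LANES // LANES_PER_DRIVE
print(drives_per_card)  # 4 -- the "measly 4 drives" is just 16 / 4

# Approximate usable bandwidth per lane (GB/s), per PCIe generation:
per_lane = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}
for gen, gbps in per_lane.items():
    print(f"PCIe {gen}: x4 link ~{4 * gbps:.1f} GB/s per drive")
```

A PCIe switch would let you hang more than 4 drives off the slot, but as noted above, they all share the same x16 of upstream bandwidth.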
Sounds like we might need to go back to the kind of mainframe architecture that has IO offload. Split the PCIe bus into NUMA-like zones; give each zone its own (probably ARM) CPU, running its own kernel; then use "application processors" (probably x86) to command-and-control the IO zones, allocating e.g. IOMMU-subvirtualized ethernet channels to them. Control plane/data plane separation.
There's some (to me) interesting work in this area. See for example this talk[1], where they show how a RISC-V CPU with a narrow and slow PCIe link can orchestrate the direct transfer of data between two PCIe devices (say, an NVMe drive and an Ethernet card), saturating the x16 link between them.
Sort of, but the network then becomes a huge bottleneck too: with 16 drives at 5 GB/s each (the max I've seen so far), you need 80 GB/s of network bandwidth out of each server. You start getting into the really expensive side of things, speed-wise.
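To put the 80 GB/s figure in network terms (the drive count and per-drive speed are the illustrative numbers from above, not a specific product):

```python
# Rough estimate of the network bandwidth needed to export
# an all-NVMe box at full speed.

drives = 16
gb_per_s_per_drive = 5           # GB/s, a fast PCIe 4.0-era drive
total_gb_s = drives * gb_per_s_per_drive
total_gbit_s = total_gb_s * 8    # GB/s -> Gbit/s

print(total_gb_s)    # 80 GB/s aggregate
print(total_gbit_s)  # 640 Gbit/s -- i.e. multiple 200/400GbE ports
```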
Also, most CPUs you can buy have around 40-64 PCIe lanes, limiting you to 10-16 drives if you want full speed out of them (and that leaves no lanes for Ethernet).
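The lane budget works out like this. The x16 NIC reservation below is my own assumption for illustration, not something from a specific platform:

```python
# Lane-budget sketch: how many full-speed x4 NVMe drives fit in
# a CPU's PCIe lane count, optionally reserving lanes for a NIC.
# Lane counts are typical ballpark figures, not tied to any SKU.

def max_drives(cpu_lanes: int, nic_lanes: int = 0,
               lanes_per_drive: int = 4) -> int:
    """Drives that get a dedicated x4 link after the NIC takes its cut."""
    return max(cpu_lanes - nic_lanes, 0) // lanes_per_drive

for lanes in (40, 64):
    no_nic = max_drives(lanes)                 # every lane to storage
    with_nic = max_drives(lanes, nic_lanes=16) # reserve x16 for the NIC
    print(f"{lanes} lanes: {no_nic} drives (no NIC), {with_nic} with a x16 NIC")
```

With all lanes given to storage you get the 10-16 drive range mentioned above; reserving a x16 NIC slot drops that to 6-12.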
Which begs the question: why is server rental so expensive, if not to maintain artificial scarcity for the sake of a cloud service's bottom line?
Cloud computing pricing is still based around (illegally monopolized and price-fixed) memory prices, which is absurd, because two load-balanced NVMe disks are comparable in speed to DDR3 RAM.