
M.2 (NVMe) is so fast and cheap it's absurd. I blinked (a few years) on hardware, and when I went shopping for a new disk I was blown away by the price-to-performance.

I was equally shocked to discover the crap-show that is the M.2 controller space. I felt sure Supermicro would have chassis you could load up with those suckers, hot-swappable, changing the face of storage. Instead I found proprietary (Intel/AMD) RAID drivers coupled to specific CPU models, and expansion cards that could take a measly 4 drives (no hot-swap).



The last 15 years of computing can be described as "extract maximum revenue" rather than what people expected of computers during the '80s/'90s/early 2000s.

Anyway, I've been considering building some PCIe/M.2 boards for a couple of markets, because the price difference between a "consumer grade" board and the server/embedded board with nearly the same features (minus a couple of things that matter in some markets) is about $25 in parts, but the markup is about $500. It's even worse in some of the built-for-purpose devices, where wrapping a bit of sheet metal around said machine adds another $2k+.

The M.2 vs U.2 market is such a shitshow at the moment. And if one of the major players decides to create a form factor that happens to be compatible with M.2? Huge industry outcry, because it might remove everyone's fat margins on "enterprise" gear.


The hot-swap situation is unfortunate, and likely a result of server/consumer differentiation among all kinds of vendors (OS, CPU, motherboard, and even the NVMe manufacturers all have to play along). But the limit of 4 is at least a real technical limitation: each of those add-in cards uses a PCIe x16 slot (either real or "DIMM.2"), and each drive needs an x4 link from it. You could use a mux to add more, but the drives are already getting to the point of being able to saturate the links. PCIe 4.0 and 5.0 will give a lot of headroom for more drives on a system.
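The lane math above works out to exactly the 4-drive limit mentioned earlier; a quick back-of-the-envelope sketch (assuming x4 per NVMe drive in an x16 slot, as described):

```python
# Back-of-the-envelope lane math for an M.2 add-in card:
# one x16 slot, each NVMe drive taking an x4 link.
LANES_PER_SLOT = 16
LANES_PER_DRIVE = 4

drives_per_card = LANES_PER_SLOT // LANES_PER_DRIVE
print(drives_per_card)  # 4 -- hence the "measly 4 drives" per card
```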


Sounds like we might need to go back to the kind of mainframe architecture that has IO offload. Split the PCIe bus into NUMA-like zones; give each zone its own (probably ARM) CPU, running its own kernel; then use "application processors" (probably x86) to command-and-control the IO zones, allocating e.g. IOMMU-subvirtualized ethernet channels to them. Control plane/data plane separation.


There's some (to me) interesting work in this area. See for example this talk[1], where they show how a RISC-V CPU with a narrow and slow PCIe link can orchestrate the direct transfer of data between two PCIe devices (say, an NVMe drive and an Ethernet card), saturating the x16 link between them.

[1]: https://www.youtube.com/watch?v=LDOlqgUZtHE (Accelerating Computational Storage Over NVMe with RISC V)


A bunch of less powerful servers with expansion cards over a network should be much easier to manage and horizontally scalable.


Sort of. The network part of that ends up being a huge bottleneck too: with 16 drives at 5 GB/s each (the max I've seen so far), you need 80 GB/s of network bandwidth to each server. You start getting into the really expensive side of things, speed-wise.
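The aggregate bandwidth claim above checks out; here's the arithmetic, including the conversion to Gbit/s that network gear is usually rated in:

```python
# Network bandwidth needed to export 16 NVMe drives at full speed
# (5 GB/s per drive, the fastest figure mentioned above).
DRIVES = 16
GB_PER_S_PER_DRIVE = 5

total_gb_s = DRIVES * GB_PER_S_PER_DRIVE
print(total_gb_s)      # 80 (GB/s)
print(total_gb_s * 8)  # 640 (Gbit/s) -- beyond even a single 400GbE port
```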


Also, most CPUs you can buy have around 40-64 PCIe lanes, limiting you to 10-16 drives if you want full speed out of them (and that leaves no lanes for Ethernet).
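The 10-16 drive range follows directly from dividing the lane budget by the x4 link each drive needs:

```python
# Full-speed x4 NVMe drives that fit in a typical CPU's PCIe lane budget.
LANES_PER_DRIVE = 4

for cpu_lanes in (40, 64):
    print(cpu_lanes, "lanes ->", cpu_lanes // LANES_PER_DRIVE, "drives")
# 40 lanes -> 10 drives; 64 lanes -> 16 drives (nothing left for Ethernet)
```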


EPYC looking at you with 128 lanes


EDSFF is kind of a hot-swap flavor of M.2 but it has taken years to get to market and has bifurcated into three incompatible variants.

For example, here's what you're looking for: https://www.supermicro.com/en/products/system/1U/1029/SSG-10...


128 TB in 1U of space. Damn.


Which raises the question: why is server rental so expensive, if not for maintaining artificial scarcity for the sake of a cloud service's bottom line?

Cloud computing pricing is still based around (illegally monopolized and price-fixed) memory prices, which is absurd, because two load-balanced NVMe disks are comparably fast to DDR3 RAM.


The throughput is fast, but NVMe drives have much higher latency than RAM.


We're talking about remote, internet-traversing applications. They all have network latency far in excess of any disk technology's.
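Ballpark figures (my own order-of-magnitude assumptions, not from the thread) illustrate why network latency dominates here:

```python
# Rough order-of-magnitude access latencies (assumed ballpark figures,
# for illustration only -- real numbers vary by hardware and distance).
latency_ns = {
    "DRAM access":        100,         # ~100 ns
    "NVMe random read":   20_000,      # ~20 us
    "cross-internet RTT": 50_000_000,  # ~50 ms
}

for name, ns in latency_ns.items():
    print(f"{name}: {ns:,} ns")
```

Under these assumptions, an internet round trip costs on the order of a thousand NVMe reads, so the SSD-vs-RAM latency gap is lost in the noise for remote applications.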


Can you write and rewrite them over and over again, thousands of times a second, without damaging them, though?


Samsung warranties them for 5 years.

DDR3 was 2007, DDR4 was 2014, DDR5 is supposed to be this year.



