Samsung Kicks Off Mass Production of 8 TB NF1 SSDs with PCIe 4 Interface (anandtech.com)
179 points by el_duderino on June 21, 2018 | 72 comments




> NF1 SSDs enabled an undisclosed maker of servers to install 72 of such drives in a 2U rack for a 576 TB capacity

That's 12PB in a 42U rack! I wonder what sort of storage densities Google, Amazon, and Dropbox are achieving.


Note that a 2U box like that needs something like 250 Gbit of internal bus bandwidth just for disk IO, assuming you wanted the full throughput of every drive. There is a huge trade-off in terms of processing capability and networking bandwidth when putting so much disk in a single box; the rest of the hardware can't cope with it.

These things look useful only if a tiny fraction of the stored data is hot. I imagine lower density, which provides more flexibility in terms of processing room on the box and networking around the box, is probably a better option in a lot of cases.

Been pondering these devices for the past 10 minutes or so, can't think of a use for them except elaborate storage servers.


Compared to HDD storage, isn't all the data hot, relatively speaking? I mean it's pushing the bottleneck to the bus, sure, but the alternative of HDD storage is slower by every meaningful metric.

(Except maybe heat? I would assume these run at a higher temp than equivalent storage space in HDD, but I've nothing to actually base that on.)

If money's no object, presumably it's not "there's no use case for them", it's "there's no reason to do anything else"?


This 16TB thing is rated at up to 8.7/10.6 W for active read/write and 4.0 W at idle, and HDDs range from 4 to 10 W (https://www.tomshardware.com/reviews/desktop-hdd.15-st4000dm... from 2013), so pretty much the same. And you don't have to worry about vibrations with SSDs, so you can blow a lot more air at them.


I think they're actually talking about other components being the bottleneck, as opposed to the rate at which you can grab information off the disks; so if any meaningful fraction of your data is hot at any one point, there's no advantage either way.

Even with HDDs, if you actually want to use all your throughput you're likely going to run into problems with the bus (which rarely needs to be optimised) versus the drives (which need to be optimised in practically every environment).


I think AMD Epyc is a perfect match for this... 128 PCIe lanes per socket.

On a dual-socket Epyc system, 36 of these, each with an x4 PCIe connection, 'only' take up 144 lanes, leaving plenty of lanes free for even 8x dual-QSFP+ 40GbE NICs.

Of course, I'm not sure how you'd physically pack all of that into even a 2U form factor.

Usefulness... data warehouses. If the price was right, we'd probably move to that to speed up various ingestion & reporting functionality. At 288TB (assuming RAID10) capacity, it'd be a perfect fit.


Epyc is 128 lanes period. The dual-socket configuration steals half the lanes for the interconnect.

It's also PCIe 3 for now.
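
For anyone who wants to sanity-check the lane budget under those corrected numbers, here's a rough sketch; the drive count, link widths, and NIC sizes are just the figures from the comments above, not a real system config:

    # Back-of-the-envelope PCIe lane budget for an all-NVMe box.
    # Assumptions come from the comments above, not from a vendor spec.
    platform_lanes  = 128   # usable Epyc lanes, single- or dual-socket
    drives          = 36
    lanes_per_drive = 4     # x4 NVMe links
    nics            = 2     # hypothetical pair of x8 NICs for the network side
    lanes_per_nic   = 8

    drive_lanes = drives * lanes_per_drive   # 144 lanes just for the drives
    nic_lanes   = nics * lanes_per_nic       # 16 lanes for the NICs
    needed      = drive_lanes + nic_lanes

    print(f"drives: {drive_lanes} lanes, NICs: {nic_lanes} lanes, total: {needed}")
    if needed > platform_lanes:
        print(f"over budget by {needed - platform_lanes} lanes -> x2 links or fewer drives")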


Ah, right. My mistake.

Still, I think it might be in a better position than Xeon for raw storage IO.


ixnay on the 40gig-ay

25/50/100 is cheaper/better on modern gear.


Fair enough, I wasn't aware of 100G adapters.


Unfortunately it's not PCIe 4, though.


It looks like these are PCIe 3 SSDs, and with a bit of poking around I can't see any PCIe 4 Xeons out yet either.


I was going by the title, which said it's PCIe 4.


There has been an update to the article:

Update 6/22: Samsung made an update to its statements regarding the NF1 SSDs. The drives are based on the Phoenix controller, they do not use a PCIe 4.0 interface, but rely on a more traditional PCIe Gen 3 x4 interface.


Ah, thanks for catching that!


Video editing? 6/8K raw video takes a lot of space.


LinusTechTips films in 8k IIRC and they only have one petabyte


Off topic but does anyone else find that guy incredibly annoying? I also get the impression he doesn't have a clue what he's talking about most of the time.


My impression is this is part of his persona and intentional. I don't find him annoying but I can see why people might.

On the other hand, he has a bunch of people researching the topics so that at least he knows what he's talking about.


Yes indeed!

He seems to have a good clue about gaming hardware, which is the field he is targeting.


Par for the course with Youtube celebrities then.


At that resolution you're in the gigabyte-per-second range, and when you're grabbing a gigabyte at a time there's almost no reason to prefer SSD over HDD.


Why would that be? Can you provide more details? I thought SSDs were always faster than HDDs.


HDDs are fast ~enough at sequential read/write in this scenario.


Video editing is not sequential by any means. I can maybe give that to encoding/transcoding an existing file on disk (if both the disk and the file system can ensure that a file hundreds of GB or even a few TB in size is laid out sequentially), but with editing you jump back and forth between a multitude of different frames across the entire timeline of the video. Each of those frames is a few MB in size at these resolutions, and at these file sizes it's effectively the same as randomly accessing small files on an HDD, which is never exactly fast.


In the absolute worst case of frames that are on opposite ends of the disk and compressed to 20MB each, you're still spending less than 20% of your time seeking.

When you take random-access and make the data chunks big enough, it becomes sequential-access. Raw 8k footage is above this threshold.
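
Rough numbers behind that, for the curious; the seek time and sustained transfer rate below are generic ballpark figures I'm assuming, not measurements of any particular drive:

    # Worst case from above: every 20 MB frame needs a full-stroke seek.
    frame_mb      = 20.0    # compressed raw frame size assumed above
    seek_ms       = 15.0    # generous full-stroke seek plus rotational latency
    transfer_mbps = 150.0   # conservative sustained HDD read rate, MB/s

    transfer_ms   = frame_mb / transfer_mbps * 1000     # ~133 ms spent actually reading
    seek_fraction = seek_ms / (seek_ms + transfer_ms)   # ~0.10

    print(f"{seek_ms + transfer_ms:.0f} ms per frame, {seek_fraction:.0%} of it seeking")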


Try editing raw 4K video on an HDD and come back to me; it's nearly impossible to do without a dropped-frame count in the high double digits.


In general, for raw 4k, a single HDD doesn't have enough bandwidth, and a single SSD isn't big enough.

But the main advantage of the SSD, IOPS, becomes less and less important as bitrates increase.

As you go bigger and bigger you're using storage arrays no matter what the drive tech, and hard drives start to be able to perform perfectly fine at lower price levels, while also fitting much more data.


Not at all - SSDs are much faster. One of the fastest HDDs on storagereview.com is the WD Black 6TB, with a sustained 218MB/s, while one of the faster SSDs, an Intel 750, does 2400MB/s.


So let's say you're working on ten hours of 8k footage.

If you go with 1.2TB 750s... oh you can't actually fit 20-100 of those in a single workstation.[1]

So maybe we back off to a NAS attached over a 40Gbps cable.

40 of those hard drives will be a lot cheaper than 20 of those SSDs. If we grossly overestimate seek times, despite how big the frames are, we might estimate 125MB/s per drive. That's enough to completely saturate the connection. Oh, and the array has 10 times the capacity.

[1] 20 assumes a lower framerate and generous compression. 100 assumes 60fps 12-bit at a lossless 3:1 compression. If "raw" is taken completely literally then 10 hours of footage could require 300 of them (and some of the math would have to be redone).
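
For the curious, here's roughly how the drive counts in [1] and the array throughput fall out; the frame geometry, bit depth, compression ratio, and per-drive rates are the assumptions stated above, not measured figures:

    # 10 hours of "raw" 8K, using the 60 fps / 12-bit / lossless 3:1 assumptions above.
    width, height   = 7680, 4320
    channels        = 3        # full RGB; a bayer-raw workflow would be roughly a third of this
    bits_per_sample = 12
    fps             = 60
    hours           = 10
    compression     = 3        # lossless 3:1

    frame_mb  = width * height * channels * bits_per_sample / 8 / 1e6    # ~149 MB per frame
    rate_gb_s = frame_mb * fps / 1000                                     # ~9 GB/s uncompressed
    total_tb  = rate_gb_s * hours * 3600 / compression / 1000             # ~108 TB stored

    print(f"{total_tb:.0f} TB of footage -> roughly {total_tb / 1.2:.0f} x 1.2TB SSDs")

    # Bandwidth side: 40 HDDs at a pessimistic 125 MB/s each vs. a 40 Gbps link.
    array_gb_s = 40 * 0.125
    link_gb_s  = 40 / 8
    print(f"40-drive array: ~{array_gb_s:.0f} GB/s, link: ~{link_gb_s:.0f} GB/s (saturated)")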


I don't question that HDDs are much cheaper - but even with your franken 40-drive striping solution, you are only getting as much performance as 3 SSDs (40Gbps < 3 * 2.4GB/s).

Of course for video production you would choose slower, larger SSDs but still get an order of magnitude more performance, very handy when seeking through footage.


How much performance do you actually need, though? As long as you can load 60 * k arbitrary frames per second then you can skip around at will.

Being capped at the equivalent of 3 or 10 high-end SSDs isn't a problem if there's no good way to fit more than that with any tech.

And you don't exactly need to stripe them all. You could set it up as a blob storage, maybe storing each blob twice so you can distribute reads better.


If you have some HDDs and you RAID them, processing video should be mostly sequential read/write, so it should work on HDDs.

I know SSD is faster, but it depends on what the workload requires.


Ah, but you're assuming the 2U device is a server. More likely it's a JBOF that can be connected to multiple servers.


A modern Xeon has 20 lanes of PCIe 3, which works out to 273MB/sec per drive just to get data into RAM, assuming the bus is exclusively dedicated to disk. That means at full utilization you get about 69k 4K IOPS per drive rather than the rated 500k.

Then you need to shunt that back out over the network - using the same bus it just arrived on, most likely after a trip via the CPU to add protocol headers. With a 40 Gbit NIC the per-drive bandwidth drops to an absolute max of 204MB/sec, and the available network bandwidth per drive is only 86MB/sec (22k IOPS), meaning that at full whack, in the best possible scenario, and assuming no other overheads, you'd still only ever see 4-5% of rated performance. In reality, after accounting for things like packetization (and the impedance mismatch of 4K reads vs. a 1500-byte network MTU) and the work done by the CPU, it's probably safe to knock that number down by another 20-40%, depending on the average IO size.
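
Spelled out, that back-of-the-envelope math looks something like this; the lane count, drive count, and NIC size are the assumptions above, the ~985 MB/s per PCIe 3.0 lane is the usual usable figure, and the exact network-per-drive number depends on which conversions you pick:

    # Rough reproduction of the per-drive bandwidth math above.
    lanes      = 20       # PCIe 3.0 lanes assumed for the host CPU
    lane_mb_s  = 985      # usable bandwidth per PCIe 3.0 lane
    drives     = 72
    nic_gbps   = 40
    io_kb      = 4

    bus_mb_s       = lanes * lane_mb_s                  # ~19,700 MB/s into RAM
    per_drive_in   = bus_mb_s / drives                  # ~273 MB/s per drive
    iops_per_drive = per_drive_in * 1000 / io_kb        # ~68k 4K IOPS vs. ~500k rated

    nic_mb_s       = nic_gbps / 8 * 1000                # 5,000 MB/s leaving the box
    per_drive_bus  = (bus_mb_s - nic_mb_s) / drives     # ~204 MB/s once the NIC shares the bus
    per_drive_net  = nic_mb_s / drives                  # ~69 MB/s of network per drive

    print(f"bus only: {per_drive_in:.0f} MB/s/drive (~{iops_per_drive / 1000:.0f}k 4K IOPS)")
    print(f"sharing the bus with the NIC: {per_drive_bus:.0f} MB/s/drive, "
          f"network share: {per_drive_net:.0f} MB/s/drive")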

Sure, they'd kick ass for reads from a small number of clients and latency would be amazing, but it'd be so much simpler, cheaper, and more flexible to just have the same disk split across 4 or more boxes. What happens if the motherboard dies? That's a megaton of disk suddenly offline, etc.


Still needs 250 Gbit/s of total bandwidth, which doesn't come cheap. In this case it really makes much more sense to distribute the processors alongside the storage and aim for data locality.


Dropbox recently posted about their new 4U / 1.4PB chassis.

https://blogs.dropbox.com/tech/2018/06/extending-magic-pocke...


> an undisclosed maker of servers

So Supermicro


Death-of-the-disk stories usually have to deal with price. But if you drive the density north far enough, I think price per GB may no longer be an issue.

Simplistic purchase price vs. power budget, speed, and retention time/replacement time all gets very complicated.

I haven't had a (company) laptop with a spinning disk in three refresh rounds. I replaced my home machine's drive with an SSD in the last pre-tax-claim spend cycle. I still buy commodity USB drives to be the disks for small rPi devices, but I'm beginning to wonder how long that will last.

The floppy -> semi-floppy -> USB drive story had a longish arc, but at a lower level of penetration. Once the hardshell floppy appeared, the larger units died out pretty quickly; once USB booting became ubiquitous, boot floppies ceased to inform my life pretty quickly; and now with RAC and iDRAC cards, I barely touch boot media either (admittedly, the RAC card has an SD and I write "images" to it in ISO format, but I keep wondering how long boot media will depend on emulating a spinning CD/DVD drive).

TL;DR: This feels like right size, right time. If the spec for the backplane is good, I'd like to see this baked and shipped by other vendors. Goodbye, spinning metal?


Only if cost isn't a big factor vs. space/power reqs for the specific application. For example, in my home NAS I have 5x spinning disks in a RAID 5 configuration and don't see myself switching to SSDs any time soon because doing so would be a huge cost increase for very little performance gain (network speed is the limiting factor).
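
Quick numbers on why the network ends up being the limit; this assumes a gigabit home LAN and a typical sustained rate for a modern 3.5" drive, not measurements of my setup:

    # Gigabit Ethernet vs. a small RAID 5 of spinning disks.
    link_mb_s  = 1000 / 8    # ~125 MB/s of raw gigabit, a bit less after overheads
    hdd_mb_s   = 180         # ballpark sustained rate for a modern 3.5" drive
    data_disks = 4           # 5-disk RAID 5 -> 4 disks' worth of streaming reads

    array_mb_s = data_disks * hdd_mb_s
    print(f"array can stream ~{array_mb_s} MB/s; the link tops out at ~{link_mb_s:.0f} MB/s")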

I think spinning metal will be around for a while yet as a low-cost alternative when performance isn't critical. Remember, tape backups are still widely used and those things are ancient by tech standards.


At some point the lines on the graph will cross. It just keeps not happening.


> I write "images" to it in ISO format but I keep wondering how long boot media will depend on emulating a spinning CD/DVD drive.

Could UEFI HTTP Boot [1][2] replace the need to boot from ISOs?

[1] https://firmwaresecurity.com/2015/05/09/new-uefi-http-boot-s...

[2] http://ipxe.org/appnote/uefihttp


Most UEFI systems nowadays can boot directly from VFAT partitions on USB drives and memory cards. Some even ExFAT. You can put your bootloader and FS drivers there.


Not for backup and long term storage. Spinning rust is far more dependable than flash when sitting on a shelf, and much less sensitive to temperature.

I know where I'm archiving my data.


How much longer do you believe this will be true? I too keep a RAID array of 3.5" 4TB drives, which is stored offline, but every time I plug it in, I think about how exposed I am to mechanical and electrical failure modes.

The smaller form factor USB drives I get for ongoing TM backup appear to have an effective life as a sealed unit of around 2-3 years max.


Until we switch away from NAND to something else (phase change, etc.). Every new NAND generation halves write durability.


So you use tapes?

Seriously, tapes are way more dependable than disks: fewer moving parts and less chance of internal electro-mechanical breakdown. And they check all the other boxes you mention.


Updated with more detail and maybe fewer typos:

I certainly considered tape and used it in the past, but there are two huge upsides to hard disks:

1. Obviously access and transfer time is much better.

2. I can access hard disks from 30+ years ago. My oldest drives use SCSI and IDE, but even SATA has been around for ages. My tapes, however, can only be read by the particular branch of tape drives they were written on, and there is an infinitude of standards/models around, so I'd have to store the tape drive along with them. I'm not sure I'd be able to restore my father's old backups, as the tape drive was some weird PC thing.

A few years ago I finally transferred my tapes to my FreeNAS box and breathed a sigh of relief as I don't know when I'd be able to run my DAT changer again.

So, no thanks, I'll stay with hard drives (mind you, drives _will_ die as well so you still need redundancy).

PS: I want to emphasize that I was talking about shelf life. For data sitting in a good NAS, like ZFS-based FreeNAS, I have little worry, as there's redundancy and weekly scrubbing (and I do have a backup of that as well). I do worry a great deal about any data that might sit on a USB key or an SSD in the closet.


Tapes cost a lot to get going. Most tape drives cost about $1k, and those are the cheap ones.

And you still need a tape library, maybe with a swap robot if you want to get serious about it (or swapping tapes via meat robot works too).

There is some initial ramp where HDDs might be cheaper.


I'd like to hear everyone's guesstimate on when/if the $/TB of SSDs will cross over that of HDDs.


Never, because if it did cross over, flash demand would suddenly surge by 10x and then there'd be a massive shortage, pushing the price back up. https://blog.dshr.org/2018/03/flash-vs-disk-again.html


No. If it costs less to produce, you'd just increase production until one passed the other. The demand surge might delay the inevitable for a year or two, but fabs would catch up and then the HDD would retire.

dshr misses the mark a lot of the time.


>No. If it costs less to produce,

The point is it doesn't, and not anytime soon.


It may never cost less to produce in an environment where both are manufactured at massive scale. But it may happen that manufacturers concentrate on flash storage (could be better profit margins, could be bigger demand from corporations or IaaS operators, could be many other things) and thus, in a cascading way, make disks more expensive to manufacture due to a limited supply of materials, machines, and trained workers for disk manufacturing.

The only way I see the current status quo staying static by definition is if there is indeed an (above-mentioned) shortage of some raw material that would prevent the switch. Can someone comment on what exactly that shortage would be? Edit: I read the linked post (I didn't see it before, sorry). TL;DR: it's just a lack of manufacturing capacity and issues with building more fabs.


Fabs are expensive. The article already points out NAND is reaching its limits. Those fantasies about NAND stacks that could do 1024 or 2048 layers may never appear, at least not in the next 5 years. Basically we have used up all of the tricks (for now).

Assume 128-layer yields are good enough and the problems solvable within the next 3-4 years. Micron has only just started shipping 96 layers, and it may take another 2-3 years for that to be mainstream. QLC only brings a 33% capacity improvement over TLC, at the expense of much lower write cycles and worse latency. Node scaling is also much more expensive; we no longer get node shrinks where density goes up while cost per transistor halves.

Build more fabs? Well, China is pouring in $100B to brute-force this problem, and nothing has come out of it just yet. And if it isn't China, who has the incentive to build expensive fabs, with little to no expertise in memory and few patents, for a possible profit margin in an industry that has a habit of cycles of long losses? I.e. high risk.

TSMC's ex-CEO Morris Chang has mentioned it multiple times: the company will not produce DRAM or NAND.

Meanwhile, the three DRAM and NAND manufacturers are well aware of what China is trying to do, and are milking the market for as long as they can.

Nowhere in the 5-year NAND roadmap is there anything pointing to it being technically and economically feasible for NAND to be cheaper than HDD even in the 5 years AFTER the current 5-year roadmap. I.e. nothing shows it could happen in the next 10 years. Meanwhile in the HDD camp, Western Digital has the roadmap and the tech to reach 40 to 80TB per HDD in the next 8 years.

So I have no idea why I am getting downvoted. Not to mention one of the posters is correct about how cheaper NAND and higher demand would mean prices go back up.

That is like those people who keep talking about OLED replacing LCD: nothing in the roadmap, out of all the possibilities, shows OLED will even be the same price as LCD within the next 5 years. The most optimistic forecast has it at double the price of LCD within 5 years, and that includes the printable OLED that Sharp is investing in.


Thanks. This is very informative.


I don't know about $/TB, but SSDs are already cheaper for 128GB and below (i.e. for most non-HN people's personal needs)


What’s the smallest HD in production these days? 500GB?


It's not a very rigorous answer, but I sorted HDDs on Newegg by price ascending, and the cheapest one was 1TB @ 44USD. There were several 128 and 120GB SSDs below that.


You can buy used 6TB SAS drives in multiples on eBay. I've seen 4x6TB for £450.

If I were to build a home NAS, I'd probably use these. On the other hand, it's harder to find proper cheap SATA drives.


There must be something wrong with them, then. Everyone knows data can be recovered and/or decrypted, and sane practice is to physically destroy drives when they fail or become obsolete.


I've seen 500GB on Amazon, but you can get a 1TB for only a few bucks more.


Notably, the spec sheet doesn't list OPAL encryption support, unlike many of Samsung's other products. That's a bit disappointing.


How are people using NVMe drives in servers? Is soft raid the only option?


NVMe RAID now exists: https://www.broadcom.com/products/storage/raid-controllers/m... But I suspect most people are using soft RAID or no RAID.


I use soft RAID 0. The speeds are remarkable.


Is this suitable for NVM usage?


Top option in the new Mac Pro?


This SSD will probably be out of date when Apple decides to care enough about the Mac to ship the Mac Pro. Maybe 2023?


Maybe, but by then it will be 2023 and we'll be on PCIe 6.


Apple makes their own SSDs; presumably they could make something similar if they want to.



