Diskomator – NVMe-TCP at your fingertips (github.com/poettering)
141 points by simjue 11 months ago | 77 comments



I never really used Mac machines, but I always appreciated "target disk mode". This sounds similar, albeit over a network (which could simply be a straight-through cable between Ethernet NICs on two machines).

Edit: Yeah. I forgot about actually saying what "target disk mode" was. There's a child post that mentions it, so I'll refrain. I will say that I saw it used for imaging computers in a college computer lab setting back in the early 2000's. I definitely wished my PCs could've done it. It looked like a very handy feature. Presumably it would make fixing OS boot issues easier, as well as just harvesting files off a machine that was otherwise not operating properly due to OS issues.


For a while (2009-2014) iMacs also had 'target display mode,' where you could connect your iMac to another machine and it would act as a display. [1]

[1] https://support.apple.com/en-ca/HT204592


There are a lot of reasons I think this is going to be a badass system, but I note that the upcoming Minisforum V3 tablet has DisplayPort-IN, and I think this should just be a semi-standard capability everywhere.

With USB4 especially, it's like: these computers have crazy huge bidirectional throughput. Why why why would we not have one of the most obvious easy to do pipes, a way to let video data in?


> Why why why would we not have one of the most obvious easy to do pipes, a way to let video data in?

Because HDMI and DisplayPort use HDCP to prevent us from copying the data we're viewing for future use. I'm interested in how the tablet is going to work with HDCP as the whole point of closing the analog hole goes away if I can...

Oh, it's just a passthrough.


Diskomator appears to be an open source, platform-agnostic version of this underlying technology!

In case, like me, "Target Disk Mode" is not something you're familiar with:

> What is target disk mode? If you have two Mac computers with USB, USB-C or Thunderbolt ports, you can connect them so that one of them appears as an external hard disk on the other. This is called target disk mode. Note: If either of the computers has macOS 11 or later installed, you must connect the two computers using a Thunderbolt cable.

Sidenote: Thunderbolt cables are special USB-C cables, i.e. the wire between your Mac and the USB-C power brick.


I remember using Target Disk Mode on my iBook G4 over FireWire cables! Very useful for copying large amounts of data before gigabit Ethernet was commonplace.


> Sidenote: Thunderbolt cables are special USB-C cables, i.e. the wire between your Mac and the USB-C power brick.

Not exactly - that is only sort-of valid for Thunderbolt 3 and its successor/merger USB-C 4. TB1 and 2 used the Mini DisplayPort connector.

TB3/4 use the USB-C connector, but unlike regular USB-C, at longer lengths (the standard allows passive cables up to 0.5m [1], though there are longer passive cables as well, e.g. [2]) the cable isn't passive or semi-passive (cables supporting USB-C PD for higher currents/voltages contain "marker" chipsets in the plugs) - it's an "active" cable with special transmission/driver circuitry to achieve tolerable signal integrity.

[1] https://en.wikipedia.org/wiki/Thunderbolt_(interface)#Thunde...

[2] https://www.amazon.de/Cable-Matters-Zertifiziertes-Thunderbo...


Pedantic (but important) footnote: it's USB4. It's not USB 4.0, it's not USB-C 4, it's just USB4. https://www.usb.org/usb4


Network boot? I think this project is genius - considering it’s solving an interesting problem and not reinventing the wheel.

I was half-expecting a Go or Rust utility but so pleasantly surprised to find a bunch of configurations, and that’s it!

Never really fully understood what systemd does. It’s basically the star of the show here. Is it an alternative to init? I honestly thought it was a user level service designed to provide a lightweight abstraction around cgroups and init. I haven’t looked into it too deep - but I have a feeling my assumption is wrong.


The service is written in C and resides at https://github.com/systemd/systemd/blob/main/src/storagetm/s...



Careful, this is prime flame-war material.

systemd is an init system plus various other tools/infrastructure for managing a system.


To be somewhat pedantic, systemd is an init system, but under the systemd umbrella project you'll find bootloaders, DNS resolvers, log forwarders, DHCP clients, network configuration tools, login session managers, the whole lot. Almost all of these components are independent of the init system, but integrate very well with it.


Target disk mode + firewire was wonderful.

If your machine was screwed up, you could plug another machine into it in target disk mode and boot off that (external) "drive" and fix your machine. Or you could easily copy files on/off your machine at firewire speeds.

In comparison, PCs in that era were always troublesome to administer, requiring disk cloning software or sometimes linux. (although they universally boot from USB now, which helps)


Target disk mode over thunderbolt 2 and 3 is also supported by the relevant mac hardware.

Target disk mode is also supported on mac laptops that are older than firewire; the original target disk mode is over scsi and is supported by the first powerbooks, the powerbook 1xx series.


You could do that in Linux using USB networking and then sharing files (or whole block devices) over that connection: NFS, SMB or even iSCSI.

I think that iSCSI is involved here, as "target" reminds me of the iSCSI terminology.

I guess that (as usual) mac os just hides a lot of the details and gives you a nice gui to do that automatically.
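
For the iSCSI flavor, a rough sketch of what that looks like with targetcli on the machine whose disk you're exporting and open-iscsi on the other end (device name, IQN and address below are placeholders, and the demo-mode ACL is wide open - a real setup would add authentication):

    # exporting machine: back an iSCSI LUN with the raw disk
    targetcli /backstores/block create name=disk0 dev=/dev/sda
    targetcli /iscsi create iqn.2024-01.com.example:disk0
    targetcli /iscsi/iqn.2024-01.com.example:disk0/tpg1/luns create /backstores/block/disk0
    targetcli /iscsi/iqn.2024-01.com.example:disk0/tpg1 set attribute generate_node_acls=1

    # other machine: discover and log in; the disk then appears as a local /dev/sdX
    iscsiadm -m discovery -t sendtargets -p 10.42.0.1
    iscsiadm -m node -T iqn.2024-01.com.example:disk0 -p 10.42.0.1 --login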


On the Macs that supported it the feature was invoked by holding a key during power-on. On the "client" the target disk mode machine appeared as a Firewire-attached disk. It was very slick.

I'd love to have my PC laptop have the same kind of functionality. Need to troubleshoot an OS boot issue-- just plug the unit into a working PC and the disks show up as local devices. Unlock the full disk encryption and go to town. All the convenience of extracting the disk and connecting to another machine without cracking the case.


> Need to troubleshoot an OS boot issue-- just plug the unit into a working PC and the disks show up as local devices.

Or you could just boot from a far more convenient USB drive with a live OS.


I've done a ton of "boot a rescue disk" because I'm usually too lazy to pull the disk. Particularly with oddball hardware, it always feels like I'm struggling with drivers or missing tools. When I do pull the disk and plug it into my daily driver machine it feels like stuff goes much more quickly because I've got all the tools I'm used to.

It's all counterfactuals anyway-- nothing like target disk mode will ever happen on the PC platform.


Apple's solution is better.


If you have another machine at hand and the wire. A pendrive fits in a wallet and works even in the middle of nowhere. Both solutions have their pros and cons.


> I will say that I saw it used in imaging computers in a college computer lab setting back in the early 2000's. I definitely wished my PCs could've done it.

In that exact same setting, I've seen PCs use PXE for the same purpose.


For sure. For one-offs, plugging in a machine and having it act like a removable disk would be a lot easier than orchestrating a DHCP server and TFTP server (plus boot images) to support PXE. For mass numbers of machines, though, PXE is very handy. (I haven't kept up with PXE in the UEFI era to know how it's changed/kept up. Everything I did with it-- netbooting tiny Linux distros to image machines, primarily-- was back in the Windows XP thru 7 days.)
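
For anyone curious, the orchestration for the legacy-BIOS case can be as small as one dnsmasq instance doing proxy-DHCP plus TFTP alongside the existing DHCP server; a sketch (subnet, paths and boot file are examples, and UEFI clients need different boot files):

    # /etc/dnsmasq.conf
    port=0                          # no DNS, we only want the PXE bits
    dhcp-range=192.168.1.0,proxy    # proxy-DHCP: offer boot info only, don't hand out leases
    enable-tftp
    tftp-root=/srv/tftp             # drop pxelinux.0 plus kernel/initrd here
    pxe-service=x86PC,"Network boot",pxelinux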


> I'd like to live to see a future where people build appliances like this for various purposes, not just this specific NVMe one. For example, a nice thing to have would be an appliance whose only job is to make all local displays available via Miracast. I hope this repository is inspiration enough for an interested soul, to get this off the ground.

Very nice idea.


Interesting project. Slightly related is Ventoy [0].

Install Ventoy onto a USB disk drive and it will create a bootable partition that can mount your bootable images (including ISOs) onto your bare metal from the second partition it creates. In effect you can just load up a USB drive with ISOs and install onto bare metal from them. Super handy for distro hoppers, and appealing if you don't want to fart around with network boot but just want to install something on a computer. I was trying to install Windows 11 and just wondered if there was an EFI thing that could just mount my USB, and Ventoy exists and works pretty well. I actually couldn't install Windows without it on one system - it just didn't like something about my installer media...

[0]:https://www.ventoy.net/en/index.html
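
The workflow, assuming the stock install script and a throwaway stick (the device name is a placeholder, and the -i step wipes the drive):

    # one-time: put Ventoy on the stick
    sudo sh Ventoy2Disk.sh -i /dev/sdX

    # afterwards, just keep dropping ISOs onto the data partition it created
    # (mounted here at a typical desktop automount path)
    cp Fedora-Workstation.iso Win11.iso /run/media/$USER/Ventoy/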


I seem to recall an announcement from Western Digital(?) years ago about a line of hard drives with a direct ethernet interface. Does anyone remember the same or what might have come of it?

The market is saturated with solutions for middle-boxes that make hard drives talk to networks, but nobody seems to be directly addressing the problem of simply making the storage itself network accessible.


> The market is saturated with solutions for middle-boxes that make hard drives talk to networks

It's my experience that these boxes try to do a hell of a lot more than just putting drives-on-network and that is why they all suck and are expensive.

The NVMe-oF fabric devices out there all seem to command a ridiculous premium when the reality is they ought to be very simple and easily cost-optimized.


There was a line of products called EtherDrive made by Coraid in the early 2000's that was basically a SATA to ATAoE bridge on the most basic devices, up to rack-mount solutions that used a Linux OS and Dell HBA to run vblade (https://github.com/OpenAoE/vblade) to expose a Linux MD array carved up using LVM2.

ATAoE ("aoe" in the Linux kernel) is nice because it is very lightweight in both terms of code to implement it (~2-3kloc, basically just stuff an ATA packet in an ethernet frame), low network overhead, and ease to setup (no IP addresses).



> Does anyone remember the same or what might have come of it?

Nothing, because it makes each disk quite costly, and by 2014 nobody wanted costly and slow HDDs.

Check the Seagate offering up there: it has a 1 Gbit interface. You can't even run the drive at full sequential read/write speed over it. And having two 10 Gbit ports on each drive would require having two 10 Gbit switches, which by 2014 were still quite costly.

The EM6 solution [0,1] is neat, but it's quite packed both in spec and in price - though it does deliver a lot of IOPS and throughput.

[0] https://www.ingrasys.com/assets/files/Datasheet_ES2000_20211... [1] https://www.servethehome.com/ethernet-ssds-hands-on-with-the...


Kioxia has a line of products (EM6) that has an embedded Marvell NVMe-oF/{TCP,RDMA} controller, and Foxconn has a chassis that can take them and expose them directly on the network. Neat stuff.


Neat if you can stomach paying for the novelty. Let's see an open product that does the same for any old M.2 stick


SPDK + RDMA works just fine :)


Yeah, I know the software exists; it's finding ways to actually build systems from it that sucks. 2.5 gigabit Ethernet has absolutely killed dead all momentum for getting faster interfaces in consumer hardware and SBCs unfortunately. I don't want to build a 2-drive OSD using a comparatively gargantuan microATX motherboard.


> The market is saturated with solutions for middle-boxes that make hard drives talk to networks, but nobody seems to be directly addressing the problem of we just want storage network accessible.

Most hard drives run a serial console on some of the jumpers. You can easily run PPP or SLIP over that. QED :P


Nothing like those blazing fast 115kbps read/write speeds.


Tell me more, like a brand and model where I can do that!

BTW if you already explored that, would you know how to alter the SLC/MLC ratio by any chance?

Modern QLC drives often have an SLC area for buffering. With the right firmware tools, it should be possible to take a 4 TB QLC drive and convert it to a 1 TB SLC drive to get more performance.


An older reference for hacking hard disk drives: https://spritesmods.com/?art=hddhack

There were some firmware bugs on Seagate Barracuda SATA drives that could be worked around via the serial console. I don't remember the specifics though.

If you do some search-engining on hard drive manufacturers and "serial console" you'll find indexed pages. (Presumably the really interesting stuff is buried deep down in forum posts, etc.) Just doing a couple of quick searches got me some stuff.


I don't think the economics of this work because an Ethernet network is probably much more expensive than SAS and network configuration is more complex (you could make it zeroconf but that's also zero security).


> I don't think the economics of this work because an Ethernet network is probably much more expensive than SAS

Well, iSCSI has been around for ages. And because people got fed up with Fibre Channel requiring dedicated switches and transceiver components, first came FCoE, which allowed using regular network transceivers and switches, and then FCIP/iFCP, which added regular IP routing to the mix - but those never saw much uptake.


> first came FCoE that allowed using regular network [stuff]

From rough memory, didn't FCoE start out that way only in theory?

With the reality that people needed to buy FCoE rated equipment, which was priced at "enterprise pricing" levels.

Pretty sure that was FCoE, and I'm not mis-remembering that from something else... :)


I believe you are talking about the WD EX2 boxes. NAS in a box.


I don't get it. Isn't this just a live CD that sets up nvmet (NVMe over TCP) as described here? https://blogs.oracle.com/linux/post/nvme-over-tcp
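
For reference, the setup described in that post boils down to loading nvmet-tcp and a handful of configfs writes, roughly like this (NQN, backing device and address are placeholders):

    modprobe nvmet-tcp
    cd /sys/kernel/config/nvmet

    # define a subsystem backed by a local block device
    mkdir -p subsystems/nqn.2024-01.com.example:disk0/namespaces/1
    echo 1 > subsystems/nqn.2024-01.com.example:disk0/attr_allow_any_host
    echo /dev/nvme0n1 > subsystems/nqn.2024-01.com.example:disk0/namespaces/1/device_path
    echo 1 > subsystems/nqn.2024-01.com.example:disk0/namespaces/1/enable

    # create a TCP port and expose the subsystem on it
    mkdir ports/1
    echo tcp     > ports/1/addr_trtype
    echo ipv4    > ports/1/addr_adrfam
    echo 0.0.0.0 > ports/1/addr_traddr
    echo 4420    > ports/1/addr_trsvcid
    ln -s /sys/kernel/config/nvmet/subsystems/nqn.2024-01.com.example:disk0 \
          /sys/kernel/config/nvmet/ports/1/subsystems/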


I guess? But neatly packaged and pre-configured.

I mean, technically, every Linux distribution is “just a(nother) live cd” with a slightly different configuration and some custom packages thrown in. To each their own, yeah?


Now we just need a super cheap SBC with tons of PCIe lanes to chuck all those old M.2 drives into.


This has been my plan over the last few months! Finding a cheap PCIe 4 motherboard + CPU with many slots has been a challenge. Need PCIe 4 or else you’ll quickly saturate your bandwidth. Even with PCIe 4 - you still can. The fastest (affordable) network solution means using 10 GbE - which can be saturated by just one modern gen NVMe SSD :-/


> The fastest (affordable) network solution means using 10 GbE ...

That's one of those "it depends" things. If you only have a small number of computers that need connecting, and you're OK with using 2nd-hand gear... then 40GbE Mellanox adapters from eBay are pretty affordable. E.g. stuff like this (there's a bunch):

* https://www.ebay.com.au/itm/293393033570

* https://www.ebay.com.au/itm/175163685084

Note - I don't know those sellers at all.


There's the Asustor although it only has one lane for each SSD.


This is awesome. Would be exciting if it could be extended to support NVMe-oF as well, with RDMA via RoCEv2. An SBC running something like this with at least 2x10GbE, two M.2 slots and two SATA ports would be an absolute dream device for me.


Having the devices exposed "raw" means there's no redundancy on the device though. :(


That’s exactly what I want. I want it to be cost optimized enough to just use a bunch


I'm looking forward to Longhorn[1] taking advantage of this technology.

[1]: https://github.com/longhorn/longhorn


Does NVMe-TCP have any support on Windows?

Windows supports iSCSI clients/servers... Isn't it easier to emulate that and then you have a much wider range of possible clients?


The NVMe network server is part of systemd? Umm.. is that really relevant for an init replacement?

https://www.freedesktop.org/software/systemd/man/latest/syst...
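
If it helps, it's an optional unit that only does anything when explicitly started; a sketch, assuming systemd v255 or newer, which is where the service was added:

    # start the NVMe-TCP target service documented in the linked man page
    systemctl start systemd-storagetm.service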


Can't wait till there's an NVMe-QUIC.


I'm missing half of the picture. How would one mount a disk on the other end of the TCP connection?
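
Assuming the target side is the kernel nvmet TCP stack (which, as I understand it, is what systemd-storagetm drives) and nvme-cli on the client, roughly this (address, NQN and device names are placeholders):

    # see what the target is offering
    nvme discover -t tcp -a 192.0.2.10 -s 4420

    # connect; the remote namespace shows up as a local /dev/nvmeXnY
    nvme connect -t tcp -a 192.0.2.10 -s 4420 -n nqn.2024-01.com.example:disk0

    # from there it's an ordinary block device
    mount /dev/nvme1n1p1 /mnt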



I think a far smaller version of this could be built on top of UEFI functionality...

I.e. use UEFI to read/write the disks. UEFI to send/receive packets. UEFI to draw a splash image onto the screen.

Now, you don't need any network drivers, graphics drivers or disk/controller drivers.


I kinda wish Linux would implement support for all the UEFI functionality. It would let you build micro systems which 'just worked' with no drivers.

The downside is that UEFI drivers tend to be barebones and not high performance. But it is a nice fallback to know that you will always at least have something that works.

For example, here is the API for disk access offered by UEFI: https://uefi.org/specs/UEFI/2.10/13_Protocols_Media_Access.h...


This is already possible with kernel patches in https://github.com/osresearch/safeboot-loader . It is extremely cool, but the README there also explains why it's not necessarily a good idea.


I mean, as a proof of concept, that would be neat. As a useful service, not so much, given the non-optimized UEFI abstraction layer.


As someone who is completely ignorant about this, what does it mean to use UEFI? I was under the impression that UEFI is just the firmware interface presented at boot time to configure stuff. So is this a way to do complex actions at boot time?


Very personal opinion, but I think the image is overcomplicated. Fedora base + systemd + sshd + application? This can surely be smushed down to a gokrazy image. I guess then you'd have to rewrite it in Go, and device support would be an issue.


There are more files, yes, but I don't agree that it's more complicated, because those files come together in the form of some well-understood components that almost anyone who's done a bit of Linux admin understands to a reasonable degree, whereas your hypothetical Go application is going to end up being a completely custom implementation of NVMe-TCP, and if there are any problems with it you're on your own.


He wrote systemd, so it might be simpler for him.


I'm aware. :)


Next thing you know someone is going to want to run a 'disklet' (probably written in Java) within the data path of these devices. Shudder!


What’s the idea with the bacteria emoji?


How does this compare with iSCSI?


Like an IOPS improvement of 30%+ and a latency improvement of 20%+ [1], ish.

[1]: https://www.reddit.com/r/Proxmox/comments/134kqy3/iscsi_and_...


There's no way to combine the NVMe drives into a larger sized unit for redundancy / failover though, so not sure what kind of future uptake this could have.


Everyone who uses NVMe-over-network-transport simply does redundancy at the client layer. The networking gear is very robust, and it is easier to optimize the "data plane" path this way (map storage queues <-> network queues) so the actual storage system does less work, which improves cost and density. That also means clients can have their own redundancy solutions that more closely match their requirements, e.g. filesystems can use block devices and implement RAID10 for e.g. virtual machine storage, while userspace applications may use them directly with Reed-Solomon(14,10) and manage the underlying multiple block devices themselves. This all effectively improves density and storage utilization even further.

NVMe-over-network (fabrics w/ RDMA, TCP, ROCEv2) is very popular for doing disaggregated storage/compute, and things like Nvidia Bluefield push the whole thing down into networking cards on the host so you don't even see the "over network" part. You have a diskless server, plug in some Bluefield cards, and it exposes a bunch of NVMe drives to the host, as if they were plugged in physically. That makes it much easier to scale compute and storage separately (and also effectively increases the capacity of the host machine since it no longer is using up bandwidth and CPU on those tasks.)
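
A simplified sketch of that client-side redundancy pattern, assuming nvme-cli and mdadm on the client and two targets at made-up addresses:

    # attach one namespace from each storage node
    nvme connect -t tcp -a 192.0.2.11 -s 4420 -n nqn.2024-01.com.example:node1
    nvme connect -t tcp -a 192.0.2.12 -s 4420 -n nqn.2024-01.com.example:node2

    # mirror them on the client; the storage nodes themselves stay dumb and fast
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1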


Interesting. Sounds like it'll make for higher potential scalability, but also increase the cost (at the network layer) instead.

Probably a trade off that a lot of enterprise places would be ok with.


I’m not sure what you mean. You can add the disks to a software RAID in the worst case. Are you talking about on the host?


Yeah. It seems like directly presenting raw disks to the network means any kind of redundancy would need to be done by whatever device/host/thing is mounting the storage.

And doing that over the network (instead of over a local PCIe bus) seems like it'll have some trade-offs. :/


Would be nice to add this to Ventoy


AFAIK Ventoy can boot EFI files.



