Seagate just reinvented the disk interface using Ethernet (speakingofclouds.com)
235 points by slyall on Oct 25, 2013 | 117 comments



I really like the "it's just a server that takes a 4k key and stores and retrieves a 1M value" approach. I'm not so keen on the physical drive "repurposing" the standard pinout of existing hardware unless they are prepared to gracefully fall back to the old block-device standard if it gets plugged into a "muggle" device.

This has real promise so long as it stays as radically open as they are claiming it will be. When I can grab an old scrub machine, put a minimal debian on it and apt-get seagate-drive-emulator and turn whatever junk drives I've got laying around into instant network storage (without buying magic seagate hardware), I'm sold (and then might think about buying said hardware).
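
For the fun of it, here's a minimal sketch of what the request loop of such an emulator might look like. This assumes a made-up length-prefixed put/get protocol on an arbitrary port - not the actual Kinetic wire format - and keeps everything in RAM:

    # Hypothetical "drive emulator": a toy key/value server speaking a
    # made-up length-prefixed protocol, NOT the real Kinetic wire format.
    import socketserver
    import struct

    store = {}  # key bytes -> value bytes; all in RAM for this sketch

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            while True:
                header = self.rfile.read(9)   # op (1) + key len (4) + value len (4)
                if len(header) < 9:
                    break
                op, klen, vlen = struct.unpack(">BII", header)
                key = self.rfile.read(klen)
                if op == 1:                   # PUT
                    store[key] = self.rfile.read(vlen)
                    self.wfile.write(b"OK")
                elif op == 2:                 # GET
                    value = store.get(key, b"")
                    self.wfile.write(struct.pack(">I", len(value)) + value)

    socketserver.ThreadingTCPServer(("0.0.0.0", 8123), Handler).serve_forever()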


Key value stores are useful, and they are especially useful in this form factor. On the other hand, you now have a very large black box that you have to somehow navigate in order to create a workable system. Given that this is likely an arm core running linux on the inside, I would have considered a slightly more open approach to be 'Here's a working KV store using backing db X and here's how to reflash it if it doesn't quite work for you'.


I think the idea is that if you want to do that, you would use OpenStack, and your application logic must be pluggable so that it supports this protocol, OpenStack, S3, or any other KV store you can get a library for.
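
Presumably something like the following - a thin storage interface that the application codes against, with the concrete backend (Kinetic, OpenStack Swift, S3, or an in-memory stub for tests) swapped in behind it. The names here are invented, just to show the shape:

    # Hypothetical pluggable key/value interface: application logic only
    # sees put/get; a Kinetic, S3, or other backend implements them.
    from abc import ABC, abstractmethod

    class KVBackend(ABC):
        @abstractmethod
        def put(self, key: bytes, value: bytes) -> None: ...

        @abstractmethod
        def get(self, key: bytes) -> bytes: ...

    class InMemoryBackend(KVBackend):
        """Stand-in backend; a real one would talk to the drive/service."""
        def __init__(self):
            self._data = {}

        def put(self, key, value):
            self._data[key] = value

        def get(self, key):
            return self._data[key]

    def save_user_blob(store: KVBackend, user_id: str, blob: bytes) -> None:
        # The application never knows which backend it is talking to.
        store.put(user_id.encode(), blob)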


I hope it will support IPv6. The article mentions DHCP and has an example address of 1.2.3.4, but IPv4 seems like a poor choice for a new LAN protocol in 2013. Not everyone has IPv6 internet connectivity but we do all have IPv6 LAN.

Apple has been using IPv6 for local network services for years now, like file sharing and Time Capsule backups, and it works great.


Well, on an internal network IPv4 is more than enough. With subnets you can put a LOT of hosts on your private network without any problems.


IPv6 is not just about a larger address space. It has better autoconfiguration, better multicast, better MTU handling, and it has a constant-size header to make routing easier. Unlike IPv4, it works well with no centralized server assigning IP addresses - between link-local addresses and multicast, you get everything you need to communicate locally, automatically. It is a good protocol and we should use it because it is good, not just because address space exhaustion forces us to.
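
For example, reaching a device on the local segment needs nothing more than its link-local address plus the interface name - no DHCP server anywhere. A sketch (the address, interface, port, and payload are placeholders):

    # Connect to a device by its IPv6 link-local address; the "%eth0"
    # scope suffix selects the interface. No address assignment needed.
    import socket

    addr_info = socket.getaddrinfo("fe80::021b:21ff:fe12:3456%eth0", 8123,
                                   socket.AF_INET6, socket.SOCK_STREAM)
    family, socktype, proto, _, sockaddr = addr_info[0]
    with socket.socket(family, socktype, proto) as s:
        s.connect(sockaddr)          # sockaddr carries the scope id for eth0
        s.sendall(b"GET status\n")   # placeholder payload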


Thanks a lot for the reply. I'm actually really interested in learning more about IPv6 now for all those reasons you mentioned.


IPv6 has gigantic variable-length headers. That is inefficient, and too hard to implement in FPGA/silicon for a niche exploratory project.


The technical details:

https://developers.seagate.com/display/KV/Kinetic+Open+Stora...

The important, actual TLDR: "Kinetic Open Storage is a drive architecture in which the drive is a key/value server with Ethernet connectivity."



Simulator done in Java, with additional support for Python & Erlang! :)


This seems like a reinvention of Coraid's ATAoE, which has the added benefit of already being in the mainline kernel, good server/target support (vblade), hardware products shipping now, a lack of IP/TCP overhead, and a dead-simple protocol.

http://aoetools.sourceforge.net/


Basically came here to say the same thing, I like Geoff but this isn't "new" in that sense. The "newness" here is that Seagate just put it into their base board controller. Had they been a bit smarter about it they would have put in two Ethernet PHYs and then you could dual port the drive, much like the old DSSI drives from DEC.

Routing is also a non-issue, since a single drive on the network is about as useful as a single drummer in a marching band; basically you're going to need at least three to make something with a bit of reliability, and more if you want efficient reliability. So between your actual storage 'processor' and the storage 'elements' you drop in a cheap bit of Broadcom silicon to make a 48 port GbE switch and voila, you've got something much more reliable than SATA and much cheaper than FC.

I'm sure tho that the folks at Google are all over this. :-)


Note that while the lack of TCP/IP overhead may be helpful, it also means you don't get any routeing - often one of the pains of using FCoE.


It is entirely possible to encapsulate ethernet frames into whatever protocol you like, should routing be your desire.

The 99% use case of storage access requires no such frills, though.


Also, it means you're at the mercy of your network if it starts dropping or duplicating packets. Which is potentially very bad when each of those packets is an ATA command…


ATAoE was - to my knowledge - never integrated into a drive controller.


I don't understand what this means; can you elaborate?

Since the attachment for an AOE drive is Ethernet, the drive controller is just an Ethernet NIC...


There are no drives that speak AOE, so AOE needs a processor to do the encapsulation, minimal as it is.


I'm waiting for stereo components that connect to each other via an Ethernet cable and a hub.

Imagine a CD player, turntable, receiver, preamp, etc., that all have only two connectors: power, and Ethernet. You wouldn't have problems anymore with running out of connections on the back of your receiver. That incredible rats nest of disparate wires and cables would be gone. No more RCA cables, coax cables, HDMI, optical cables, composite video, supervideo, component video, BNC, various adapters, etc.

No more fumbling around the back trying to figure out which socket to plug the RCA cables into, which is input, which is output, etc.


The problem with audio is delay. Even all-analog you already had delays (granted, some of the analog stages are accumulators, so...), but all the formats you see in audio are just lots of people with different ideas about the trade-off between delay and ease of use. You are on the far right of that spectrum... so don't invent a new one and use what we already have there, which is optical, I think.


I've talked with an electrical engineer who works for QSC on the part of their business that deals with large, coordinated installations. They run huge, theme park-sized systems over COTS networking gear (read: Ethernet) with audio synced within hundreds of nanoseconds at locations hundreds of meters apart. It's doable, but you're not going to see it in normal consumer hardware.


I've noticed support for this on a few ethernet capable MCUs lately:

http://en.wikipedia.org/wiki/Precision_Time_Protocol


This. I once tried to repurpose an old Mac as a karaoke machine. Figured I could use the mac's built-in mic input. Absolutely unusable because of the ~50ms delay. Ended up having to buy a little analog mixer to make it work.


Modern (wired) LANs have < 1ms round-trip latency. Is that still too slow?


Yes that is really slow


Actually modern switches have 400ns latency. Is that too slow? Hardware PTP support in your switches can get you clocks synced to close to a nanosecond accuracy, surely that's good enough?


"Über alles" will be primarily associated with Hitler's anthem of the Third Reich by most native speakers of German who know some history. Not a good choice for a title.


The article was written in English, for an English-speaking audience, who generally consider "über alles" not to be particularly offensive anymore (it has come to mean something like "all-conquering"). I don't think it was a poor choice of title at all, especially since it sums up the tone of his article.

It's quite unfortunate that the far-right in Germany has since appropriated the phrase, but it's been appropriated differently by English-speakers. Censoring people based on a usage that is foreign to them is a little harsh.


> it's been appropriated differently by English-speakers

I'm an English speaker from the American midwest and I still think of it as an (I always assumed ironic) reference to Nazi supremacism when I see it. I'm not offended by it but it always struck me as a strange joke. "This thing is so great it's Nazi great!" Fantastic.

It's interesting to me that that's not the intent. It's a reference to the phrase "Deutschland Over Everything Else," which is either a Nazi thing or a merely German nationalist thing, depending on the context and intent. I can see why a German person might want to "take it back" and make it not mean something fascist, but why would anyone outside of that context even bother?

> Censoring people

Konstruktor was pointing out a flaw, rightly or wrongly, but not censoring. He doesn't have that power here.


Do Germans really associate that phrase mostly with Hitler? I find it quite surprising because despite not being German (nor German-speaking) I know that this phrase comes from the first verse of the Deutschlandlied.


While the origins of the phrase are not tied to the Nazis, it has become associated with them. It was originally about the unification of Germany, but it is viewed quite differently now and certainly associated with the far right or the extreme right.

The verse of the Deutschlandlied that contains it is also no longer used; only the third verse is used as the national anthem.


Unless the author wanted to imply that Seagate should seek to eradicate any other kind of harddisk interface with zealot-like madness, he shouldn't have used that phrase. Really, the first second I opened the website I noticed this poor choice of words. And yes, I am German.


Yes a very poor choice of phrase, esp when he could've used anything else.

He's not alone; this issue came up just yesterday: http://blogs.telegraph.co.uk/culture/davidbolt/100071146/how...


That's interesting. The word 'Sonderkommando' has no negative connotation at all in German. Is it something specific to how the British look at WW2? Or is it just the usual outrage between parties?


Yes, it has. I'm German - please google it.

The Sonderkommando was a special group of prisoners at Auschwitz who had to prepare and cleanup the extermination of millions of prisoners. Their job was to carry the dead from the gas chambers into the ovens. They were recycled every 3 months or so by a fresh Sonderkommando, whose first job was to murder their predecessors. They also made history by starting the only counter-attack at the camp, blowing up one of the gas chambers.

You can write "Deutschland ueber alles" all day, that's a minor issue. But using Sonderkommando in the wrong context can cause quite a stir.


yes, I thought it would be some storage reinvented by Aryans


Every American will think it's California...

Start typing Uber A... into google and what do you get?


"Uber" is a misspelling. The word is über, or ueber.


Network storage macht frei.


As a counterpoint: A slightly less gushing article with some good comments (yes, even on El Reg) http://www.theregister.co.uk/2013/10/22/seagate_letting_apps...

Comments along the lines of "Backups? Snapshots? RAID? How they handling this then?"


How are they handling this today? You can treat an existing HD as a key-value store where the key is the location on disk and the value is a sector of binary data. Conceptually there's no difference.

The answer is: If you need those capabilities to offer up a traditional file system, you do as you do today: you layer it on top.

But many systems don't, because they already re-implement reliability measures on top of hard drives, as we want systems that are reliably available in the case of server failure too.

E.g. consider something like Sheepdog: https://github.com/sheepdog/sheepdog Sheepdog is a cluster block device solution with automatic rebalancing and snapshots. It implements this on top of normal filesystems by storing "objects" on any of a number of servers, and uses that abstraction to provide all the services. Currently sheepdog requires the sheep daemon to run on a set of servers that can mount a file system on the disks each server is meant to use. With this system, you could possibly dispense with the filesystem, and have the sheep daemons talk directly to a number of disks that are not directly attached.

For sheepdog, RAID is not really recommended, as sheepdog implements redundancy itself (and you can specify the desired number of copies of each "block device"), and it also provides snapshots, copy-on-write, extensive caching, and support for incremental snapshot-based backups of the entire cluster in one go.

So in other words, there are applications that can make very good use of this type of arrangement without any support for raid etc. at the disk level. And for applications that can't, a key value store can trivially emulate a block device - after all sheepdog emulates a block device on top of object storage on top of block devices...
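
To make that last point concrete, here is a toy sketch of block emulation over a generic get/put store ("kv" is assumed to return None for missing keys; I/O spanning an object boundary is ignored for brevity):

    # Toy block-device-on-key-value-store, in the spirit of what sheepdog
    # does: map (volume, block range) onto fixed-size named objects.
    OBJECT_SIZE = 4 * 1024 * 1024   # e.g. 4 MB objects

    def read_block(kv, volume, offset, length):
        obj_index, start = divmod(offset, OBJECT_SIZE)
        key = f"{volume}:{obj_index}".encode()
        data = kv.get(key) or bytes(OBJECT_SIZE)   # missing object reads as zeros
        return data[start:start + length]

    def write_block(kv, volume, offset, payload):
        obj_index, start = divmod(offset, OBJECT_SIZE)
        key = f"{volume}:{obj_index}".encode()
        data = bytearray(kv.get(key) or bytes(OBJECT_SIZE))
        data[start:start + len(payload)] = payload
        kv.put(key, bytes(data))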

You could also potentially reduce the amount of rebalancing needed in the case of failures, by having sheep daemons take over the disks of servers that die if the disks are still online and reachable.

The biggest challenge is going to be networking costs - as I mentioned elsewhere, SSDs are already hampered by 6Gbps in SATA III, and 10GE switches are ludicrously expensive still.


Bitrot? Error recovery? SMART data? ...?


Thanks for this - surprised by the Basho/Riak connection.


I wish SD cards would implement a key-value storage interface natively. It would instantly remove the need to implement a filesystem in many embedded systems eg. music players: all they need is access to keys (song filenames) and values (blob of ogg/mp3 data).


As someone doing a bit of embedded systems work these days, I was wondering why the MCU manufacturers don't offer a key-value store (even a small one would do) for configuration purposes.

The most common ways of managing configuration are serializing a structure to EEPROM/flash, or writing strings with their lengths as delimiters.

Even if you assume it's for saving space, the way I see it you will inevitably use up that space anyway writing the code to serialize and deserialize the configuration data.
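
The amount of code such a vendor-supplied store would need is tiny. Here is a sketch of a trivial length-prefixed record format for a single config page, in Python for brevity (on an MCU this would be a few dozen lines of C against the flash/EEPROM driver; the format itself is made up):

    # Toy key/value config records packed into one flash/EEPROM page:
    # [key_len:1][val_len:1][key][value]... terminated by key_len == 0xFF.
    def pack_config(values: dict, page_size: int = 256) -> bytes:
        out = bytearray()
        for key, val in values.items():
            out += bytes([len(key), len(val)]) + key + val
        out += b"\xff"                             # terminator
        assert len(out) <= page_size, "config does not fit in one page"
        return bytes(out.ljust(page_size, b"\xff"))

    def unpack_config(page: bytes) -> dict:
        values, i = {}, 0
        while i < len(page) and page[i] != 0xFF:
            klen, vlen = page[i], page[i + 1]
            values[page[i + 2:i + 2 + klen]] = page[i + 2 + klen:i + 2 + klen + vlen]
            i += 2 + klen + vlen
        return values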


Well, a lot of microchips tell you to write a value to a specific location in memory because that location in memory is physically wired to the hardware you're controlling.

So for example the output pins of a microcontroller are driven by a memory location wired (through a buffer and some control circuitry) to the physical pins. The PWM circuit is a counter and a comparator, where the comparator's inputs are the counter and a memory location.

You could write a key/value to memory location mapping layer, of course, but that's basically what vendor libraries, device drivers and operating systems already provide.
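
For what it's worth, that mapping is visible even from userspace on a Linux board - a sketch of poking a memory-mapped register through /dev/mem (needs root; the base address and register offset are placeholders, taken from your SoC's datasheet in practice):

    # Write a memory-mapped GPIO register directly via /dev/mem.
    import mmap
    import os
    import struct

    GPIO_BASE  = 0x20200000    # placeholder base address (SoC-specific)
    SET_OFFSET = 0x1C          # placeholder "set output high" register offset

    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    regs = mmap.mmap(fd, 4096, offset=GPIO_BASE)
    struct.pack_into("<I", regs, SET_OFFSET, 1 << 17)   # drive pin 17 high
    regs.close()
    os.close(fd)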


These hardware KV interfaces are still fairly low-level. Fixed-size values, etc. So you'll still need to have some abstraction layer that handles that. You might not call it a filesystem but I bet it'll look a lot like one.


SD cards implement a linear map of blocks. We typically introduce a hierarchical key-value store on top of that- a filing system where we file values by certain keys.

What is your ask? Get rid of the hierarchy and use only a single flat directory on the SD card? Plan 9 was close to the kind of vision you describe - configuration and state for applications lived in the file system.


Database as a file system. In some ways it actually makes an odd sort of sense...

    SELECT * FROM sdc
    WHERE Type='mp3';
I could see uses for something like that. You could even treat it like a traditional file system for fallback purposes, if one of the tags was a 'directory' tag.

Also, it would make sense in cases where you have... [whatever the equivalent for NUMA is for disks. NUDA? Things like hard drives with a limited flash cache.] Store the indexes on the flash or in RAM (periodically backed up to the disk, of course). Biggest issue would be wear on the flash, though.


Classical Forth systems worked that way. Being Forth, they of course went for minimalism here. Keys were unsigned ints (actually, ints interpreted as unsigned) and all values were 1024 bytes (see http://c2.com/cgi/wiki?ForthBlocks)

Programming that way was fun, but I wouldn't want to use it on a system with megabytes of RAM. Embedded, it would be fun to implement what you describe on top of that, though.
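
A sketch of that classic block interface - unsigned block numbers as keys, fixed 1024-byte values - backed by an ordinary file, purely for illustration:

    # Forth-style block store: key = unsigned block number,
    # value = exactly 1024 bytes, backed here by a plain file.
    BLOCK_SIZE = 1024

    def block_read(f, n: int) -> bytes:
        f.seek(n * BLOCK_SIZE)
        data = f.read(BLOCK_SIZE)
        return data.ljust(BLOCK_SIZE, b"\x00")   # zero-fill reads past EOF

    def block_write(f, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        f.seek(n * BLOCK_SIZE)
        f.write(data)

    with open("blocks.img", "w+b") as f:
        block_write(f, 3, b"hello".ljust(BLOCK_SIZE, b" "))
        print(block_read(f, 3)[:5])              # b'hello'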


That would be nice, and it's doable. I like eMMC for microcontroller apps since it implements wear leveling and bad block management internally. A simple key-value mapping could probably be added without too much effort.

eMMC isn't meant to be removable, though.


I can't resist posting a link to one old (2010-04-01) thedailywtf article about native key-value storage on HDDs:

http://thedailywtf.com/Articles/Announcing-APDB-The-Worlds-F...


this sounds like a hybrid of ataoe[1] and 9p[2], an interesting idea for a protocol

[1]https://en.wikipedia.org/wiki/ATAoE

[2]https://en.wikipedia.org/wiki/9P


I think this is an incredibly interesting approach, and I hope Seagate open it up a little more. If we could run some computation on the drive, that could be incredibly powerful.

I can imagine that once these are SSD drives, paired with reasonably powerful (likely ARM) chips, that we'll have massively parallel storage architectures (GPU-like architectures for storage). We'll have massive aggregate CPU <-> disk bandwidth, while SSD + ARM should be very low power. We could do a raw search over all data in the time it takes to scan the flash on the local CPU, and only have to ship the relevant data over (slower) Ethernet for post-processing.

I'd love to get my hands on a dev-kit :-)


hard drives are already run by arm controllers http://hackaday.com/2013/08/02/sprite_tm-ohm2013-talk-hackin...

so your idea might happen sooner than expected


Also, SSDs implement an internal filesystem optimized for flash that emulates a block device to the outside. For this Kinetic store, the controller would quite likely be even simpler.


There would still be a need for ssd specific wear leveling and trim logic in the controller though


Trim is not needed if the disk directly gets delete commands, while wear leveling is really a problem of spinning-disk-optimized filesystems and can be solved more easily if the external representation is a key-value store of immutable objects.


Yes... given they use the same connector on the HDD, I'm wondering if they actually need to change the hardware at all!

The Sprite_tm breakdown identifies one core as managing the SAS commands via DMA. Now how hard would it be to manage Ethernet similarly, via an ARM core? If that's what they've actually done, that's really damn cool.


Are you munching some number crunching?


Seems like an odd invention given the industry is moving to storage technologies with sub-microsecond latencies, which is at least an order of magnitude better than 10GE is usually capable of. Still, at least 'object store' style operations are much richer, so you avoid the need to make many round trips to the disk to resolve the location of a database record.

Hmm, which raises the question: how much RAM should a hard disk have? In a regular architecture, that database lookup could be meaningfully cached (and you could design and provision exactly to ensure your entire set is cached). An opaque K/V "disk" seems less appealing from this angle.


Kinetic looks like it's for cold storage. There's already a problem where the server to host cold disks (e.g. a Backblaze pod) costs more than the disks themselves. Having the disks speak Ethernet may reduce cost.


If this means 10Gbps Ethernet switches finally come down in price, awesome...

Otherwise this will be hampered by the fact that the 6Gbps of SATA III is already too slow to take maximum advantage of many SSD devices (hence OCZ's experiments with effectively extending PCIe over cables to the devices).


These are 4 TB units of 5900 RPM spinning rust.


"The Seagate Kinetic Open Storage platform eliminates the storage server tier of traditional data center architectures by enabling applications to speak directly to the storage device, thereby reducing expenses associated with the acquisition, deployment, and support of hyperscale storage infrastructures."

First of all: Hyperscale? I'm not a retarded non-technical manager or MBO, so I just stopped listening to your entire pitch. Second: You're still selling storage infrastructure, and I still have to support it. The expense just has a different name now.

"Companies can realize additional cost savings while maximizing storage density through reduced power and cooling costs, and receiving potentially dramatic savings in cloud data center build outs."

How does reducing my power and cooling costs maximize my storage density? Oh, by getting me to spend more money on your product instead of power and cooling. Nice try, buddy; give me the cost comparison or stfu.

Their whole pitch here is "throw away your key/value servers and use our key/value server instead". I wonder which will be more expensive: something I throw together with commodity PCs, or a SAN developed by Seagate.


I think you're misreading this. Nobody will just bolt a few thousand of these drives to a switch and call it done. That's not practical.

On the other hand it opens up a massive opportunity for people to create their own storage fabric using these drives as building blocks. That means instead of having to hit up EMC for a big drive array, you will eventually be able to get an open-source implementation of same if you want, or one built on open standards so you're not locked into a particular vendor.

For companies like Facebook, Google, Apple or Yahoo that are storing petabytes of information, a drive subsystem like this is surely a dream for their engineers. Now instead of having to attach the drives to servers that do little more than wrap S-ATA or SAS into Ethernet for merging into a larger storage cluster, the drive does that all by itself.

Plus, imagine how Backblaze might be able to re-engineer their pod (http://blog.backblaze.com/2013/02/20/180tb-of-good-vibration...) to use this.


I think you're missing the value prop of the BigStorageCos like EMC and NetApp.

If all they provided was a disk shelf and loose cluster coupling, nobody would buy that.

The big win (for those companies which consider it a win) is in data management and namespacing. Storage virtualization, if you will. I can have a hundred disks spinning behind a storage controller which lumps them together into three storage volumes. I can serve file data off two of them (say, NFS on one and CIFS on the other) and stick some LUNs on the third and serve blocks (iSCSI). I can dedupe, snapshot, and migrate these data at will.

FWIW. Not belittling the Seagate announcement. Just clarifying why the article is correct in suggesting that EMC and NetApp aren't particularly worried about the announcement.


Obviously EMC, NetApp and others in that space provide value, but what they do with existing drive technology is simply out of reach of most startups. The barrier to entry is way too high.

With this new technology it opens the door for innovation in this space. Using commodity components you can build out new EMC-type systems with a different focus. The amount of hardware engineering you have to do is minimized, and more of it can be done strictly in software. Lower cost, more rapid iteration, all that.

I'm not saying this will happen tomorrow, but in ten years the storage market could be turned inside out by this approach.


I am imagining it. The only effectively useful feature of this is actually adding an extra server layer, not removing one.

--

Case 1: Startup X makes a webapp cluster that looks up user information and returns results. It calls a library, which looks up a hash key to query a disk, and returns data.

Problem 1: Lack of load balancing. If there are three disks, and user FRANK is on disk two, and user FRANK's data is getting queried 50x more than the other users, that second disk is toast performance-wise.

Problem 2: No redundancy plus short lifespan of disk means when the disks die the user data goes too.

--

Case 2: Big Company Y creates a storage application layer to intelligently do things with the data. They have a small cluster of machines with apps that take queries and do things with the data, and manage the data using key/value pairs on disks attached to a private storage switch.

Problem 1: Depending on Ethernet (and its overhead and latency) for each query doesn't perform as well as other disk interconnects; you have to use hacks to increase performance. Network management is now a critical component of your storage functionality.

Problem 2: Because the Virtual Memory Manager is no longer managing a filesystem cache, all key/value fields must be cached by the application, so you're re-implementing a VMM layer in your storage application. (Because nobody is stupid enough to not cache random disk queries)

Problem 3: Relational queries become almost completely useless. Performance drags due to all the individual queries, and you end up building a new cache layer just so your database can speed up searches, or at worst case end up with an index-only cluster of these disks.

--

As you can tell by reading Backblaze's site, there are lots of different uses for storage and different requirements for each. But one thing that's pretty widely acknowledged is it's more efficient to have a really long single piece of storage versus lots of very short pieces. I imagine Backblaze will look at this and go: Why don't we just make our own?


I wonder about performance - will this new storage protocol be at least as performant as current standards (ATA, SCSI)? Do we need better-performing drives, or has the datacenter sort of already taken care of that itself?


That's an interesting question, and I think the answer isn't immediately obvious.

One important thing is that the disk is doing more work now -- you offload a bunch of what the filesystem has traditionally had to do onto the disk itself. That should mean less traffic, and lower latency. Maybe not higher throughput, though.

The interface is 2x1 gigabit, so that's obviously slower than a 3-6 gigabit SAS or SATA interface. But maybe the offloaded work will be worth it? Especially if you are doing lots of "small" IO operations, the potential for lower latency might be a win.

It's a cost reduction at the end of the day, not a huge performance bonus. I am very interested to get my hands on one and see how it plays out.


I don't think we need better performing hard disks. Everyone who cares about performance should have moved to flash already. Kinetic looks like it was designed by "disk people", not "flash people".


I don't understand why this is branded as an Ethernet protocol when it's an IP protocol.


It communicates over Ethernet, which is what matters. So it doesn't matter that much, and people understand what it means -- in the same way it doesn't matter that you just said "IP protocol", which expands to "Internet Protocol protocol".


But IP runs over more than just Ethernet - or is this limited to running only on Ethernet?


No, for example you could also run IP on DOCSIS, which is typically employed by cable modems. Or you could run it on an 802.15.4 stack, using 6LoWPAN (in the RF world, and lately also narrowband PLC such as G3).

Remember that IP is layer 3 in the OSI model, and you could run it on top of other layer 2 implementations than Ethernet.


I perhaps should have been more clear - I know this. What I meant was: does this protocol require Ethernet? Which seems unlikely.


The protocol may not, but the drives themselves do, so it's pretty much the same difference.


IP is not limited to Ethernet. See http://en.wikipedia.org/wiki/Osi_layer Ethernet is Layer 1, IP is Layer 3.


IEEE 802.3 exists in layer 2 also. Each frame holds source and destination MAC addresses (and a lot more besides).


"The physical interconnect to the disk drive is now Ethernet." I don't know if the new standard requires Ethernet, but I would be very surprised to see any other interconnect on these drives.


tl;dr it's not nearly as cool as it could have been. I already posted a more detailed explanation here:

http://pl.atyp.us/2013-10-comedic-open-storage.html

I tried to post a comment on the NSOP (Not So...), but first I got "HTTP internal error" and then I got "duplicate comment" but it still hasn't shown up, so I'll post it here.

"The “private” bit is important; although various techniques have been created for shared (multi-master) access to the interconnect, all were relatively expensive, and none are supported by the consumer-grade drives which are often used for scale-out storage systems."

I was working on multi-master storage systems using parallel SCSI in 1994. Nowadays you can get an FC or SAS disk array for barely more than a JBOD enclosure. Shared storage is neither new nor expensive. It's not common at the single-disk layer, but it's not clear why that should matter.

The idea of network disks with an object interface isn't all that new either. NASD (http://www.pdl.cmu.edu/PDL-FTP/NASD/Talks/Seagate-Dec-14-99....) did it back in '99, and IMO did it better (see http://pl.atyp.us/2013-10-comedic-open-storage.html for the longer explanation).

"Don’t fall into the trap of thinking that this means we’ll see thousand upon thousands of individual smart disks on the data center LANs. That’s not the goal."

...and yet that's exactly what some of the "use cases" in the Kinetics wiki show. Is it your statement that's incorrect, or the marketing materials Seagate put up in lieu of technical information?

"they don’t have to use one kind of (severely constrained) technology for one kind of traffic (disk data) and a completely different kind of technology for their internal HA traffic."

How does Kinetic do anything to help with HA? Array vendors are not particularly constrained by the interconnects they're using now. In the "big honking" market, Ethernet is markedly inferior to the interconnects they're already using internally, and doesn't touch any of the other problems that constitute their value add - efficient RAID implementations, efficient bridging between internal and external interfaces (regardless of the protocol used), tiering, fault handling, etc. If they want to support a single-vendor object API instead of several open ones that already exist, then maybe they can do that more easily or efficiently with the same API on the inside. Otherwise it's just a big "meh" to them.

At the higher level, in distributed filesystems or object stores, having an object store at the disk level isn't going to make much difference either. Because the Kinetics semantics are so weak, they'll have to do for themselves most of what they do now, and performance isn't constrained by the back-end interface even when it's file based. Sure, they can connect multiple servers to a single Kinetics disk and fail over between them, but they can do the same with a cheap dual-controller SAS enclosure today. The reason they typically don't is not because of cost but because that's not how modern systems handle HA. The battle between shared-disk and shared-nothing is over. Shared-nothing won. Even with an object interface, going back to a shared-disk architecture is a mistake few would make.


Radical simplification, and IMO this is great. It remains to be seen how this will fare in comparison with RAID. I'd wager that Google would be very interested, if they aren't already doing something like that in their data centers.

Nerdy me likes the idea of a PoE hub and a bunch of drives doing their own thing.

Also, it's a pretty good time to start writing support for this into the Linux kernel and developing support apps.

my 2c


I'd wager that Google would be very interested, if they aren't already doing something like that in their data centers.

I wonder about that.

It's usually a lot cheaper to move computation to data, rather than data to computation. The model that Seagate is presenting here strikes me as wrong, because it assumes very fat pipes (or specialized topologies) for any non-trivial app. At the scale Google operates at, I just don't see this happening.

That, and I have a healthy distrust of networks. Instead of having a box with an OSS OS and dumb drives with small(er) closed firmware blobs, now you have the OS, all the network devices and their closed firmware blobs, and drives with large(r) closed firmware blobs, just to access your data. A lot more can go wrong. A lot more byzantine things can go wrong. Drives are dodgy lying sacks of fecal matter as is; this looks like it'll make things much worse.

The model Seagate presents could be useful for data that is rarely accessed, but I'm not really sold on that either.


This model is simply Seagate trying to do via IP what FibreChannel has allowed with expensive hardware most of us don't have.

Yes, it means your switch must not fail. But if you worry about your switch failing, you have that worry if it's not handling storage too, and you deal with it with redundancy. Moving the storage to hang off a switch does not change that - if you have a single switch and it fails, your servers are just as unavailable either way.

But hanging storage off your switches means it is possible to have servers take over drives of failing servers, which makes many other failure scenarios easier to handle.

In terms of pipes, yes, that is a concern for some uses. It won't be fast unless you go to 10GE, and 10GE switches are still hopelessly overpriced. But "most people" do not serve up gigabits of content, and could do just fine with slower drives hanging off cheap 1Gbps switches.

I already assume not only that my drives will fail, but that the network and servers will fail too. Which means I need to replicate data over many servers on different networks. In that case having the drives be directly addressable over TCP/IP is not an added complexity, and it opens up so many opportunities in improving flexibility of server enclosures etc.


your servers are just as unavailable either way

Right, but having local disks reduces the sources of failure, reduces contention, reduces latency, reduces the complexity of failures, and is thus much nicer to work with. Computers and networks would be easier for us to debug if they had a binary works-well/doesn't-work-at-all behavior, but we all know they don't. Especially networks.

The simplest and sanest architecture is keeping dumb disks local to where computation is running (and yes, that may also include duplicating data across several servers). Anything else is asking for more crazy classes of failure. Been there, bled there, not going back there.


It'd be very interesting if BackBlaze open-sourced at least part of their code. It may be optimized for archival purposes but they're sticking your data on multiple 180TB pods using an open-source stack.

> ...JFS file system, and the only access we then allow to this totally self-contained storage building block is through HTTPS running custom Backblaze application layer logic in Apache Tomcat 5.5. After taking all this into account, the formatted (useable) space is 87 percent of the raw hard drive totals. One of the most important concepts here is that to store or retrieve data with a Backblaze Storage Pod, it is always through HTTPS. There is no iSCSI, no NFS, no SQL, no Fibre Channel.


Why do they need the overhead of HTTPS for internal use like this?


I'm seriously skeptical of this protocol's performance. Ethernet and TCP/IP induce a pretty heavy overhead. That overhead is totally acceptable on LANs/WANs, but in the case of a storage network you want to keep latencies as low as possible.


It's not about performance, it's about getting into the intelligence game.


OK, so head 'sploding a little... this is basically a hardware implementation of a Redis/MongoDb key/value store, yes? If so, wow... yes... the world needs more of this, I think. Wonder if you could get it to conform to the AWS S3 interface too?
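
You could probably fake the basic S3 object calls by just prefixing keys - a sketch, with no claim of real S3 API compatibility ("kv" is any put/get store):

    # Map an S3-style (bucket, key) namespace onto a flat key/value device.
    def s3_put_object(kv, bucket: str, key: str, body: bytes) -> None:
        kv.put(f"{bucket}/{key}".encode(), body)

    def s3_get_object(kv, bucket: str, key: str) -> bytes:
        return kv.get(f"{bucket}/{key}".encode())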


It looks like it's literally a hardware implementation of Riak.


MongoDb is a document store, not a key/value store.


huh :) Coraid (http://en.wikipedia.org/wiki/Coraid,_Inc.) has been doing ATA over Ethernet for quite a long time now - how is this any better?


1) The hard drive industry was the main example cited in The Innovator's Dilemma.

2) Various posts here pooh-poohing this development (including the current top post) are committing the classic incumbents' mistake described in that book, the one that leads to disruption by new entrants to the market.

3) Seagate is doing something right. It doesn't guarantee that they'll win the next phase of the storage battle, but they are doing something radically different that has a plausible marketing story appealing to a large base.


The problem is, this isn't innovation. NASD did the same thing better fourteen years ago. Disruption could still occur if somebody would evolve those ideas in a way that makes it suit users' needs better, but AFAICT Seagate is doing the exact opposite. It's difference for the sake of difference, "disrupting" only their own business.


Jeff, if you look at my linkedin profile, you might notice that I have some background with object storage systems. I'm not familiar with any incarnation of NASD or a similar system that was sold as individual disks accessible over IP.

If they do end up selling it as individual disks directly pluggable into standard racks (with some easy means of supplying power), this certainly would count as a potentially disruptive innovation. As I said before, whether the market reacts positively or not is not certain, but it never is. But kudos to Seagate for trying it.


You're right that NASD never became a standalone product, but at least the design was technically complete enough to be the basis for real systems (e.g. Panasas).

A technology is only disruptive if it provides some benefit over existing alternatives. Will Kinetic be cheaper than today's drives paired with tiny ARM servers (the storage-industry disruption that is already upon us)? We won't know until Seagate actually sets prices and tries to sell some, but it doesn't seem particularly likely. Will it be faster? Again, possible but not particularly likely. Will it allow higher density and/or lower power consumption? Will it be easier to build applications on top of this new and limited API than on top of the ones we already have? Without a credible positive answer to any of these questions, how is this disruptive?

If Seagate had done this right, especially wrt security and semantics beyond get/put, this might be a better building block for a whole system. That might be disruptive. But they've made such a poor start technically that if they make a dent at all it might well be someone else who achieves the actual market breakthrough. Once Seagate has done all of the marketing, that makes it far easier for someone with a better actual implementation to succeed. Look forward to the WD/HGST Object Drive, with an API that allows you to use one securely without a server to do read-modify-write and metadata management for you. Then I'll cheer.


I don't think this can be seen as a possible replacement for the PC disk because of its high latency. Besides, unless you are very rich, Ethernet is only 1Gbit/s.

On the other hand, I see an opportunity as shared storage for mobile and lightweight devices. Using a single, simple protocol, compared to NAS, could open a new technology domain and market. Of course it also requires a good integrated authentication and access control system, because on Ethernet this data might be open to the world.


Gigabit ethernet used to be ungodly expensive. Once 40G or 100G ethernet starts hitting production switches, 10G will start to become economical.


I'm wondering if 10G really will become economical any time soon (for consumer products).

It's been around a long time now, much longer than 1Gb was around until it started becoming consumer products. 10Gb still uses quite a lot of power, and consumer demand is virtually absent since the 10x speed we got from 100Mb to 1Gb has been "fast enough" for home users, and will be for many years.


Well, technically you can now get into 10Gb for around $150 per port. http://www.amazon.com/Netgear-12-Port-ProSafe-Gigabit-Switch...


Which is my point, this is well over an order of magnitude more expensive than what the 1Gb consumer switches people would buy for home use.


Well, I suppose it depends on what you mean by 'affordable'. 10G is now affordable for small businesses, when up until recently it was strictly the domain of the large datacenter. I wouldn't put it in my house yet, but I see no reason to assume that 10G copper switches won't get to that price point eventually.


You must factor in the SFP+ module, around $200. So a complete port would be $350, and $700 for a point-to-point link. Thunderbolt is cheaper.


No, that switch is 10G copper, no sfp+ required. Thunderbolt and 10G ethernet aren't really competing in the same space, so I'm not sure why you brought it up.


Anything over fibre or non-standard cabling is likely to remain expensive for quite some time. 2.5Gb is, alas, about the limit of cheap for a long while.


Fibre is not required to beat the 2.5Gb/s limit. Standard copper Cat 6a cables already support 10Gb/s for distances up to 100m.


Only if people have Cat 6a, and it uses a lot more power. For an application like a hard drive, the power consumption of the Ethernet connection would probably be more than that of the rest of the system.


So use twinax.


Which goes back to "non-standard cabling is likely to remain expensive for quite some time".


An interesting upgrade to Google Protocol Buffers: http://kentonv.github.io/capnproto/


I'm really happy that this seems to be a Riak implementation: http://www.marketwatch.com/story/basho-and-seagate-partner-t...


Hm, no power-over-ethernet in the design? I guess that really is a good thing. Maybe.


We're going to have to have a low-profile Ethernet connector then, aren't we? If this takes over, there's no way that plug can go on our ever-shrinking devices.


They're running Ethernet over a SAS/SATA connector.



