Why the Future of Data Storage Is Still Magnetic Tape (ieee.org)
142 points by sohkamyung on Aug 29, 2018 | 155 comments



My dad's PhD thesis is stored on a magnetic tape formatted for a PDP-11. He'd like to copy it off, but it's difficult to find a computer that can read it! Back in 2006 we went from Geneva to Lausanne to visit Musée Bolo, but their PDP-11 had a problem with its Winchester drive (hard disk) and wouldn't boot. After a few hours, we gave up. Please get in touch if you have the tools - I know he'd like his digital copy back!

Even more modern formats are difficult to copy forwards. I'm currently tidying out my old computer collection, copying off data from SCSI disks, 5.25" floppies, DD floppies, HD floppies, and CDs. My USB floppy drive can't read DD floppies.

The SCSI path requires copying to a PowerBook G3, removing the internal hard drive, putting it into a USB-PATA controller, and copying it to my laptop. I'll need to do the same for all the 2GB Jaz cartridges. I don't even have a Zip disk reader, so I gave those two disks away to another collector so he can read them for me. The 5.25" floppies are the hardest - they don't mount even on old Macs, so I'll need to figure out how to set up ADTPro to copy them over serial from an Apple IIgs.

If you're in the area and want some old hardware/software/magazines/user manuals, please let me know!

https://www.reddit.com/r/VintageApple/comments/99223h/peters...

The lesson? If you want to keep the data, also keep the computer that can read it. And make backups.


> The lesson? If you want to keep the data, also keep the computer that can read it. And make backups.

Or just copy the data over the moment you get a new computer, instead of waiting for several decades. As long as you keep moving the data from generation N to generation N+1, there's no problem.


Not quite. But possible.

I have two 1TB external hard drives. The data was added over a few years. But now to copy 1TB over USB 2.0 is beyond painful.

Fortunately I can just disassemble the drives and connect them directly. But that's if I have an old system with IDE controllers.

If I can find an IDE to USB converter I'd be back to square one (unless it is USB3). And I don't know if there is such a thing as an IDE to SATA converter.


You're looking at 9 hours or so per TB over USB 2.0 (~30-40MB/s typically). That doesn't seem beyond painful to me, just run it overnight and you're done in 2 days.
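
Back-of-envelope, in Python (the 30-40MB/s range is typical observed USB 2.0 throughput, an assumption rather than a spec number):

    # Rough transfer time for 1TB at typical USB 2.0 speeds
    TB = 1e12  # drives are sold in decimal terabytes
    for mb_per_s in (30, 40):
        hours = TB / (mb_per_s * 1e6) / 3600
        print(f"{mb_per_s} MB/s -> {hours:.1f} hours per TB")
    # 30 MB/s -> 9.3 hours per TB
    # 40 MB/s -> 6.9 hours per TB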


Speed is a feature. USB 2.0 storage is only ever so slightly cheaper but will cost hours of your life. I don't bother buying USB 2.0 storage anymore. I observed typical speeds of 25-35MB/sec depending on the motherboard.

Some of my older thumb drives clock in at 4MB/sec. My Internet connection both download and upload is faster than that.


That's the theoretical math; real life isn't so kind.


Alternatively, print the data with a laser printer on acid-free paper and encase it in epoxy resin:

http://carlos.bueno.org/2010/09/paper-internet.html


Missing one more criterion:

4. Be recoverable

How exactly do you extract the information after you drown it in resin?


If you read the article and look at the pictures, you will notice the resin does not touch the paper stack to be conserved.

So when you are ready to recover it, you can just break the resin shell to get at the paper stack inside.


> just break

Elaborate?



You read it?


The poor man's laser etched silicon?


I went to CERN in Geneva last year and was informed they still use tape technology to store data. It works, and their datasets are very large.

http://giving.web.cern.ch/lhc-tapes


The general advice at places like /r/datahoarder is to copy all data to a new medium every 10 years to "update" the storage interfaces and standards of the data.


I used to have the model file for the Titanic, direct from one of the movie's animators, saved on a Zip disk. The click of death made that file forever lost.


It would be great if magnetic tape were more accessible for consumers. A while back I was reviewing local backup and storage options, and I read up a bit on it. I think it scales better when you have massive amounts of data, but the initial price is fairly steep.

At the consumer level it makes way more sense to buy HDDs. My suggestion would be to pick up something like the EasyStore 8TB external HDD [0] (on sale right now at Best Buy for $160, will probably drop again soon). You can open it up, extract the drive, and drop it into your computer if you want internal storage.

If you want to reduce the risk of data loss you can also fill one up with an encrypted snapshot of your important things and ship it to a family member. Very important if you live somewhere that's at risk of being destroyed by natural disasters.

[0] https://www.bestbuy.com/site/wd-easystore-8tb-external-usb-3...


I’ve been happy with M-DISC DVDs for durable, disaster-proof, write-once backups. Gets a little pricey when you start talking about TB, but for family photos and important documents it more than meets my needs.


I'm not familiar with M-DISC, but I swore off optical media for long-term storage around 2004, when the discs in the binder I used to carry around started turning up empty. Several of them were those Kodak Gold Archival CD-Rs that were supposed to last a decade but were dead inside 3 years.


Thank you, had no idea. I will consider this for photos and such.


For the majority of consumers, cloud backup makes more sense than tape backup -- few people have the discipline to stick to a safe backup regimen.


I was thinking that having a separate medium for backup might make people think about it differently.


Cloud backup eliminates local-disaster data loss, which is amazing. It also introduces counterparty risk, e.g. a Google-style service shutdown with 30-90 days to download your data, or even a Megaupload-like shutdown.


Isn't that only a serious risk if you are using it as cloud storage as opposed to cloud backup? If you're using it for backup, your only risk is that you won't have a backup for some period of time during/after a shut-down, until you have uploaded your local copy to a new provider.


Cloud backup will never work for 8T of data :-)


With a gigabit internet connection, it would take about a day to back up 8TB. 100 Mbit could do it in 222 hours. At 10 Mbit, about 90 days.
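
For anyone who wants to redo the figures, a quick Python check (raw line rate, zero protocol overhead, which is why real-world numbers like the 222 hours above come out higher):

    # Naive time to move 8TB at various line rates
    bits = 8e12 * 8  # 8TB payload
    for mbit in (1000, 100, 10):
        seconds = bits / (mbit * 1e6)
        print(f"{mbit} Mbit/s -> {seconds / 3600:.0f} h ({seconds / 86400:.1f} days)")
    # 1000 Mbit/s -> 18 h (0.7 days)
    # 100 Mbit/s -> 178 h (7.4 days)
    # 10 Mbit/s -> 1778 h (74.1 days)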


Last year I signed up for Backblaze. (Incidentally) I have around 8TB of data and a gigabit upload (a real one), but Backblaze's servers are on the other side of the planet for me. It took around 3 or 4 months to upload it all.

That's to say, there's more to it than upload speed. If Backblaze had servers in Europe I'd recommend it much more than I do.


What about another 3 or 4 months to download a restore?


The core data that I'd want to restore immediately if I lost my primary fileserver is only around 100GB.

I have several TB of other data (mostly photos, videos, etc.) that I'm fine waiting weeks or months for if needed, but if you're in a hurry, Backblaze will sell you a hard drive that they restore your backup to and mail to you.


At that point, one might as well just write it out to a hard disk in the first place, then store it offsite.


Then I'd need multiple hard drives for redundancy, and I have to bring them back and read them regularly to make sure the data is all readable. And while I have a large amount of data that I rarely touch, I have a small amount that changes frequently, so I need to include that data in my off-site backup too.

Or I could just sign up for a cloud backup service, spend a month or so uploading my initial snapshot of data, and then backups are automatic and always offsite, replicated, and scrubbed. For $10/month.


Most non-tech-savvy people don't even have that much data to back up: most of their heavy stuff (music, pictures, etc.) is now attached to an online service.

Plus you only do the full upload once. Differential backups bring the size of subsequent backups down to a few GB, or even MB.


It turns out that actual throughput on nominal "gigabit" consumer ISP connections ranges from 0-95% of the branding. It is only safe to consider them as "may burst as high as 980 Mb/s" and to actually test your connection to any given destination. Oversubscription and the resulting congestion are baked into the consumer ISP model.


+ $400/yr for storage. A few hard drives, DVDs, and tapes are much cheaper and pretty easy (even if you have to pay $500 once to get an IT pro's help.)


HDDs need regular replacement if you want them to work in 30 years. It's cheaper than $400/year, but not by as much as you might think, especially if you want to maintain 3 copies of each piece of data.


Even assuming that a consumer "gigabit" connection is actually delivering that speed, you also have bottlenecks in the wire protocol (sftp or scp or https or whatever) and the storage service. It would take me longer than a day to copy 8T to the archive storage within my own data center using HSI (an ftp-like utility for accessing HPSS storage).


EnTouch limits me to 1TB/month on gigabit fiber


Got about 25TB with CrashPlan so far. It is slow to upload, but I am gradually getting my 35TB (and counting) uploaded.


This HP tape drive costs $2347 and can store "up to" 3T per tape.

https://www.amazon.com/HP-LTO-5-Ultrium-External-EH958B/dp/B...

The tapes are $31 and only store 1.5 T.

https://www.amazon.com/LTO5-Ultrium-1-5TB-3TB-Case/dp/B003KR...

I can buy a 1.5T hard drive for $47.

https://www.amazon.com/Generic-1-5TB-Internal-Desktop-Drive/...

I'm not seeing the cost-effectiveness of tape.


LTO-5 is out of date; the prices you're seeing reflect a small but consistent demand from enterprise customers who need these drives to read LTO-5, -4, and -3 tapes and to replace failed out-of-warranty drives.

The latest version of the LTO standard is LTO-8, and drives are around $2700: http://www.backupworks.com/Quantum-LTO-8-SAS-tape-drive-TC-L...

Raw capacity of LTO-8 is 12TB at $160, which is a much better value than LTO-5.

Factor in that these tapes are designed to be written to and then sit on a shelf for years, and that such drives are mostly purchased by enterprises expecting a long service life out of them (say 10+ years), and the prices for the complete system do tend to reflect this with a higher cost of entry.
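
A rough break-even sketch in Python, using the prices in this thread ($2700 drive, $160 per 12TB cartridge, ~$20/TB for consumer HDDs; all assumptions that move with the market):

    # Capacity at which an LTO-8 drive + tapes beats buying bare HDDs
    drive = 2700.0           # LTO-8 drive
    tape_per_tb = 160 / 12   # ~$13.3/TB of cartridge
    hdd_per_tb = 20.0        # e.g. an 8TB external at $160
    breakeven_tb = drive / (hdd_per_tb - tape_per_tb)
    print(f"tape wins past ~{breakeven_tb:.0f} TB")  # ~405 TB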


> Factor in that these tapes are designed to be written to and then sit on a shelf for years

This is important. Your $47 consumer hard drive will likely not have that longevity, nor will your home computer 10 years hence likely have the correct interface and drivers to access it.


I have some older hard disk drives, and none of my current hardware will recognize the drives, even though the interface cable fits.

Ditto for my 5.25" floppy drives.

I finally threw away my old magtapes because there is no way to read them.

I threw away my zip disks for the same reason. And my MD disks. And my old tape cartridges from the 90s - no way to read them.

My old computers I retired in perfect working order will no longer boot (probably bad capacitors, who knows).

VHS and LaserDisc players are no longer made.

The only long term solution I've found is to buy new hard disks every year and copy the data forward.


An IBM 3592 JD Advanced Data Tape Cartridge costs a bit over $200 and has a raw (uncompressed) storage capacity of 15TB. An LTO-8 cartridge is 12TB and under $200. Meanwhile the cheapest 10TB HDD Newegg sells is $304.27.


10TB is not a cost-effective HDD size; 8TB is $160.


8TB hard drives don't have the long-term durability of a tape. In a cold and dry room you can have your tapes sit for 3 decades and they'll work.

Hard drives usually last about 10 years without power at best before you lose significant amounts of data (you lose bits before that, don't worry).


> In a cold and dry room you can have your tapes sit for 3 decades and they'll work.

If you can find a working drive, and drivers, for a 30 year old tape. Look at all the problems NASA has with their old tapes. The tapes actually are fine, it's just a major effort to custom build a machine to read them.


Oh, don't worry, you'll get a drive and drivers; these things are built for it, and as long as not every vendor behind the standard goes bankrupt, you'll get something to read the tape.

You can still obtain LTO-1 compatible drives nearly 2 decades after the standard was released.


What does the tape drive cost for it?


No idea. It seems to be sold under the "if you need to ask, you can't afford it" pricing model. These things are definitely not intended for home use; I don't think any tape drives being sold today are, really.


The internet says around 1000 dollars. I guess it only makes sense if you need to store a petabyte.


Ahh, the old old old saying, "Never underestimate the bandwidth of a stationwagon full of tapes" still holds true.


Yeah, but the latency is a killer.

I haven’t thought about SneakerNet since the late 80’s floppy disk days.


It probably will hold true for a long time due to increasing media densities


The problem is you don't care about the fastest link on a path - the bottleneck link bandwidth is what matters. The stationwagon isn't the bottleneck; the tape drive is.


Streaming, tape compares favorably with HDD (TFA suggested twice the speed?). Random access, of course, tape blows. :P

So really, the bottleneck is _mounting_ the tape.


This actually seems like a fun math calculation... Has anyone done this before?


You could easily fit 10PB in the back of a pickup truck (a thousand 10TB hard drives). San Francisco to NYC is a 4 day drive at 12 hours a day.

So that’s 10PB/4 days = 231 Gbps

Not bad!
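
The same arithmetic, parameterized (Python), in case you want to swap in your own payload or route:

    # Effective bandwidth of a vehicle full of storage
    payload_bits = 10e15 * 8      # 10PB
    trip_seconds = 4 * 24 * 3600  # 4 days door to door
    print(f"{payload_bits / trip_seconds / 1e9:.0f} Gbit/s")  # ~231 Gbit/s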


I would expect ~2,000 tapes in the back of a pickup, at 6TB each. Note that:

- Tapes are easier to stack and load in boxes

- Tapes are more resistant to shock and damage from vibrations

- Tapes are generally more resistant to damage from the environment

- Tapes weigh less than hard drives (this is true whether measured per tape or per byte)


Don't forget to include the time it takes to write to and read from the tapes (or HDDs).


But if you transmit the same data using networks, you also need to read / write from the tape/hdd/whatever, unless you're talking about RAM-to-RAM speed.

IMHO you should just account for the time to mount/connect the HDDs to the computer.


I don't think this is true for the general case, though. At some point one has to declare that the data has "arrived" in its usable form at the destination, and RAM seems like the wrong threshold (since, for now, it's volatile).

Although one could consider HDDs to be immediately usable at the destination once they're "plugged in", there are too many exceptions (including good reasons not to re-use a device subjected to such conditions in production) just to assume it's true.

If they were SSDs, I might agree, but those are still too expensive for the use case.


If it takes 4 hours to write to the drive/tape, 4 hours to transport it, and 4 hours to read it back in, that is 12 hours. If you are sending it over a wire, say the wire speed is 6 hours, the reading and writing are happening at the same time as the transmission. So total time is still 6 hours (you don't have to read the entire drive before beginning the transmission, and you write out to the remote side as it is being received).
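
A toy model of that pipelining argument (the stage times are the hypothetical ones above):

    # Sneakernet runs its stages in sequence; a wire streams through all at once
    write, transport, read = 4, 4, 4  # hours, hypothetical
    wire = 6                          # hours at the assumed wire speed
    sneakernet_total = write + transport + read  # 12 hours
    wire_total = max(write, wire, read)          # pipeline limited by its slowest stage
    print(sneakernet_total, wire_total)          # 12 6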


And you could read/write transported HDDs or tape in parallel. If you’re transmitting over the internet, it’s much more difficult to have parallel transmission paths.


HDDs, yes, but tapes, no.

That is, HDDs are generally unbounded in parallelism because disk bays are cheap [1] and plentiful.

OTOH, tape drives are expensive, multiple thousands of dollars, which makes them scarce.

[1] $100/bay in quantity, but, in some cases, effectively $0 if counting unused bays in existing servers.


You can still have multiple drives on either side. Sure they are expensive, but if we're talking hundreds or thousands of tapes (we have the size of a station wagon to work with), there's a budget for more than one drive on either side. Even with drives, if you get too elaborate of an enclosure or JBOD, you're looking at thousands too.

Hell, you could even bring the drives with the tapes!


My point isn't that parallelism isn't possible with tapes, but rather that, due to cost, it's not comparable at the same order of magnitude or two.

> Even with drives, if you get too elaborate of an enclosure or JBOD, you're looking at thousands too.

This actually supports my point (except the "too", which is misleading). One can reasonably expect that $2k-$3k enclosure to support 16-44 disks for that price [1], compared to a tape drive's singleton.

[1] Full-featured Synology NAS with 8 drives is under ~$1k, for example, while a CSE-847E1C-R1K28JBOD holds 44 drives for $2800.


If you’re saving on the order of a hundred dollars per tape (12TB), you’re going to have enough left over for a few readers.

But - this is a silly argument. I think we actually agree on all of this and we’re talking about an extreme hypothetical that while I’d love to test out, isn’t something I’m going to do in the near future!

Although, a few years ago, I did have a similar issue. I needed to move about 500TB of data from one data center in Menlo Park to a new one in SF. We tried everything we could to make sufficient backups, but it was hard to back up 500TB of data over even a 10Gb/s link. We ended up just moving the physical JBODs in the back of a rental car. Most stressful drive I’ve ever made.


> If you’re saving on the order of a hundred dollars per tape (12TB), you’re going to have enough left over for a few readers.

That's a good point, and it certainly improves the attractiveness of tape overall. Still, a $200 saving in medium doesn't make up for a $2600 difference in drive cost. How well does it improve the appeal of tape for parallelism?

The media+drive cost of LTO-8 for 12TB would be around $2850, while for disk it's only $550. Considering LTO-8 has transfer speeds of 1.6x or so that of disks, that means only 5 tape units are needed to match 8 disks. That's still $14.2k compared to $4.4k, over 3x. (With disks, I'd want to do RAID6 or RAID-Z3 at the very least [1], which further reduces that ratio, but not below 2x.)

That's certainly much closer to parity than an off-the-cuff estimate, but it's still not close enough to be practical. Anyone needing to transfer data in bulk, fast, would still do well to choose disks, not tape.
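
The comparison above in code form (every price and the 1.6x speed ratio are this thread's assumptions):

    # Units and cost needed to match aggregate streaming bandwidth
    import math
    tape_unit = 2850   # LTO-8 drive + one 12TB cartridge
    disk_unit = 550    # 12TB HDD
    speed_ratio = 1.6  # assumed LTO-8 vs HDD streaming speed
    disks = 8
    tapes = math.ceil(disks / speed_ratio)       # 5 tape units match 8 disks
    print(tapes * tape_unit, disks * disk_unit)  # 14250 vs 4400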

> this is a silly argument. I think we actually agree on all of this and we’re talking about an extreme hypothetical

I disagree as to silliness, and as to argument, since I also think we generally agree. This is a discussion in the spirit of HN, satisfying intellectual curiosity.

I also disagree that it represents an extreme hypothetical. I believe your own anecdote is nowhere near as rare as you imply. Even at lesser scale, it's an issue: the existence of the AWS Snowball is a testament to that.

[1] Ideally true sector-level ECC, though AFAIK, ZFS can provide a close enough approximation


If I remember right, it was a problem in Tanenbaum’s book.


Correct.


XKCD What If #31 is a good one.

https://what-if.xkcd.com/31/


Since we now have 512GB µSDs, the milk bottle would go from 1.6PB to 12.8PB; all other results need to be multiplied by the 8x increase in storage density...


Could easily stack dozens of milk bottles in most SUVs/Trucks. I have a new business venture guys!


Not really a new business venture; sneakernets have existed for a long time. In Cuba, having someone stop by your house with a few terabytes of "internet" to copy to your disk is a thing.

And companies regularly have interns carry hard drives between servers, or even have someone drive them between cities, because it's simply cheaper to pay for gas and a driver than to pay for the transfer over a wire. And usually faster too.


Pedantically, it's a trick question; the bandwidth of a tape is 0 MHz.


Yeah, Amazon. It's their Snowball product


The problem is it's near impossible to get a tape backup system at home.

It's all fun to say "oh a single tape can do 15TB" but it must cost in the thousands and thousands of dollars.


I think that's because the market has shrunk so much.

I remember when I was younger, I got a single SCSI external tape drive, and some tapes, from the Micro Center in Cincinnati Ohio. I then used that to back up my computer while I was in college. I've forgotten so many of the details, like the model of drive and tape, and the backup software (it was Mac OS). I do know it wasn't LTO, though, because it had to be rewound.

These days, were I to do something like that, I'd do BD-R. Yeah, you're only talking 50 GB or so (uncompressed) per disc, but that's not bad!


As a kid, we had one of those home/small business targeted drives that attached to the PC via the floppy drive controller. QIC, or whatever it was called. Native capacity was ~120 MB, and the drive had some built-in compression so it was marketed as 250 MB. It made a horrible screeching noise when running.

Linux even had a driver for it at some point (ftape), but IIRC it has since been removed due to lack of use. Never used it though, by the time I got more into Linux that drive was already obsolete.

But yeah, today the startup cost of a tape drive is so high that it doesn't make sense for home usage. At home I use borg backup (https://www.borgbackup.org/) nowadays, backing up to external USB hard drives.


I had an HP Colorado backup drive (400/800); I think it's QIC-like.

I tried to use it on an old P3 box with Win95, and was utterly surprised that the Win95 backup application had support for generic tape drives, which allowed me to restore the tape's contents.

I also love the super smooth mechanical sounds of tape drives, even at the cost of slow seeks.


Probably Retrospect for the backup software, it was the standard on Mac OS. I feel like 8mm was probably the most common consumer (i.e. cheap) tape format, but also could’ve been 4mm.


Yup, it was Retrospect!


I knew a guy in the 1990s who hacked a VCR to be a data backup device. I can't recall if he used the video track or the hi-fi audio track, but he said the tapes actually had quite a large capacity and of course were not very expensive.


There were commercial products to use your VCR for data backups

https://en.wikipedia.org/wiki/ArVid

Danmere Backer https://www.youtube.com/watch?v=TUS0Zv2APjU (LGR)


LTO has to rewind in certain circumstances. The file listing can be browsed without having to seek, but accessing the content, writing, or erasing cannot. Still somewhat unlikely it was LTO, but probably not because of the rewinding.


What is LTO, and why doesn't it need rewinding?


Would be great if this was the sort of thing local libraries could help out with; they provide the read/write hardware and you bring the tape and data.


I was thinking —like other top-level comments here— that it's a great shame that consumer level drives have died out. They're thousands of dollars and all wired for enterprise interfaces.

But I'm really surprised that nobody[1] is offering "tape backup as a service". Pay $30 per 3TB cassette (slightly over market rate), a $10 loading fee and then $1/h for 100MB/s write access. When you're done, pay another $10 and it's shipped to you. Or $5/month for fireproof storage.

This might have to be a pre-booked service, or they could buffer upload onto spinning disks and write to tape after the fact (much faster), but for storing more than 1TB, this method could be vastly cheaper.

[1]: That one quick Google could find.


How would this, in effect, be different from any serious backup service in existence today? I'm sure they do their backups of your backups on tape, and at least rsync.net is willing to ship physical media with your data to you.


Price!

In the simplest version of my post, you're paying for media, the time for somebody to pop it in the machine, the hire of that machine while you copy data to it over the network, and the postage of a tiny little LTO back to you for archival.

On higher density tape, you could be looking at a one-off $100 total for writing 10TB of data to tape and mailing it back. There's a lot of room between that price and the nearest incumbent to make a strong profit there.

rsync.net seems very bespoke, so they may be able to give you a tape, but you're looking at $250 per "incident" plus hardware. They waive that fee if you store more than 100TB ($2k/month).

Yeah, they're much more available but most of us aren't after cloud storage, we want a lasting backup in case the NAS dies, or the house burns down. Restore speed doesn't matter.


Ah, true. I didn't consider the "one time dump of a boatload of data" as a use case. My backups tend to run relatively often, and at that point, $10 for each tape load would easily make the tape service more expensive!


AWS Glacier provides a pretty cheap[0] backend for such a service. You could store 1TB for 1000GB * $0.004/(GB * mo) = $4.00/mo, which is dirt cheap, all things considered. Retrieval is about 10x that, per GB, but generally if you are storing backups, you aren't concerned about retrieving them too often, so that cost is acceptable. Compare this to the S3 bucket costs for the same storage size ($10/mo for infrequent access, and $24/mo for standard), or to the Google Drive 1TB tier ($10/mo, but no API as far as I know).

[0] https://aws.amazon.com/glacier/pricing/
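
The pricing math, spelled out (rates from the pricing page above; they change over time and vary by region):

    # Monthly cost of a Glacier-style archive at $0.004/GB-month
    def monthly_cost(tb, rate_per_gb_month=0.004):
        return tb * 1000 * rate_per_gb_month

    print(monthly_cost(1))  # $4.00/month for 1TB
    print(monthly_cost(8))  # $32.00/month for 8TB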


That's per month... In just 9 months Glacier becomes more expensive than a physical SATA drive(s) of equal capacity. The longer you keep the data, the more it skews against "the cloud".

I'm definitely not claiming to have all the answers here but there are points where tiering your storage strategy to get it onto something stable and without running costs makes sense.


Indeed, at 1TB scale, cloud storage can sound "dirt cheap", and, of course, it is, since the fixed cost of hardware to store that reliably is the same as for 100TB.

10x that, or a petabyte, would cost $144k-$180k after 3 years, depending on region. Hardware [1] might be, generously, $60k. The hardware would also have operating costs of another $26k [2], but Glacier would have retrieval and data transfer costs if one ever wanted the data back.

Of course, tape, the subject of TFA, is even better. $17k [3] for the hardware and minimal operating cost in the labor required to change the tapes.

VC-funded startups have been tending to ignore this arithmetic (which applies to all cloud infrastructure, at only slightly larger scales) lately, so cost isn't always an issue any more.

[1] Let's say 100 12TB drives for 84 data, 12 (triple) parity, 4 warm spares. 3 enclosures, 6 SAS HBAs, plenty of CPU and RAM for ZFS.

[2] Generously, the above is 2.7kW. Adding 50% for cooling, at 12c/kWh. A commercial datacenter could double that (or triple or more, but, for backups, presumably one would find a geographically cheaper one).

[3] LTO-8 drives are $2700 and 12TB cartridge are $170.
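
The three-way comparison as a script (all inputs are this comment's estimates, not quoted prices):

    # 3-year cost to hold ~1PB: cloud archive vs. DIY disks vs. tape
    glacier = 1e6 * 0.004 * 36      # $0.004/GB-month for 36 months = $144k
    diy_disk = 60_000 + 26_000      # hardware [1] plus power/cooling [2]
    tape = 2700 + 84 * 170          # one LTO-8 drive + 84 x 12TB cartridges [3]
    print(glacier, diy_disk, tape)  # 144000.0 86000 16980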


> Of course, tape, the subject of TFA, is even better. $17k

> LTO-8 drives are $2700 and 12TB cartridge are $170.

What I initially ignored was the cost of a "library" (robot), which is, arguably, necessary if the minimum time to write a tape is over 9 hours (longer than a standard workday, meaning 1PB could take 17 workweeks to write with a single drive with no robot).

Including a library to hold all 84 cartridges might be tricky, but it appears there are 80-slot ones in the $12k range, which brings the total tape price up to $29k and reduces the write time to under 5 weeks.

Alternatively, one could just buy multiple tape drives and no robot. $25k gets faster write time than above.
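
The write-time arithmetic behind those numbers (assuming LTO-8's ~360MB/s native streaming rate, which is my assumption, not a figure from the article):

    # Time to fill one LTO-8 cartridge, then a petabyte, with a single drive
    hours_per_tape = 12e12 / 360e6 / 3600      # ~9.3 h per 12TB cartridge
    workweeks = 84 * hours_per_tape / (9 * 5)  # 84 tapes, 9-hour days, 5-day weeks
    print(f"{hours_per_tape:.1f} h/tape, ~{workweeks:.0f} workweeks for 1PB")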


It's not cost effective with the rise of the cloud and broadband.


What isn't? Home tape drive ownership, sure. But what I'm proposing (essentially hiring a remote tape drive, getting the result back in the post) is vastly cheaper than cloud storage.


While there's an explosion of data that we keep hearing about, the aspect I'm curious about is how much of that data will at any one point be "hot" or even "warm," in that it needs to be processed and analyzed in short order (talking microseconds or nanoseconds, not tens of seconds like tape, or more once the data has to be sent over a network to a server somewhere else). Cold storage can be archived on tape or HDDs, sure, but as compute continues to grow, surely more and more data will need to be hot or warm at any given moment. What I wonder about is whether that hot/warm data will grow its overall share of data in existence or simply grow at the same rate.


When I started working on mainframes at a big financial company there were jokes about the 'tape monkeys' which loaded tapes when we requested old datasets. We laughed and dismissed that, surely everything was mechanised.

It was only after a year or so that I was cleared to visit the data centre and discovered that the mechanical tape silos had failed years ago. Every tape had to be located and loaded by hand. I felt guilty about all those datasets I had ordered on a whim... and was surprised that the DC guys didn't have a 'Hit List' of programmers.


I'd be very paranoid if I were you :)


So, first I would quibble a bit with the time scales you're talking about, as tape access could easily be more than tens of seconds, depending on status: Even a loaded tape might take a minute to reposition, if the data you want is at the opposite end.

In the world I support, everything has to be funded by something. We have our "cheap and deep" storage, for example (http://oak-storage.stanford.edu), but labs who use it have to pay for it.

Unchecked growth is highly unlikely, as grants run out eventually, and then the data will have to go somewhere else. Often these days, that ends up being Google Drive, but there is demand for tape archival storage.

Even the Stanford Digital Repository (https://library.stanford.edu/research/stanford-digital-repos...), who are extremely serious when they talk about "long-term", say they're developing pricing models.


Well, I was keying off the article itself, which said: "The long length of the tape held in a cartridge—normally hundreds of meters—results in average data-access times of 50 to 60 seconds."

It's tens of seconds, up to a minute, but that's just average, so I think we're on the same page here!

Even when not in academia, everything has to be funded by someone at some point. Sounds like you have a pretty awesome job though, and I imagine Stanford is one of the least likely to get cut off from precious NSF grants!


The article suggests there is research showing it's possible to reach 20x the density of current tapes, which top out at 15TB. But this research is like battery breakthroughs: good in theory, but not thought through in practice.

Meanwhile, we have a real roadmap and a working solution in MAMR (much more so than the HAMR proposed by Seagate) that will scale us to 40TB per HDD by 2025, and maybe more, given we don't really know the limits of MAMR yet and we keep managing to fit more platters inside helium-filled drives. I don't think that's possible with HAMR; a laser heating the platter inside a helium-filled drive has got to be fun inside labs.

NAND flash is also dropping in price; current NAND spot prices are already back to 2016 prices or maybe even slightly lower. We have fabs from Toshiba, Samsung, and SK Hynix all coming online in 2019, all while 3D NAND is achieving better yields. The "Ruler" that Intel proposed has now been standardised as EDSFF, with up to 1PB of storage in 1U. All current roadmaps suggest we should hit 8PB or 16PB in 1U by 2025. With 48 drives in a 4U server, HDD offers only 480TB per 1U by 2025. That is 16 to 32 times the density difference compared to NAND.

In a system where information is spread across different drives, like Backblaze's [1], your reliability reaches a point (I think, forgive me if I am wrong) no different from that offered by magnetic tape. What advantages are left for magnetic tape when cost, performance, space, and reliability no longer favour it?

[1] https://www.backblaze.com/blog/cloud-storage-durability/


Tapes can last for 50 years. They are the MOST stable and reliable form of storage we have.

We have also not even reached any kind of limit on density. Sony has tapes that hold hundreds of terabytes.

So, they last forever, are extremely reliable, and extremely data dense.

HDDs exploit the same concept (magnetic storage) but trade reliability for speed. NAND is more reliable but still has a shelf life, and it also relies on support chips to constitute the overall media. It's also expensive as hell.

Tape is tape. You could smash a cassette, respool the tape, and read it. It's also cheap as hell per GB.

Tape will be around for decades if not centuries to come. NAND is not a challenger whatsoever.

A key piece of the puzzle here is that not only are tapes used for backup, but to free up space from data center HDDs in order to increase their shelf life.

Stop thinking of tape as archaic. Magnetic storage is the foundation of modern technology and computing, and in past decades we barely scratched the surface of its potential.

We aren't talking about audio cassettes with analog audio recorded to them. We are talking about a storage medium that is more data dense than HDD or NAND could ever be and we have yet to reach any sort of limit.

With tape you can achieve more and more space by manipulating the width and length of the tape, the tape heads, the angle at which tape is read, and the way that the data is read and stored. It's also just a plastic cassette with polyester tape coated with rust particles. It's inherently cheaper, requires no circuitry, power, or motors within the media itself, and will last 50 years.

Tape is the purest implementation of magnetic storage concepts. As such it is going to improve faster than HDDs because there are no other design factors like platters, motors, and heads. If an HDD dies, you might be able to recover the data through expensive and time consuming procedures. If your tape drive stops working, you buy another drive and still have an intact tape with all of your data.

Additionally, while SSDs are wonderful, they are a pretty backwards design when it comes to efficiency. You are essentially trying to model magnetic storage by using integrated circuits. You are storing values with digital logic in a physical state on a chip.

It's fast and more reliable than a spinning metal disk, but it's expensive and not data dense at all.

The only threat to tape is public perception of it being "old" when it's really the most robust and best engineered data format we have.


> Tape is the purest implementation of magnetic storage concepts. As such it is going to improve faster than HDDs

I agree with the former, but I think your conclusion misses a critical piece of technology development: volume (which drives funding).

Of course, HDD volume is going down, while SSD volume is going up. That may well give tape enough of a comparative advantage, if it hasn't already.

> The only threat to tape is public perception of it being "old"

For a definition of "public" narrowed a bit to include only/mostly computer professionals, I'd add "slow" as a mis-perception.

Other perceptions I, personally, have are that it's expensive and inflexible, mostly in the context of startup companies. I don't believe that's any kind of threat to tape, however, since it's likely these same companies will gladly over-pay for a cloud version of tape.


GDPR - please delete me from all your systems.

JFC. Harry, get the key to the basement, we need to dig out some tapes. San, can you pop down to the store and get some matches and lighter fluid...


I don't know how GDPR handles backups and what time tables are allowed for reasonable data retrieval and expunging.

Encryption decreases the volume of data that needs to be most carefully protected from the entire data set down to the key (and backups of said key). I imagine that data on mediums that make data retrieval and writing expensive can be encrypted with compartmentalized keys. So when Jane Doe wants her data deleted, delete the keys.
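
A minimal sketch of that idea (often called crypto-shredding), using Python's cryptography package; the dict key store and the function names are purely illustrative:

    from cryptography.fernet import Fernet

    keys = {}  # per-user data-encryption keys (illustrative in-memory store)

    def store(user, blob):
        # Encrypt each user's data under their own key before archiving
        keys.setdefault(user, Fernet.generate_key())
        return Fernet(keys[user]).encrypt(blob)  # ciphertext can go to tape

    def forget(user):
        # Deleting the key renders every archived copy unrecoverable,
        # without touching (or rewriting) the tapes themselves
        keys.pop(user, None)

    archived = store("jane", b"personal data")
    forget("jane")  # erasure honored; `archived` is now undecryptable noise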


> I don't know how GDPR handles backups and what time tables are allowed for reasonable data retrieval and expunging.

Keep a log of user ids who have requested deletion and purge their data whenever doing a restore. And don't keep backups for longer than required, which I think is generally something like five to seven years barring specific legal requirements.


It's another reason why I think the cost of managing keys appropriately is worth it for crypto-shredding.


> Mark Lantz is manager of the Advanced Tape Technologies at IBM Research Zurich

Seems a bit odd to see such a blatant P.R. piece in IEEE Spectrum.

Wasn't optical going to eat rust-based cold data storage technologies? What happened?


Optical discs are now up to 300 GB and they have caddies the size of an LTO tape that hold 12 discs. Seek time is obviously better than tape but I don't know about the price. The only remaining vendors are/were Sony and Panasonic.

https://panasonic.net/cns/archiver/

https://perspectives.mvdirona.com/2016/03/everspan-optical-c... (Sony Everspan dropped off the Internet; did they go out of business?)


From a cheaper end-user/consumer point of view, the only real option for optical disc storage is BD-XL, which is a 3 or 4 layer Writable BluRay Disc supporting 100GB/128GB per disc.

In comparison to tape, it's not that much storage, but fairly cheap to implement at home. £60 for the drives, and about £15 per 100GB disc.


Archival Disc doesn’t appear to be available to buy yet.


Software got better. Compression, deduplication, pure incremental backups (single level zero) , reverse diffs, etc all make random access read/write backups cheaper and more attractive.

I've heard it speculated that Glacier's slow restore times are consistent with a large number of optical disks and robots for loading. Never heard it substantiated though.


I can't seem to find this information with my google-fu, but does anyone know if magnetic tapes such as in vhs and cassette tapes have perpendicular magnetic anisotropy?


I’m not entirely sure what it is that you’re asking. But if you’re asking if the heads for an audio cassette are angled, the answer is no. They’re perfectly perpendicular.

For VHS, yes, they are angled. 30 degrees, IIRC.

If that’s, indeed, what you’re asking.


I was referring to whether the magnetization lies along the normal of the film/tape or if it's in plane. HDDs were able to increase density by moving to materials that had a magnetization with perpendicular anisotropy but I can't seem to find confirmation if cassette tapes and vhs had in plane anisotropy back then. More so I can't seem to find if today's magnetic tape storage has moved to perpendicular magnetic anisotropy.

It seems


Doesn't seem like it would be possible given the substrate material.


Meanwhile, Intel introduces the "Ruler" server SSD form factor.

"Intel on Tuesday introduced its new form-factor for server-class SSDs. The new "ruler" design is based on the in-development Enterprise & Datacenter Storage Form Factor (EDSFF), and is intended to enable server makers to install up to 1 PB of storage into 1U machines while supporting all enterprise-grade features."

ref: https://www.anandtech.com/show/11702/intel-introduces-new-ru...


Seems kinda odd to mention uber-expensive flash when talking about tape, which is cheaper than spinning disk.


For the price of a 1TB SSD I can get you a 12TB/30TB Tape, easily between 10 or 30 times as much storage.

And as a bonus, while an SSD is only guaranteed to retain data without power for a year, tapes are designed for decades of cold storage.

Additionally, you could put up to 2x4x9 (HxWxD) tapes in there; at max capacity that is about 864TB/2.16PB (native/compressed).

The only downside is you need another 1U to read the tapes, though the rest is automatic if you have a robotic tape library.


Well, my point was not to show flash as superior but to point out the development in that area... every use case has its reasons, be it flash or magnetic... the author chose to paint magnetic as a better alternative to flash (in the capacity dimension), which obviously cannot be true for all use cases... (same for flash). My comment is to highlight that fact...


> the author chose to paint magnetic as a better alternative to flash (in the capacity dimension)

I'm having trouble finding where in the article that occurred.

To me, it seemed to ignore flash/SSD, as the principal premise was that tape survives due to its low cost (at scale), so only the highest-density HDDs were mentioned.


But for how many years is unpowered data integrity guaranteed for modern enterprise flash storage? Magnetic media excels at archival. In fact, plain old paper does, too, provided it's stored properly. I bring this up only because storage doesn't necessarily require on-demand retrieval, and so flash storage may not be much use in those cases.


Dunno; I had a room with 120 tapes and 100 or so spinning disks. The room hit 110F or so during an AC failure. 120 tapes died, zero disks. Actually, I tried only 10 or so before giving up. Tapes do seem much more sensitive; the storage temperature (non-operating) was the same as the operating temp of the disks.

Don't tapes have issues with print-through and adhesion if not regularly used? I seem to recall best practices including something like an annual seek to the end and back to the beginning. Seems pretty similar to spinning disks.

I'm less sure of flash drives on a shelf.


This seems to indicate that these tapes are safe up to 120F for short periods of time, but that they should be archived at no more than 77F.

https://www.ibm.com/support/knowledgecenter/en/STCMML8/com.i...


Flash drives guarantee data integrity for up to a year (JEDEC), so you'd have to power them up and read the data periodically, though I'm unsure if a plain read is sufficient on flash.


> And tape is very secure, with built-in, on-the-fly encryption

What exactly is this referring to?


It probably means the drive does the encryption, but so do hard drives and SSDs.


yeah, that. KMS involving the library and maybe an external key server.


TLDR;

Today, a modern tape cartridge can hold 15 terabytes. And a single robotic tape library can contain up to 278 petabytes of data.


Anyone know what exact tape they are talking about? When I google 15TB tape I only find LTO-7, which is 6TB native and 15TB compressed. Assuming a compression ratio of 2.5 to 1 seems pretty crazy to me.


They're probably referring to the TS1155 variant of https://en.wikipedia.org/wiki/IBM_3592 .

To be honest I never really understood the main reason for adding compression, especially since you can compress individual files far better using existing algorithms and then write those to tape. It must be a marketing thing.


Ah, thanks, they do list a 15TB tape. I looked around and could find the 10TB variety for $212 or so, but couldn't track down an actual 15TB tape for sale.

The 10TB tape I found was "IBM 3592 JD Advanced Data Tape Cartridge 10TB/30TB (2727263)"


The cartridge is the same. It can be reformatted from 10 to 15TB using the newer tape drive.

https://spectralogic.com/features/ts1155-technology-tape-dri...


Seems pretty terrible considering that a tape has crazy more surface area (100 to 1000s of feet x 1" or so) and a 14TB disk is just a few square inches of usable area (2 inch^2 or so per platter).


Not if you consider that the tape can be spooled, eliminating any concerns about physical volume, and that data spread over a larger area means potential corruption is more distributed. You also don't need to fill the tape cases with helium.


LTO-8 is 12TB native; 960 meters x 12mm is 11.52 square meters! That's 17,856 square inches. Somewhere around 6000 Mbit per square inch.

Current 14TB disks are around 1000 Gbit per square inch, or roughly 166 times denser.
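
Redoing that division in Python (the 166x above uses the rounded 6000 Mbit figure; exact arithmetic lands nearer 186x):

    # Areal density: LTO-8 tape vs. a ~1000 Gbit/in^2 HDD
    tape_bits = 12e12 * 8               # 12TB native
    tape_area_in2 = 960 * 0.012 * 1550  # 960m x 12mm, ~17,856 in^2
    tape_density = tape_bits / tape_area_in2
    print(f"{tape_density / 1e9:.1f} Gbit/in^2")         # ~5.4
    print(f"disk ~{1000e9 / tape_density:.0f}x denser")  # ~186x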

Sure you need a motor, seek head, etc. All about the size of a tape. But you don't need a robot or tape vault either.

Seems unclear where the cross over for tape vs disk is. Especially when you use things like a backblaze storage pod and spin down disks when not needed to avoid the heat, power, and wear and tear.

Does make one wonder why not disk robots? The disks are the same size, support random access, and can be hot swapped.

I dug around and couldn't find a price on a 15TB tape, or even what it was called.


> Does make one wonder why not disk robots? The disks are the same size, support random access, and can be hot swapped.

For long term storage, it's unlikely that disks will last as long as tapes.

All the electronic and mechanical components of a disk drive must still work in order for you to be able to retrieve data, and it's somewhat likely that if you just stick a drive on a shelf for ten years some percent won't be working afterwards.

For tapes, the separation of electronics from the actual storage medium itself means better maintainability; you can replace the tape drive without touching the cartridges.


> Does make one wonder why not disk robots? The disks are the same size, support random access, and can be hot swapped.

Because a tape is relatively cheap, the drive is not. With a hard drive robot you lose the (small) cost of a SATA/SAS controller per drive, but you add the cost of the robot. If you're going to get disks you might as well just hook them up.


IBM bubble memory


What about scratches?


So... who actually uses them in the age of AWS / S3? Is this becoming a much, much smaller group?


It might be hard to imagine, reading HN, but outside of SV there are more businesses using tapes than businesses using AWS.


https://aws.amazon.com/glacier/

Looks like tape as a service.


Glacier’s actual storage backend technology has never been disclosed, although this is plausible. Other proposed technologies are racks of older (less efficient) disks that are not always spun up, huge slow flash memory banks, etc.


It's expensive enough that it could be anything but SSDs, really. Glacier is more expensive than B2 from Backblaze, and they use regular hot-storage HDDs.

And that's not taking into account the extra money Amazon makes from requests and data transfer, which could offset storage costs.

I wouldn't be surprised if the Glacier data was mixed right in with S3 storage, just deprioritized when IO is high.


Some past hacker news thread made a plausible case for 12" optical disks, much like bluray.


Last I heard, the best thinking on Glacier was that it was the dead space on the normal S3 drives, which is inaccessible at normal SLA levels because IO is saturated.


I would say AWS / S3 almost certainly uses them for backups. So, depending on your outlook, everyone who relies on S3 does by extension.


S3 does not use tapes for backup. It relies on large scale and erasure coding to achieve its SLO around storage. They aren't looking for point-in-time recovery; they only guarantee availability.

I'd say the same is true for any system optimizing for availability over RPO.


Interesting! Thank you.


Don't forget about glacier.


I thought the consensus on glacier was that it was optical backed storage? Has that changed?


I would say there has never been a consensus. The Glacier API is defined such that it could be backed by any technology. It's possible that early "MVP" versions of Glacier were just S3 with lower pricing and artificial delays. Once demand was established, they could have swapped out the backend with MAID, tape, optical, or some combination. There are a lot of comments out there saying "everyone knows Glacier is X" but nobody says "I know Glacier is X".



