Of course, I'm gonna need three, because backing one of these up to the cloud from my home ADSL would take (tap tap tap, calculate) 3.5 _years!_ (sorry cperciva, not gonna tarsnap this even at picodollars per gigabyte!)
(fortunately, it'd only take 3-4 months to bittorrent down all those, ummm, Linux ISOs...)
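A rough sketch of that "tap tap tap", with the ADSL speeds as my assumptions (roughly 0.75 Mbps up, 8 Mbps down) rather than measured figures:

    # Back-of-the-envelope for the numbers above; the ADSL speeds are assumptions
    # (~0.75 Mbps up, ~8 Mbps down), not measured figures.
    drive_bits = 10e12 * 8                 # one 10TB drive, in bits
    seconds_per_year = 365 * 24 * 3600

    upload_years = drive_bits / 0.75e6 / seconds_per_year
    download_months = drive_bits / 8e6 / (seconds_per_year / 12)
    print(f"upload: {upload_years:.1f} years, download: {download_months:.1f} months")  # ~3.4 years, ~3.8 months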
> sorry cperciva, not gonna tarsnap this even at picodollars per gigabyte!
Oh, I would, but I guess you meant per byte.
At current rates it's about $3K a month to keep a backup of this HDD on tarsnap, and that's not including bandwidth. That's the price of six of these 10TB drives.
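For anyone curious, that figure is basically Tarsnap's published 250 picodollars per byte-month applied to a full 10TB drive; a quick sketch:

    # Tarsnap's storage rate is 250 picodollars per byte-month; bandwidth is
    # billed separately, which is why it's excluded here.
    backup_bytes = 10e12                   # a roughly full 10TB drive
    monthly = backup_bytes * 250e-12
    print(f"${monthly:,.0f} per month before bandwidth")   # ~$2,500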
I don't mean to discourage anyone from tarsnap, though. Cloud storage is unfortunately still pretty expensive, and most people are probably only truly paranoid about much less than 1GB of their data.
Because I had no idea about this Amazon unlimited Cloud Storage, thanks! I need to read the fine print, though; keeping home DB snapshots forever sounds really great. There's not much they can do to save space on their side if I encrypt them first, but even without that, I trust Amazon to keep my photos secure more than I trust CrashPlan, which has already lost my data once (yes, I think it's worth repeating, and no, it was not PEBKAC; they admitted it, and all they could do about it was say "sorry").
edit: I wonder if Amazon's goal with this was to get their hands on as many photos with their full metadata as possible (for training).
I'm not aware of it. But I remember a story about some other company (I don't even know which) where, after a data loss, the CEO contacted the user and did whatever he could, including sending hard drives to speed up uploading new copies.
In my case it was 90GB, a whole single machine. The fact that they were pretty casual about it is worrying. But I still use them for some use cases, because they have unlimited storage with software for Linux, although I don't depend on them.
Quote from the support ticket:
"I have looked again for your data, but we are unable to locate the archive anywhere in our system.
I am sincerely sorry that we have let you down as there is not a reason that I can find that this data should not be here.
If there is anything that I can to for you, please do not hesitate to ask and I will do so."
I tried an online backup service, Amazon Cloud Storage, and saw upload speeds of about 0.7 Mbps (speed tests say my connection gets 5-20 Mbps, depending on the day). That would take over four months of continuous uploading to back up a single terabyte.
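That estimate falls straight out of the observed speed; a quick sketch of the arithmetic:

    # One terabyte at the observed ~0.7 Mbps, uploading around the clock.
    days = 1e12 * 8 / 0.7e6 / 86400
    print(f"{days:.0f} days")              # ~132 days, i.e. over four months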
This is definitely true. I'm always shocked by how much relatives over there pay. However, outside the Asian markets, Australia is a bit off the major trade routes to the rest of the world, so general product prices never surprise me much these days. Neither do bandwidth costs, although I think having a much lower population density also helps drive up those prices (sadly).
In your case, even with the exchange rate as it is, price parity would be ~$80. If the taxes are ~$10 as you suggested, that means you're paying another what? $10 for essentially the same service? In a way, I'm surprised it's "only" that much more, but it's still terrible for Australian consumers.
Arq is great, but:
- no support for Backblaze B2 (cheapest cloud storage)
- Arq + Dropbox on Mac = hell: Dropbox triggers Arq's full rescan every few minutes, even though I'm not backing up the Dropbox folder
No, the 3.4 terabytes were composed of hourly incremental backups of the home folder on my MacBook Air and my Hackintosh, plus some external drives with media content.
I downloaded a full restore of the state of my MacBook Air as of roughly one month ago for testing. I don't have the storage space for a total restore (which would be tens of TBs, I guess, since it's an incremental backup solution [unless you could restore all the files instead of all the backups]), but I've tested everything I care about.
I actually just deleted most of the backups from Amazon (just to be nice), since I don't really need to have backups of everything that has ever touched my Downloads folders or web browser cache for example.
It is very unlikely that you care about all 8GB of that email equally. What you're paying for when you back it all up is the ability to not have to pick and choose, and especially to avoid choosing incorrectly and permanently whacking something you want to keep. That's not a criticism; there's nothing wrong with that.
On the other hand, 1GB is pretty tight. I suspect there's a lot of people who could come up with 1GB of photos they don't care to part with, and coming up with 1GB of valuable home videos is child's play, even assuming efficient encodings of the video rather than the rather flabby stuff that tends to come straight out of consumer cameras or phones.
But I bet if most people had to fit the Family Jewels into 10GB, they could; that's just one more order of magnitude away. Of course HN will have a way-above-average number of people who have way-above-average video recordings and such. Or, perhaps more realistically, call it a single 32GB SD card. At the moment my "family jewels" clocks in at about 90GB, but that's with no judgment whatsoever about which photos we keep, a lot of the aforementioned "flabby" videos, and no judgment about which videos we keep either.
Depends who you ask and whether it's business email or personal. Lots of businesses still use Outlook or Lotus Notes, and the email is stored on each client. And then some people like to use what they use at work.
When I worked for Nationwide Insurance (50 to 60 thousand people) we could keep as much email as we wanted locally, but no more than 100 MB on the server. This is a stupid policy, but it is a common one.
Even if he were paying me to use tarsnap I wouldn't use it for backing up 12TB.
I'd consider a backup solution that takes 3+ years to make a full backup (or 4 months for a restore) to be of negative value to me...
(I wonder how many months I'd get away with uploading 300GB/month before my "unlimited" ADSL plan hit me with its "acceptable use" disclaimers? Given that I'd be using 100% of my upload bandwidth to take this 3-year backup, I suspect the cost of 2 identical drives to make local backups would probably be cheaper than the network costs of sending it all to tarsnap!)
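For reference, ~300GB/month is roughly what a saturated ADSL upload pipe can move in a month, assuming about 1 Mbps up:

    # What a saturated ADSL upload pipe moves in a month, assuming ~1 Mbps up.
    bytes_per_month = 1e6 / 8 * 86400 * 30
    print(f"{bytes_per_month / 1e9:.0f} GB per month")   # ~324 GB, best case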
Wait a minute, is Amazon truly UNLIMITED? Even for encrypted files from Arq? Mine currently goes to OneDrive, but that's limited to 1TB for $7 a month, which includes Office 365. There has to be some catch with Amazon here.
Yeah, Amazon Cloud Drive is pretty popular on /r/DataHoarder; quite a few people have 10TB+ on it, some even a lot more.
Personally I have close to 7TB of backups and personal video footage stored, everything encrypted, never had any issues.
However, there's always the danger of them pulling a OneDrive at some point in the future and reverting from unlimited back to something like 1TB if people actually use it.
There are people on that subreddit who have a lot more than 10TB and even someone recently who uploaded more than 1PB. Even considering Amazon's above average capability to store large amounts of data, I'm surprised that the individual hasn't had their account suspended.
I have 2TB of mostly photos and home videos on Amazon Cloud Drive. Unencrypted. No issues, and I've heard of people with 3-4 times as much not having issues either. I assume these services have users with much less than 1TB each, with a few outliers who use a lot of space, making the unlimited service feasible as a marketing feature.
It's unlimited, yes, but ACD will cut you off if you start downloading a lot. I recall someone on /r/DataHoarder complaining that they wanted to restore their NAS and ACD support was refusing any requests to lift the block.
Symmetric internet access would be nice. Cloud backup quickly drifts off into dreamland for any casual user who does something as simple as taking photos as a hobby.
Nope, they can demo it, but nobody is putting $25,000 worth of SSD behind the tiny bandwidth you get from a single drive. So they are not selling it right now.
Just put more drives into the box. You don't want bigger disks because of how fast they spin.
The only 'obvious' improvement you can make is having more arms or figuring out some way of micro-aligning multiple heads at the same time on a single arm. There's not much benefit from larger single units.
You can already fit way too much SSD into a 3.5 inch drive, and nobody will ever buy it because they want more ports and performance per dollar. There's no benefit for either technology to go up to 5.25 inches.
I'm guessing there are practical reasons 5.25" drives aren't built anymore. Vibrations become a bigger issue, power requirements are higher, and seek times would suffer as well.
This dock [1] plus 8 of these 4TB SSDs[2] is 32TB in a 5.25" drive.
If that's too pricey, you can still fit 16TB in a 5.25" drive bay with the same dock and 8 2TB rotating drives[3], or 20TB with this dock[4] and the 5TB version of the same drive[5]. Note that all the 4 and 5TB rotating drives I can find are 15mm thick, so the 8x drive bay is a no-go.
Same here. I suspect we won't see spinning rust in that form factor because tooling costs would just eat it, but flash can scale with volume almost perfectly since it's just chips on a PCB. It'd need some kind of high-bandwidth interface though; I'm not sure what speed SAS is at, or whether it's competitive with PCIe NVMe stuff. I haven't had to look at that in a while.
Random reads don't care how long it takes to read an entire disk.
Very few things do, really. It's pretty much just rebuilding a RAID or reorganizing your data storage hardware that cares about full-disk transfer speed, and in those cases two days isn't a big deal.
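For a sense of scale, a sketch assuming something like 200 MB/s of sustained sequential throughput on a 10TB drive (real rebuilds that compete with live I/O run slower):

    # Full-disk pass on a 10TB drive, assuming ~200 MB/s sustained sequential
    # throughput; rebuilds competing with live I/O take considerably longer.
    hours = 10e12 / 200e6 / 3600
    print(f"{hours:.0f} hours minimum")    # ~14 hours best case, days in practice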
There are great ways to mitigate the problem, but disks still get fragmented. So, at best we can say it's probably not an issue the majority of the time.
> I always wished a 5.25 mass storage HDD would come back on the market. How much would those hold relative to this?
Google presented a paper at FAST16 about the possibility of fundamentally redesigning hard drives to specifically target hard drives that are exclusively operated as part of a very large collection of disks (where individual errors are not such a big deal as in other applications) – in order to even further reduce $/GB and increase IOPS: https://static.googleusercontent.com/media/research.google.c... .
Possible changes mentioned in the paper do actually include new (non-backwards-compatible) physical form factor[s] in order to freely change the dimensions of the heads and platters. The only market for spinning rust in a decade or so will be data centres (or anywhere else that needs to store a shitton of data) -- everything else will be flash.
Other changes mentioned in the paper include:
* adding another actuator arm / voice coil with its own set of heads
* accepting higher error rates and “flexible” (this is a euphemism for “degrades over time”) capacity in exchange for higher areal density, lower cost, and better latencies
* exposing more lower-level details of the spinning rust to the host, such as host-managed retries and exposing APIs that let the host control when the drive schedules its internal management tasks
* better profiling data (time spent seeking, time spent waiting for the disk to spin, time spent reading and processing data) for reads/writes
* Caching improvements, such as ability to mark data as not to be cached (for streaming reads) or using PCIe to use the host’s memory for more cache
* Read-ahead or read-behind once the head is settled costs nothing (there’s no seek involved!). If the host could annotate its read commands with its optional desires for nearby blocks, the hard drive could do some free read-ahead (if it was possible without delaying other queued commands).
* better management of queuing – there’s a lot more detail on page 15 of that PDF about queuing/prioritisation/reordering, including the need for the drive’s command scheduler to be hard real-time and be aware of the current positioning of the heads and of the media. Fun stuff! I sorta wish I could be involved in making this sort of thing happen.
tl;dr there is a lot of room for improvement if you’re willing to throw tradition to the wind and focus on the single application (very large scale bulk storage) where spinning rust won’t get killed off by flash in a decade.
Helium does go through metal, although even in this case the amount is probably insignificant.
Helium also goes through whatever the sealing gasket is made from, at a much higher rate. http://lpc1.clpccd.cc.ca.us/lpc/tswain/permeation.pdf is a neat chart; the gasket is probably Buna-N because it's cheap. I think that chart and https://en.wikipedia.org/wiki/Permeation are enough to get an order-of-magnitude answer to OP's question, but I'm not quite up to doing the math.
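For anyone who wants to finish it, the skeleton of the estimate is just steady-state permeation, Q = K*A*dp/d; every number below is a placeholder, so plug in the Buna-N helium value from the chart above before trusting the output:

    # Skeleton of the order-of-magnitude estimate; all values are placeholders.
    K  = 1e-8   # PLACEHOLDER permeability, cm^3(STP)*cm / (cm^2 * s * atm); read it off the chart
    A  = 10.0   # assumed exposed gasket area, cm^2
    dp = 1.0    # assumed helium partial-pressure difference across the gasket, atm
    d  = 0.2    # assumed gasket thickness, cm

    q = K * A * dp / d                       # steady-state permeation rate, cm^3(STP)/s
    print(q * 3.15e7, "cc(STP) per year")    # meaningful only once real numbers go in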
Serious vacuum systems have to worry about this sort of thing more than one might expect. I once had a project where two test engineers and I spent around a week hunting for leaks in a vacuum system that turned out to be caused by my specifying silicone O-rings instead of Viton.
No seal is ever perfect, that's the concern. Plus helium, being monatomic, is able to jam into tinier holes than things like oxygen, nitrogen, or water vapour.
I've asked people who used the first gen He drives because it's an obvious worry, and the consensus seems to be that they're at least as reliable as conventional drives. MTBF doesn't seem unusually bad, and can even be better than some trad designs.
This is different than the typical failure scenario though, isn't it?
You might have a few drives with bad seals that fail early. But say the majority are sealed properly and the helium still leaks out at some small rate over time; you might then have almost all the drives fail at around the same time after, say, 6 years or whatever.
I don't think you can really know if they're truly reliable in that sense until you can take one that's been sitting in a drawer for 10 years and plug it in.
A company like Backblaze is bound to find out before the consumers do but I have a feeling it'll be fine. Keeping gas inside of something isn't much harder than keeping it sealed enough that dust stays out.
A high-altitude environment might put more pressure on the seals than other places, which could shorten the lifespan, but other than that it shouldn't be a huge deal. Most drives have a commercial lifespan of no more than 4-6 years anyway. After that you're on borrowed time.
I've wondered exactly this about the He-filled drives. Should I just assume that after 4 years the drive is not going to work? 8 years? I am aware that in 4 years modern drives of the same cost will be at least 4x the capacity, but it's still my data on the drive that I care about.
This is misleading. Helium has a higher conductivity and heat capacity than nitrogen per unit mass, but since helium has such low density, the heat capacity and conductivity per unit volume at atmospheric pressure is less than nitrogen.
While it's true that helium will have a lower practical heat capacity due to lower density, it still has far higher thermal conductivity than air at the same pressure.
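Approximate room-temperature, 1 atm values from standard tables (treat as ballpark), which is the comparison both of these comments are making:

    # Ballpark room-temperature, 1 atm properties (approximate values from
    # standard tables): thermal conductivity k, specific heat cp, density rho.
    he  = dict(k=0.15,  cp=5193.0, rho=0.164)   # W/(m.K), J/(kg.K), kg/m^3
    air = dict(k=0.026, cp=1005.0, rho=1.184)

    print("conductivity ratio He/air:", he["k"] / air["k"])           # ~6x: He conducts much better
    print("volumetric heat capacity He :", he["cp"] * he["rho"])      # ~850 J/(m^3.K)
    print("volumetric heat capacity air:", air["cp"] * air["rho"])    # ~1190 J/(m^3.K): air wins per volume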
Why would the platters be generating heat? By definition there's no physical contact with the heads, ergo minimal friction. I thought the primary reason for using a monatomic gas was that it would allow the heads to fly closer to the metal.
Roughly 3/4 of the power a drive uses goes into the spindle motor. Only a relatively small part of that is dissipated in the bearings (a quarter or so); all the rest is gas-platter friction, turbulence due to the heads, etc.
Fun-fact aside: The head fly height in helium drives is ~1 nm.
Air shear can generate a significant amount of heat. Reducing drag will also reduce the power requirements for the motor and voice coil. HGST claim a 23% reduction in operating power over air-filled drives.
As I understand it, the crucial factor motivating helium-filled drives is the reduction of vibration due to internal turbulence. Reducing turbulence inside the drives allows platters and heads to be made thinner and packed more closely together without affecting reliability.
Related trivia: Technical SCUBA divers use a helium-oxygen breathing mixture for deep dives. In cold waters, most will carry a small cylinder of air or argon for inflating their drysuit, because a heliox-filled drysuit has significantly less insulative value.
One time the gas supplier accidentally gave us helium in the argon supply cylinder and so my dive buddy ended up with a drysuit inflation tank full of helium. He couldn't figure out why he was freezing his ass off during the dive until he returned to the dive shop and checked that tank with a helium analyzer.
I know a few folks who use these, their applications usually involve a lot of them, high in the atmosphere (in flight or remote observatories on mountaintops) where downlink is prohibitive. Any other realistic use cases? Undersea maybe? But space isn't at quite such a premium there.
These are fantastic for long-term storage of something that may need to be retrieved at a moment's notice but is rarely accessed.
One example from a previous job: these in a Ceph cluster as an origin for a CDN delivering video. We transcode the video to various bit rates and chunk them, then clients request the chunks through the CDN. There is not a lot of writing, and some reads when an asset is new, but once it is cached it basically just sits idle.
Larger capacity meant we could store more data in a single rack; our limitation for the origin wasn't CPU or bandwidth throughput, it was literally storage that forced us to keep expanding.
The deployment I did used 8TB drives. We used HGST with their helium drives because of the lower power draw, allowing us to install more servers in a rack without going over our maximum power draw and heat density for cooling.
Delimiter has $10/mo "bring your own drive" storage plans. Just buy this drive and ship it to them, share it with a VPS or cheap server there, and you can store a lot of media for a cloud Plex server.
When I looked [1], it reads as if they have an 8 TB cap on this service. That is, you can ship them any drive, up to 8 TB, to plug in and run at $10 per month. It sounds like their pricing is predicated on some kind of assumed Watt-hours per TB.
It would be interesting if someone offered a similar plan, but priced on energy consumed instead, giving incentive for managed storage arrays that power down drives that idle (for some algorithmically-derived definition of "idle", trading off against projected lifespan of the device due to increased power cycling), and other energy-saving practices. Slices up the granularity of cloud hosting even finer.
If I built my NAS today instead of three years ago, I'd be using them.
You really can't have too much local storage. Price is the only prohibitive factor, though rebuild times are starting to be too.
Another factor to consider is that uplink, if you wanted to store stuff in the cloud, is almost always terribly slow and often capped, plus the cost of storing multiple TBs on any given provider.
Rebuild times pretty much forced me out of RAID 5 and into RAID 10 after I encountered a second drive failure during a rebuild. Luckily it was my RAID controller pre-flagging questionable sectors that weren't actually reallocated, so I was able to recover, but now I always keep 2 independent backups. I use 10Gb SFP+ links between computers to move files, and even that is painfully slow when transferring terabytes. You're right, the cloud is in many ways less practical until the bandwidth is available to access it all.
Yeah, that is indeed scary, and something to be taken into account for disaster recovery planning. Off site backups of systems need to be designed so that the whole RAID array can be physically destroyed (fire, flood, theft, axe, etc) and recovery will not take a ridiculous amount of time.
The wise sysadmin mantra of "RAID is not a backup" must be kept in mind at all times.
That's basically why I've gone with raidz3, yeah. RAID 10 still allows a second drive failure to lose data; it's just less likely to be that specific paired drive.
> The only disk more heavily loaded than usual during a mirror vdev resilvering is the other disk in the vdev – which might sound bad, but remember that it’s only being heavily loaded with reads, whereas all of the remaining disks in a RAIDZ vdev are being much more heavily loaded with writes as well as reads during a resilver.
Why is that happening? There's no reason to perform a bunch of writes to those drives.
Yep - it's been a long time since raw storage costs were so high that I couldn't justify and afford RAID 0 [Edit: RAID 1, not 0, as DuskStar corrected me] for everything that matters. Whenever I buy drives these days, I buy three of them at a time (if possible, the same size drives from at least two different vendors, or different batches from the same vendor).
RAID 10 still leaves you vulnerable to a second drive failure. I'd personally jump from a multiple-parity scheme to at least RAID 5+1 without ever stopping at 1+0.
(Though probably something with checksums instead of naked redundancy.)
Using FreeNAS I have 4x4TB drives: two mirrored pairs striped together. Only 7.25TB of total space, but I am not too worried about a failure. Disks are cheap; losing 16 years of family photos, not so much.
Today I'd probably go with the slightly more proven and slightly less expensive 8TB drives; jumping onto the latest tech for storage isn't something I feel is in my best interest. Six months from now is probably the soonest I'd consider using these (assuming there are no horror stories in places like Backblaze's blog in the interim...)
Particularly anywhere that $$$/rack per mo and $ per kWh matters. Such as a medium sized CDN or backup service (Backblaze, etc) that wants to cram as many 3.5" drives into a 4U chassis as possible. If one is colocating multiple racks of 4U sized servers as close as possible to a major IX point, the rack space and power costs go up considerably.
In the sense that they're devices with higher performance for sequential than random IO but relatively slow to rewrite the entire device relative to capacity, or specifically the use cases that parent is describing?
Back in the stone age, disks were "fast" at random access but they couldn't hold as much as tape could. So to get the storage you had a mix of disk and tape. It started with a command you would send to the operator (another historical concept), the person who was responsible for "operating" the computer. That would print a message on a hard-copy terminal saying "Load Tape XYZ on Drive 2" or something similar; the operator would go to the shelf of tapes, pull out the one labeled XYZ, put it on drive 2, and "mount" it (which would, on vacuum readers, suck in and tension the tape and read the first block, the label), making it available to the operator, who could verify it was XYZ and then send back (again on the console) "tape loaded." Then your batch program would start, and you'd get the classic video of a computer with the tape being read in bits and bursts, and often another tape being written in bits and bursts.
Computers got bigger (able to process more data) and tape libraries got bigger. One of Sun's big customers was Fingerhut (a mail-order catalog) in Minnesota, and they had a room with lots of "tape operators." When a customer was on the phone and the phone operator said "let me bring up your account," it lit a sign in the tape room with the needed tape; someone would jump up, grab it, put it on the nearest tape drive and tag it (there was a clock showing time from request to mount), and the tape would identify itself and send the customer record to a disk so that the account was "online" and the operator's screen would light up with all the customer details.
IBM and StorageTek made robotic libraries that did the same thing, but without the human in the loop.
Then in the early 2000s NetApp and other storage vendors started offering ATA disk drives (dense, cheap), and slowly but surely the tape libraries were crushed because this dense storage was more cost-effective.
Primary disk storage has now gone over to solid state disks. They have even better random access and generally do reads at the limits of the interface they are attached to.
But there is still a market for dense, cheap, read-mostly data stores: that which used to be tapes, then tape libraries, and is now spinning rust in a bath of helium. A typical SATA drive can really only do about 110 IOPS (rough arithmetic in the sketch below), in large part because the mechanics of moving things around are inviolate.
So the use for dense data stores behind a small I/O pipe is read-mostly archival and reference data: oil & gas sonar dumps, credit card sales transactions, backups of data that is "live" elsewhere, etc. The role that tape used to play but no longer does very well.
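The rough arithmetic mentioned above: random IOPS on spinning rust is basically one over (average seek plus average rotational latency), with the seek time below being my assumption for a decent 7200 RPM drive:

    # Rough arithmetic behind "about 110 IOPS" for random I/O on a SATA drive;
    # the average seek time is an assumption for a decent 7200 RPM drive.
    rpm = 7200
    rotational_latency_ms = 0.5 * 60000 / rpm     # ~4.2 ms: half a revolution on average
    avg_seek_ms = 4.5                             # assumed average seek time
    print(round(1000 / (rotational_latency_ms + avg_seek_ms)), "IOPS")   # ~115; the interface barely matters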
Hum. I am thinking about ordering a new FreeNAS Mini (the 8 bay one). I was going to get the 6TB drives....
So right now I have 3 copies of everything.
1. Masters are on 4 bay FreeNAS mini (ZFS baby!)
2. Rsync to Thunderbolt drive on desktop (well laptop but...TB monitor with drive attached).
3. Everything goes up to Backblaze auto-magically.
Well, it is for my office / man cave. I have a 19" telco rack in the corner that has a switch, the Mini, and a box running Proxmox for VMs (Minecraft server, work testing VMs, etc.). My startup buys from iXsystems for all of our stuff, so I hope to get a good discount. My current 4-bay I have had for 4 years and it just works. Also, the support from iXsystems has been great. I have spent so much of my life building data centers and networks, et al., that just paying for something that works has a great deal of appeal. In 1998-2001 the rack had a ton of systems, switches and routers; now I guess I would just trade $ to spend time with the family. If I am going to build anything it will be a new Ryzen gaming setup.
Tangential question: What's the absolute cheapest storage per byte right now for someone who doesn't care about speed? Is it like a 4TB HDD? Any specific examples would be appreciated.
Thanks! Side question, but how is Amazon Glacier considered affordable? For 4TB it costs $192/year... yet you could buy two 4TB HDDs at that price yourself and use one for redundancy and the other for immediate retrieval. What's the benefit of putting your data on Glacier when it costs twice as much as keeping it yourself, and probably takes longer to retrieve too?
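That $192/year looks like just the storage line item, presumably at Glacier's $0.004 per GB-month, before any retrieval or request fees:

    # The storage line item only; retrieval, requests, and early-deletion fees
    # are extra. $0.004/GB-month is presumably the rate used above.
    print(4000 * 0.004 * 12)               # $192 per year for 4TB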
Like the rest of Amazon (excepting the Cloud Storage, which is all-you-can-eat for personal use only), it's not affordable. Unless you'd otherwise need 400 4TB drives, and have to keep track of which ones have what on them, which are up to date, which are in which fire/disaster zones, etc. At that point the price starts to look more reasonable.
But can't you put your own hard disks offsite somewhere? Just because you're holding your own backups doesn't mean you have to keep them at the same location.
What are the downsides of Helium when it comes to data recovery? If it comes down to the click-of-death, surely a traditional clean-room set-up won't work.
There isn't. There is a robust market in helium, and its price represents the cost of bothering to recover the large quantities that are produced (and generally vented) as a by-product of natural gas extraction. We could have much more helium if we wanted to pay for it, but it's not worth it.
> In 2014, the US Department of Interior estimated that there are 1,169 billion cubic feet of helium reserves left on Earth. That’s enough for about 117 more years.
Proven oil reserves have been at more or less 30 years for the last 60 years. So take the 117 with a grain of salt, especially since a significant amount of that helium is recovered as a by-product of fossil-fuel extraction.
Maybe in 100 years we will have fusion power plants everywhere producing Helium. :)
We will need something because fossil fuels are not sustainable. In 100 years we should be pumping much less oil than we do today. This also means finding substitutes for all of the byproducts of oil production.
Fusion plants in the 1GW range will have at most a couple of grams of fuel in them at any given time (though it looks like ITER expects to use 250 kg per year collectively; still, not a lot).
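For scale, a back-of-the-envelope of my own (not an ITER figure): D-T fusion releases about 17.6 MeV per helium atom produced, so even a 1 GW(thermal) plant running flat out makes a trivial amount of helium:

    # Back-of-the-envelope helium output of a 1 GW(thermal) D-T fusion plant:
    # one He-4 atom per reaction, ~17.6 MeV released per reaction.
    joules_per_MeV = 1.602e-13
    energy_per_reaction = 17.6 * joules_per_MeV          # ~2.8e-12 J
    reactions_per_second = 1e9 / energy_per_reaction     # ~3.5e20 at 1 GW thermal
    he_atom_kg = 4.0026 * 1.66054e-27                    # mass of one He-4 atom
    kg_per_year = reactions_per_second * he_atom_kg * 3.15e7
    print(round(kg_per_year), "kg of helium per year")   # ~75 kg: a rounding error next to world demand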
Additionally, a huge field of Helium was discovered this year in Tanzania [0], drastically increasing the amount of Helium we have available to extract.
I had a few of the 8TB ones and they were truly horrible. They ran super slow, and then to top it off they started making noises under very light usage within months. I'm just not sure how reliable these are...
Are you sure that you had Seagate EC drives, and not Seagate Archive drives? The latter are DM-SMR and therefore poorly suited to pretty much anything doing random write I/O (e.g. rsyncing directory trees, rdiff-backup and so on). However, with the correct workload (object store, log-oriented storage) they perform admirably, especially for their price.
With DM-SMR, strange noises and paranormal drive activity are to be expected, since the drive can spend considerable time flushing its PMR buffer (~16 GiB) to the shingled zones.
If I'm reading this right, the rated workload is 45 writes of the whole disk per year[1]. Is that right? That does not seem to indicate a high degree of confidence on their part...
[1] I divided the rated workload listed on the page by the size of the disk.
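Spelled out, with the yearly rating inferred from your division rather than quoted from the product page, plus what it implies as a write duty cycle (200 MB/s sustained is an assumption):

    # The parent's footnote [1], spelled out; 450 TB/year is inferred from the
    # "45 writes" result, not quoted from the spec page, and 200 MB/s sustained
    # write speed is an assumption.
    rated_tb_per_year = 450
    capacity_tb = 10
    print(rated_tb_per_year / capacity_tb, "full-drive writes per year")     # 45

    avg_mb_per_s = rated_tb_per_year * 1e6 / (365 * 24 * 3600)               # ~14 MB/s averaged over a year
    print(f"{avg_mb_per_s / 200:.0%} write duty cycle at 200 MB/s sustained") # ~7%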
I don't know about the Seagate, but the HGST He drives are super reliable. I've been using them since they came out (3 years) in growing numbers (several hundred) and I'm still waiting for the first one to fail.