"If I want to recover a corrupted file, I find another copy."
So you've archived two or more copies of each file? That means you're using at least twice as much space (and if you're keeping the original as well, more than twice).
For the likely corruption of the occasional single bit flip here and there, you could do a lot better by using something like par2 and/or dvdisaster (depending on what media you're archiving to).
> So you've archived two or more copies of each file
You haven't?
It took me just one minor "data loss incident" ~20 years ago to very quickly convince me to become a lifetime member of the "backup all the things to a few different locations" club.
> That means you're using at least twice as much space (and if you're keeping the original as well, more than twice).
Storage is cheap indeed, though it takes some effort to make it cheap.
99% of the digital data I'm keeping for the long term is family photos and videos. All my photos go to Dropbox (easy copy-from-device and access anywhere) and are then backed up to multiple locations by CrashPlan.
It'll be a while yet, but in the next few years I'll be hitting the 1TB Dropbox limit. I'm hoping that Dropbox make a >1TB 'consumer' plan in the next couple of years. There's no way I'm assuming my backups are fine, deleting from Dropbox to make space, then finding out in a few years that some set of photos is missing.
I also sync up to Google Drive - but again, there's a 1TB limit (or a large cost).
In the future, I might have to create a new Dropbox account and keep the old one running. Storage might be cheap, but keeping it cheap is tricky.
Same here - I'm currently still migrating from CrashPlan B2C to using Arq backing up to Backblaze B2. (Being able to access B2 from Panic's Transmit Mac app made B2 really attractive to me as well, and it looks like I'll save a lot of money compared with CrashPlan.)
That's $4 per TB-month. Meaning you're effectively paying more than the cost of a 1TB hard drive replaced every year, for every TB you're storing. Plus fees to get your data back out. An 8TB drive, replaced every year, is half the cost per TB, with no additional access cost.
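To make that arithmetic explicit, here is a rough sketch in Python. It uses only the figures from the comment above; the drive price is just what those figures imply, not a quoted market price, and the function names are made up for illustration. Retrieval fees are left out.

```python
# Figures from the comment above. The drive price below is only what those
# figures imply; it is an assumption, not a real quote.
CLOUD_USD_PER_TB_MONTH = 4.0


def cloud_cost_per_tb_year(usd_per_tb_month: float) -> float:
    """Yearly cloud storage cost for one TB, ignoring retrieval fees."""
    return usd_per_tb_month * 12


def implied_drive_price(capacity_tb: float, fraction_of_cloud: float,
                        cloud_per_tb_year: float) -> float:
    """Price of a drive that, replaced yearly, costs the given fraction
    of the cloud price per TB."""
    return capacity_tb * cloud_per_tb_year * fraction_of_cloud


cloud_year = cloud_cost_per_tb_year(CLOUD_USD_PER_TB_MONTH)
drive_8tb = implied_drive_price(8, 0.5, cloud_year)

print(f"Cloud: ${cloud_year:.2f} per TB-year")
print(f"Implied 8TB drive price at half that rate: ${drive_8tb:.2f}")
```

So the claim amounts to $48 per TB-year in the cloud versus roughly $24 per TB-year for a drive you swap out annually, before counting the cost of keeping a copy off site.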
Depending on how price conscious you are, I agree with the GP's "keeping it cheap is tricky". And with things like backup, even if you do it yourself, the time spent maintaining it should be negligible: Occasionally kick off a format shift or failed drive replacement, have scripts running everything else.
> Meaning you're effectively paying more than the cost of a 1TB hard drive replaced every year, for every TB you're storing.
Yes. But what you get in return is not having that data at home. It doesn't matter how many copies you have locally if your home gets robbed, flooded, or burns down.
Glacier is good for dumping data into, but it's absolutely terrible for getting your data out, and full retrievals are also very expensive. Don't rely on it for anything other than emergency backups of your backups.
No I don't, because it's a waste of space and money. Using par2 and/or dvdisaster I can archive a lot more files on to the same archival media and still get enough redundancy to feel secure.
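For anyone wondering how a small amount of recovery data can stand in for a full second copy, here is a minimal sketch of the idea in Python. It is not how par2 works internally (par2 uses Reed-Solomon codes and per-block checksums, and can repair several damaged blocks); this uses plain XOR parity, which can rebuild only one block, and only if you already know which block is bad. All names here are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """One parity block protecting all of the data blocks."""
    return xor_blocks(blocks)


def recover_block(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(data)

# Simulate losing block 2 (e.g. a bad sector), then rebuild it.
damaged = list(data)
damaged[2] = b"\x00\x00\x00\x00"
restored = recover_block(damaged, parity, 2)

print(restored == data[2])
```

Here one parity block covers four data blocks, so the overhead is 25% instead of the 100% a full second copy would cost.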
"Storage is cheap."
Cheap is relative. Are you really buying an extra 2 TB of storage to archive 1 TB of data? Because that's what you'd need to do to archive 2 copies of each file. That's a huge waste of space and money that adds up when you're archiving a lot of data.
If your needs are small or your pockets deep, you can afford to do what you're proposing, but for the rest of us who aren't made out of money it's just not practical.
On the other hand, when I have worked at places which could afford to have multiple archives at various locations, I've made sure each of those archives was protected with par2 or dvdisaster, so I could recover from both rather than have one of the archives fail because of a bit flip error.
> Are you really buying an extra 2 TB of storage to archive 1 TB of data ...
$ sudo zpool get size zdata
NAME PROPERTY VALUE SOURCE
zdata size 21.8T -
Yep.
It's fine that you "feel secure" with your current backup regimen -- and I certainly hope you never lose any important data.
After losing data once, though, I promised myself I'd do my best to make sure that it never happened again. The "primary copy" of all my data lives on the individual machines (my workstation, primarily, but there's a bit on my main laptop too) but there's also a copy of it all on a server out in the garage as well as yet another server (see above) that I have in an ISP's facility nearby. There's yet another copy of a small fraction of my files (the "really, really, really important stuff") that's sitting in AWS (via tarsnap) as well.
Some folks are satisfied with a copy of their family photos copied onto a flash drive and tossed into a drawer or an external USB drive permanently sitting on the desk next to their computer. I know of several small companies in my area that thought they were safe with an external USB drive connected to their server... until they got hit with ransomware.
My laptop has a pair of mirrored SSDs, my workstation has a pair of mirrored SSDs and a pair of mirrored "spinners". The server in the garage (my "first backup") has RAID10. That box at the ISP has mirrored SSDs plus a "raidz2" that the backups live on. Some of us just want a little bit more reassurance than others. :-)
> Some of us just want a little bit more reassurance than others
Which gets back to the original point. If you use a format that can be more easily recovered, then with the same number of copies, your data is more secure.
You've probably been downvoted because it's perceived as showing off, but it is a nice setup.
I've also spent more time than I'm willing to admit planning, researching, configuring and maintaining different backup strategies, and just wanted to say that I regret some of that. It's easy to become a data hoarder, and it's easy to spend more time on preserving data than it is actually worth. I mean, think about how much of this data is worth anything to people other than you, i.e. what happens to it when you die. Life's short and there are so many things that are more exciting than backups.
Don't get me wrong though. Backups are important. Just know exactly how important they are to you.
I use PAR2, even with multiple copies at different sites, because I look at my photos so rarely that I wouldn't notice a master file had become corrupt before it had mirrored to the other places and the original versions expired (1 year).
A 5% parity archive is an easy sell on top of 200% for off-site copies.
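As a rough illustration of why that is an easy sell, here is the overhead worked out in Python. I'm assuming "200% for off-site copies" means two extra full copies on top of the original, and that every copy carries its own 5% parity set; the function name is made up.

```python
def total_storage_tb(data_tb: float, copies: int,
                     parity_fraction: float = 0.0) -> float:
    """Total space needed when every copy carries its own parity data."""
    return data_tb * copies * (1 + parity_fraction)


# Assumption: 1 TB of photos, the original plus two off-site copies.
without_parity = total_storage_tb(1.0, copies=3)
with_parity = total_storage_tb(1.0, copies=3, parity_fraction=0.05)

print(f"3 copies, no parity: {without_parity:.2f} TB")
print(f"3 copies, 5% parity: {with_parity:.2f} TB")
print(f"Extra space for parity: {with_parity - without_parity:.2f} TB")
```

Under those assumptions the parity data adds about 0.15 TB to a 3 TB total, which is small next to what the extra copies already cost.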
What if he reads his own comment? That would be covered by the admonition to read literally anything.
But my troll aside, I agree. If losing the data would cause you harm or make you sad (losing photos of your kids for example), you definitely need to have multiple backups in multiple locations, ideally controlled by different parties (so one bug on your cloud provider's side doesn't wipe out both of the copies they store for you). I've been burned by this with personal data a few times. The stakes get even higher when you are responsible for someone else's data. If they don't want to pay for the extra storage, make sure they understand the risk involved.
If you're using par2, I'd say that's closer to recovering a second copy than trying to extract meaningful data from a corrupted file. (The internal structure of the format is irrelevant, thus concerns about it are outdated.)
I would generally suggest you're more likely to corrupt or lose your whole backup than to hit a single bit flip that the filesystem or underlying storage doesn't catch.