I'm rethinking my backup workflow and I'm curious about other people's setups - both hardware and software. What does your backup setup look like, guys and girls?
1. First, I run Borg locally on my desktop against a repository I keep on the same drive (/home/backup); then
2. I update, with rsync, a copy of this repository I keep on a dedicated server with full disk encryption; and, finally,
3. I spin up an EC2 instance, mount an S3 bucket with s3ql, and rsync the copy from the previous step up to the one I keep on this bucket.
This process is (embarrassingly) manual.
The backup repository is encrypted with Borg¹ itself, and if I need to recover something I do it from the local copy. I never mount the repository in remote locations.
¹ https://github.com/borgbackup/borg
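A rough shell sketch of the three steps, in case it helps anyone; the paths, host names, and S3 bucket are hypothetical, and the borg/s3ql invocations are the stock ones:

    # 1. local backup into the repository on the same drive
    borg create --stats /home/backup::'{hostname}-{now}' /home /etc

    # 2. mirror the repository to the dedicated server (full-disk encrypted)
    rsync -a --delete /home/backup/ backup@server.example.com:/srv/borg-mirror/

    # 3. on the EC2 instance: mount the bucket with s3ql, pull the server copy up to it
    mount.s3ql s3://my-backup-bucket/borg /mnt/s3ql
    rsync -a --delete backup@server.example.com:/srv/borg-mirror/ /mnt/s3ql/borg-mirror/
    umount.s3ql /mnt/s3ql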
I'm also using Borg, backing up both to a server I control and to rsync.net.
Essential and small files like keys (which may be necessary to "bootstrap" the backup in case of complete failure of my workstation) are manually copied to offline storage and verified.
I also want to have a Borg setup[1] on my personal VPS and back up the "most essential" data (which will be some docs and some photos - not all) to it, along with everything being backed up by CrashPlan. Will explore it someday.
[1] Are there comparable tools which are open source and easier to use (preferably with a GUI or so)?
I basically do the same, except I sync to Google Drive using rclone.
I also sync to a USB drive about once a week.
It's basically a backup of my home dir. The rest of my data (pictures, etc.) already lives in Google Drive.
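A minimal sketch of that kind of job, with a hypothetical rclone remote name and paths:

    # nightly: push the local backup to Google Drive (remote "gdrive" configured via `rclone config`)
    rclone sync /home/backup gdrive:backup

    # weekly, when the USB drive is mounted:
    rsync -a --delete /home/backup/ /media/usb-backup/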
To each of you having those extensive backup solutions (like NAS + cloud sync, second NAS, etc.)...
.. do you actually TEST those backups?
This question comes from my experience as a systems engineer who found a critical bug in our MySQL backup solution that prevented the backups from being restored (inconsistent filesystem).
Also, a friend of mine learned the hard way that his Backblaze backup was unrestorable.
Very true. I overheard a similar conversation last week at work: "We have set up the backup procedure for our new production databases." - "Have you tested restore?" - "Well, uhm..." - sound of JIRA ticket being opened
By the way, I misread your username and, for a second, thought you were sytse.
Excellent point, and that raises another question: how do you actually test your backups? Of course, each case is specific, but is there a "best practice" checklist, or some general points to check for basic restoration?
Excellent question. I test my backups and restores on a regular basis. Each environment within my infrastructure takes a slightly different approach.
Application
This is by far the easiest for me to test. We have a CI/CD job which literally makes a new environment, from scratch, and deploys our application to it in a production configuration. It runs a test suite which tests functionality across the application. Finally, it destroys the environment. It reports on each portion of the process. In this way we know exactly how long it would take to redeploy the entire application from scratch on new infrastructure and get it up and running. This morning it took about 6 minutes total before tests ran.
Database
We are running an RDBMS. We use a combination of daily full backups, incremental transaction-log-style backups, and point-in-time backups. Again, in our CI/CD, when a full backup is taken it is pulled, loaded, and a test routine is run against it to check integrity. At this time, the restore from the day before is destroyed. When a transaction log backup is made, CI/CD picks up this change, applies it to the full backup restore, and runs a set of integrity tests. This leaves us with a warm standby ready to be switched over to in case the main database server goes down. We have never had to use the warm standby in an emergency, but we have a test to make sure we can cut over to it as well.
For point in time backup testing this goes back to our application test above. The application test will spin up with a point in time recovery of the database backup. It will test the integrity of that recovery and then test the application against it. Finally, it will swap from the point in time recovered database to the warm backup. It runs the test suite against that for integrity as well.
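The RDBMS isn't named here, so a minimal sketch of the "pull it, load it, test it" step, assuming PostgreSQL; the backup path, database name, and the orders table are hypothetical:

    #!/bin/sh
    set -eu
    BACKUP=/backups/full/latest.dump   # hypothetical path to the latest full backup
    DB=restore_test

    # recreate a scratch database from the full backup
    dropdb --if-exists "$DB"
    createdb "$DB"
    pg_restore --no-owner --dbname="$DB" "$BACKUP"

    # basic integrity assertion; real checks would be application-specific
    ROWS=$(psql -At -d "$DB" -c "SELECT count(*) FROM orders;")
    [ "$ROWS" -gt 0 ] || { echo "restore check failed: orders is empty" >&2; exit 1; }
    echo "restore check passed: $ROWS rows in orders"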
File Store
People often forget this, but those buckets that hold all of your file storage in the cloud can be destroyed so easily (sad, sad experience taught me this). We test those as well. I am sure you can guess at this point how we do that: CI/CD. It's a rather simple process with a ton of gain.
A few notes
People always ask me this, so I will answer it first. Yes, this costs money. It's not as bad as running a second production environment, but it will cost you a bit. My follow-up question is: how much does downtime cost you?
My CI/CD is always Gitlab CI at this point. I've used Jenkins. I've used Travis. I like Gitlab CI. You can do all of this with any of those.
We script literally everything. Computers are so good at repetitive tasks. Why would you EVER do anything manually? Really. If it has to do with your infrastructure, script it.
If anyone has any questions about these ideas, feel free to reach out.
We currently have 4 full time devs, a QA, a DBA consultant, and a Designer on the team.
Honestly, none of that took very long to set up at all. The application in this case is a Ruby on Rails backend, PostgreSQL database, Angular front end, with file storage and a few other smaller services.
Step one: We have a lot of tests and we believe in a good test suite. Are we perfect? Absolutely not. But it is important to be able to "know" the application works. Define what helps us to know it works, and automate tests to do that. Things like "Can you log in?", "Can you select a record of type X, Y, Z, A, B, and C and do those records have the data you would expect in the right places?" You can have a human do this, or you can automate it. Automate it.
Step two: Automate your deployment. The Rails application is bundled into a docker container. We use ECS (Elastic Container Service) to maintain our environments. CI/CD first runs tests; second, builds the latest docker container; third, places the container into a repo; fourth, deploys the container to the correct ECS environment; fifth, profit! This is all automated and works the same every time, with checks and balances along the way. Our Angular application runs out of S3 buckets with CloudFront caching. This was a matter of using webpack to compile the Angular application down to production-deployable artifacts and then a simple bash script to move those artifacts to the S3 bucket. The database is an RDS instance, so we get some fun things built in there. Note: all of the AWS setup is also automated with scripts. Create VPC, create autoscaling group, create targets, create RDS instances, create S3 buckets, create CloudFront, and delete all of the above (and more, AWS is complex) are all just scripts.
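A stripped-down sketch of what such a deploy script can look like; the registry, cluster, service, bucket, and distribution ID are all placeholders:

    #!/bin/sh
    set -eu
    IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest

    # build the backend container, push it, and roll the ECS service
    docker build -t "$IMAGE" .
    docker push "$IMAGE"
    aws ecs update-service --cluster prod --service app --force-new-deployment

    # compile the Angular front end and publish the artifacts
    npm run build                                   # webpack produces dist/
    aws s3 sync dist/ s3://my-frontend-bucket --delete
    aws cloudfront create-invalidation --distribution-id EXAMPLE123 --paths "/*"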
Step three: Because we have a test suite and deployment scripts the rest of the process is easy. Just use the scripts to create whatever environment we need, stick it on a schedule, record the results in CI/CD, alert the WHOLE FREAKING WORLD if something doesn't work.
Now you might say: easy to say in a Rails environment, with so few moving parts, with such a new project, etc., etc. (I have heard every excuse in the book). I have done this for many other companies. The last company I did this at had about 50 engineers, ran a large mixed bag of Java applications on Tomcat servers, ran Oracle for their data, and had no tests and a ton of legacy code. We got to the same point as I have already explained by simply breaking it into chunks. First, automate the tests that were done manually. Second, automate the deployment steps that were done manually. Third, automate the environment things that were done manually. Finally, schedule everything and monitor.
I learned to do this at HP Labs where we used the same process with a very large API fronting a C based image processing system with Petabytes of storage, thousands of servers, and a huge number of moving parts. I promise, it can work anywhere.
I've been thinking about this as well - it seems like it would fit in nicely with other CI jobs. With database backups, for example, you should be able to script the restore procedure and apply some assertions to check that it worked. A bonus is that you then already have a script for when you actually need to restore.
I had a company that I was doing some work for come to me to ask for a copy of the database. Their backups were corrupt, and they did not find out until they tried to restore one. They had 5 years of bad backups.
I'm a big fan of setting up testing and dev environments from the production backups.
For personal backup of files, I just verify the results are in place. I've checked them once or twice, but honestly, I'm more concerned about my scripts stopping running than about them running and not being correct.
I'm using CrashPlan and I have recovered multiple files over the past couple of years that I either mistakenly deleted or overwrote. I haven't tried any full-scale restore yet, though.
CrashPlan lost some data of mine in 2013 from querying a corrupted Volume Shadow Copy Services database on Windows. (At least, that was their explanation. I'm surprised that their client did not independently verify the data after it was uploaded.)
I moved off CrashPlan in 2016 because their upload speed continues to be embarrassingly slow outside the US even with deduplication and compression turned off (they have a datacentre where I'm at, but it's for Enterprise customers only).
They also highly recommend having 1GB of RAM for every 1TB backed up, which sounded a bit unreasonable to me.
I moved to Acronis True Image when they offered unlimited cloud backups with their 2016 version. They probably couldn't sustain it, because I had to pay a lot more for backups when I wanted to renew in 2017.
Now, I'm using both Arq and Synology's Hyper Backup with Amazon Cloud Drive. One of the problems I foresee is that while Amazon doesn't care how much data one stores in Cloud Drive, they have suspended users for downloading past an arbitrary limit in a certain period of time — so full restores might not be possible.
I have used Time Machine repeatedly to restore lost or damaged files. I have also replaced hard drives several times and restored from my Carbon Copy Cloner clone. It boots, and I have never missed a file in years.
I'm fortunate to only depend on a single platform, Linux in my case, so I rent a 1TB VPS[0] to which I rsync[1] every day. Then, depending on the criticality of the service I'm backing up, I create weekly/daily/monthly snapshots (rdiff-backup). I encrypt sensitive data using symmetric AES-256 (gpg).
I thought I was getting a great deal, I'm paying $11.50/mo for 200 GB (at RamNode) but 1TB for €3.49/mo is insane. Are there any US-based hosts that offer these kinds of prices?
I use vcspull (http://vcspull.git-pull.com), an app I wrote, to re-clone my projects to the same familiar directories.
Keep books on Kindle.
Have your own personal self-hosted git with gogs (https://github.com/gogits/gogs). You'll need at least the 10/hr DigitalOcean account though, or else compiling will fail due to memory limitations.
I use DigitalOcean extensively to host old, old projects on the cheap. It's like what Dreamhost used to be in 2006.
- 7TB ZFS pool running on Ubuntu Xenial
- hardware: an old laptop with 6 cobbled together external USB 3.0 drives making up the pool
- each vdev (3 total) in the pool is mirrored
- standard tools on top of that: time machine + netatalk, NFS, samba, SSH+rsync, ZFS send/recv, etc.
- external drives need battery backup (can't recommend the case where you don't have battery backup for ZFS vdevs)
-- no ECC RAM
Off-site backup:
--
- Ubuntu Xenial running in google cloud, 10GB root volume, 3.75G RAM, 1 vCPU
- secondary (backup) disk is currently only 1TB, with a zpool as filesystem. Easily expandable by adding virtual disks as vdevs (plus I trust their hardware slightly more than my own).
- using ZFS send/recv to selectively backup important datasets and keep cost as low as possible
Secondary LAN Backup and Restoration Testing:
--
- a separate 8TB disk on another ghetto piece of old x86 hardware, no redundancy
- restored from the offsite backup to get 2-for-1: backup and restoration testing
Encryption:
--
- everything local uses dm-crypt
- as for google cloud, I also use dm-crypt. If I want to conceal the keys from remote memory, I use nbd-server to expose a block device across SSH
I believe it's about $0.026/GB-month, so about $20-30/TB-month. My ultimate goal is to get some scripts going to push the least-accessed datasets into Nearline, which would save me about, say, 30% on total cost.
I consider the total price I pay for cloud backups as a form of punishment for how poorly I manage my data ;) ZFS will let you go crazy and expand forever transparently but you should actually know what data you really need to back up...
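For reference, the selective ZFS send/recv mentioned under "Off-site backup" can be as simple as this (dataset and host names are hypothetical):

    # initial full replication of one important dataset to the cloud VM
    zfs snapshot tank/photos@2017-06-01
    zfs send tank/photos@2017-06-01 | ssh gcloud-vm zfs receive backup/photos

    # later runs only ship the delta since the last common snapshot
    zfs snapshot tank/photos@2017-06-08
    zfs send -i tank/photos@2017-06-01 tank/photos@2017-06-08 | ssh gcloud-vm zfs receive backup/photos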
Fun story. I ran "rm -rf ~" by mistake just the other day. A misconfigured program had created a dir named ~ inside my home folder and I was a bit quick to type the command. No harm done, because I had set up a cron job to rsync everything daily only last weekend. I've since upgraded my backup solution to rsnapshot, and I'm still looking out for even better solutions. Phew!
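For anyone making the same move, a minimal rsnapshot setup is just a short config plus a couple of cron lines; the paths and retention below are hypothetical, and note that rsnapshot.conf fields must be separated by tabs:

    # /etc/rsnapshot.conf (excerpt, tab-separated fields)
    snapshot_root   /backup/rsnapshot/
    retain  daily   7
    retain  weekly  4
    backup  /home/  localhost/

    # crontab entries
    30 3 * * *   /usr/bin/rsnapshot daily
    0  4 * * 1   /usr/bin/rsnapshot weekly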
- 2x local connected backups, two identical copies usually, one on an internal HD separate partition, one on a home NAS. Usually once a month or so, more often if I am doing something specific
- 3x rotating external backups in a bank safety deposit box, every 3-4 months or so will rotate one of the backup sets there
all disks are encrypted of course.
I am surprised a lot of people pay per month to back up online when a safety deposit box is usually way cheaper, and you can't beat the transfer speed. A standard bank safety deposit box seems to fit three 3.5" HDDs perfectly, or six to seven 2.5" HDDs, and that's a lot of TB for not a lot of money.
I always rsync --checksum a second time after backing up, and I'm starting to think about writing a Python script or something to calculate checksums and save them on the disks so I can verify them at any time. That said, with the implicit redundancy above of having 2x nearline + 3x offsite, it should be fine, I would think.
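The checksum idea doesn't even need Python; a sketch with standard coreutils (the mount point is hypothetical):

    # write a manifest onto the backup disk itself...
    cd /media/backup1
    find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > SHA256SUMS

    # ...and verify it whenever the disk comes back out of the safety deposit box
    sha256sum -c --quiet SHA256SUMS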
    [I] app-backup/flexbackup
        Available versions:  1.2.1-r12 ~1.2.1-r13
        Installed versions:  1.2.1-r12(05:37:21 PM 03/03/2014)
        Homepage:            http://flexbackup.sourceforge.net/
        Description:         Flexible backup script using perl
Pretty old, but some software is like that, able to be finished.
I run a couple of cron jobs on it, doing full backups every Sunday and differentials against the full backup every other day of the week. The backup target is a RAID1-backed disk on a NAS.
Flexbackup produces tarballs, essentially, with indexes and the usual add/remove tracking. Compression can naturally be applied. It all relies on the common Unix tools. Just yesterday I updated my 4-year-old configuration to try out new partitions and incremental backups; a minimal example config for flexbackup:
I have two USB-connected hard-drives which are switched every week and one is moved to another location.
The drives are encrypted with LUKS/dm-crypt. The encryption key is a file with random data stored in the /root dir, so the disk encryption is not safe from local attacks. The key is also stored off-site (not in the same location as the off-site disk, of course).
A cron-script runs rsnapshot daily, which collects data from the local host and from remote hosts.
Remote host backup goes via ssh, using a passwordless ssh key, with a forced command in authorized_keys which is only allowed to run rsync. The script below must be modified so the rsync command matches the actual command which rsnapshot executes. Also note that the path names can only contain [/a-zA-Z0-9]. It's a bit restrictive, I know, but I tried to lock it down as much as possible. Just edit the regex if needed.
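The script itself isn't reproduced here, but the general shape of the pattern is something like this; the key, paths, and the rsync flag string are placeholders and must be adjusted to match exactly what rsnapshot runs (visible with rsnapshot -v):

    # ~/.ssh/authorized_keys on the backed-up host
    command="/usr/local/bin/rsync-only.sh",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... rsnapshot-backup

    # /usr/local/bin/rsync-only.sh
    #!/bin/sh
    # Allow only the exact rsync server command that rsnapshot issues;
    # the flag string below is a placeholder and varies by rsync version.
    ALLOWED="rsync --server --sender -logDtprze.iLsfxC . /home/"
    if [ "$SSH_ORIGINAL_COMMAND" = "$ALLOWED" ]; then
        exec $ALLOWED
    else
        echo "command rejected: $SSH_ORIGINAL_COMMAND" >&2
        exit 1
    fi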
(1) Backup whole disk to time machine automatically
(2) Backups every week or so to an external hard drive
(3) Daily backups to google nearline via arq
(4) Manual backups of important documents to tarsnap
Professional (code): mostly taken care of by client infrastructure as I'm a freelance developer, but I basically rely on cloud source control (BitBucket, GitHub, etc.)
Personal (photos, etc.): Don't really trust the cloud, so I have a QNAP NAS in RAID1 configuration with two 3TB WD red drives. We upload photos/videos from our phones (and also store important docs/files) here. I replicate this drive every 4-6 months and store it in a safe deposit box at our bank (in case of fire). Not perfect, but I think good enough. Haven't "tested" it, but since family photos and videos are the most important part of it, there isn't really much to test (we view pics off of the NAS regularly).
I have my "home" as a master Syncthing folder, that I sync with a RPI3: https://syncthing.net/
I have it set up to keep a few revisions of every file.
Syncthing is not really meant for backup, but I really like that it just happens over the network, in the background, without further intervention. I am clumsy, lazy and not disciplined enough for other backup setups that require action (e.g. connecting the computer to something sometimes, manually triggering backups, etc...)
For my personal devices, I also use Syncthing, but I do not keep revisions. Instead, I have push-only from the laptop (etc.), and on a server running ZFS I take automatic hourly, daily, weekly and monthly snapshots which are auto-purged according to my retention strategy. It is much more efficient on disk space.
Occasionally, I will run rsync across the device by hand just to check consistency - but so far it has been reliable.
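A rough sketch of the hourly snapshot-and-prune job, assuming ZFS on Linux; the dataset name and keep-the-newest-24 retention are made up:

    #!/bin/sh
    # run hourly from cron
    DATASET=tank/syncthing
    zfs snapshot "$DATASET@hourly-$(date +%Y%m%d-%H%M)"

    # drop all but the newest 24 hourly snapshots
    zfs list -H -t snapshot -o name -s creation \
        | grep "^$DATASET@hourly-" \
        | head -n -24 \
        | xargs -r -n1 zfs destroy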
I wonder how the various online backup services would handle a request to delete the backup.
In a ransomware situation, the bad guys might have the keys to a backup service. The existence of that backup would make it pointless to actually pay the ransom, so they have a motivation to do what they can to delete the backups.
I would have no problem opting-in to a provision where they will not delete backups for (say) a year, no matter what.
"Delete my backups now! I pay you and I demand you obey my request!"
"No."
The correct solution is having an "append only" and a "full control" key for the same account. Keep your full control key on paper in your safe/drawer, almost never use it on your computer. Use your "append only" key almost always, if it gets stolen by ransomware, they can't prune your online backups.
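Borg, for example, makes this straightforward: the server distinguishes the two keys with forced commands (paths and key material below are placeholders):

    # ~/.ssh/authorized_keys on the backup server
    # day-to-day key: can append new archives but never prune or delete
    command="borg serve --append-only --restrict-to-path /srv/borg/repo",restrict ssh-ed25519 AAAA...append... append-only
    # full-control key: kept offline on paper, used only for prune/maintenance
    command="borg serve --restrict-to-path /srv/borg/repo",restrict ssh-ed25519 AAAA...admin... full-control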
- An external for music, pictures, and ROMS, another external for video
- A backup of each external for travelling
- A third backup of both externals onto a single larger external
Off-site:
- Crashplan
- Google Drive for source code and important documents
All my source code is on multiple laptops and kept backed up through GitHub. I should probably start including the GitHub folder in my CrashPlan as well, just in case my repo ever gets deleted or something.
My desktop Linux machine (which is always running) doubles as the backup server, backing up itself, a couple of RaspberryPi’s (one Kodi, one home automation) and my SO’s windows machine.
Backup using rsnapshot to an external USB drive that is LUKS/dm-crypt encrypted. Every Wednesday the SO swaps the USB drive with a sibling kept at her office.
I really like the way rsnapshot works with complete looking images for each backup, but unchanged files are hard-linked across images. Makes it super easy to just mount the drive and grab the backup of that file I just corrupted.
For the Windows machine, I'm using Cygwin to run an SSH server and rsync. Before running rsnapshot, the backup script sends a wake-on-LAN to the PC, then SSHes in to run a batch file that sets the go-to-sleep timeout to never and makes a shadow copy of the drive, which is what gets backed up.
Then rsnapshot does its rsync over ssh thing to do all the backups.
Afterwards, SSH again to run a batch file that cleans up the shadow copy and resets the go-to-sleep timeout back to twenty minutes.
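Roughly, the Linux-side wrapper for that sequence could look like the sketch below; the MAC address, host name, and batch file names are placeholders, and the batch files themselves do the powercfg and shadow-copy work:

    #!/bin/sh
    set -eu

    # wake the Windows PC and give it time to come up
    wakeonlan 00:11:22:33:44:55
    sleep 90

    # batch file: set the sleep timeout to never, create the shadow copy
    ssh backup@windows-pc 'pre-backup.bat'

    rsnapshot daily

    # batch file: remove the shadow copy, restore the 20-minute sleep timeout
    ssh backup@windows-pc 'post-backup.bat'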
§
Unfortunately I’ve got some sort of weird problem where it dies while doing the backup of the root folder on the local drive. I’ve run spinrite on the drive, and blown the dust out of the machine, but no change. Last time I had this problem the power supply was failing under demand, but I’ve stress tested it and that doesn’t seem to be the cause this time… sigh. Bit hard to gather diagnostics as the machine is completely locked up when I come in the next morning…
Simple setup. Two USB hard drives. One at home, one at work. The one at home is plugged in at all times doing hourly Time Machine backups. The one at work is disconnected and lying in a drawer. Both are encrypted HFS+ volumes.
Every other week I take the home drive to work and take the work drive home to swap duties. I never have the two disks at home, one is always at work disconnected from power.
This is my personal balance point between comfort, no cloud and a reliable backup.
Backups are tested by restoring to a new HDD every now and then.
For my desktop systems (about 3TB across 3 systems):
1. Cloud Replication - All files/docs are stored in one path under VIIVO (an encrypted folder utility). All encrypted files are replicated to Dropbox / OneDrive paths for cloud replication. Only encrypted data is replicated to the cloud.
2. Cloud Backup - Full system is encrypted and backed up to CrashPlan by Code42.
3. On-Premise - All user files and folders (encrypted and not encrypted) are backed-up with two different storage paths in a NAS based Apple TimeMachine (Mirrored Drives)
4. Local - Daily and frequent folders are also replicated with SyncMate to a local USB3 Flash Drive
5. Off-site - About once a year, I backup my data files, apps, and critical files to external USB drives and ship them off site to my parents for storage just in case. These drives are usually encrypted with VeraCrypt or just Apple Encryption.
6. Tax documents, personal documents, scans, and important personal files are copied periodically to a rugged USB flash drive and placed in the home fire safe.
For my servers/array, 10TB (includes my Apple Time Machine archives):
1. On-Premises External USB drives provide daily backups with local Synology Backup tools
2. On-Premises - RSync some data to external WD NAS
3. Cloud Backup - Cloud backup some paths with ElephantDrive
4. Off-site - Monthly backups with USB Drive rotation with drives sent to parents home to be stored in fire safe.
Not sure if you're talking about data or actual dev workflow but I will share my setup with you:
In terms of data, everything I own is backed up in Google Drive. (Photos and documents mostly; I don't take tons of pictures, and ALL the music I listen to is on SoundCloud.)
In terms of dev workflow, it's pretty interesting. My macbook air died on me last week, and because I can't afford to get another one (or even a decent pc for that matter) I've fallen back to my raspberryPi. The browser is a little bit slow sometimes, but I have to say that I'm quite impressed by how well it performs.
Because it's a bit limited in terms of hardware capabilities, I've bought 2 VPSs from scaleway which I've provisioned using an Ansible playbook I wrote.
I was up and running and ready to work within minutes.
Now it's a bit inconvenient because I'm used to being mobile and taking my laptop with me everywhere, but it's a perfect backup solution for now. Obviously I don't watch Netflix on it or play video games, but for 35 quid you can't really expect much.
Glad you brought this up. I use a NAS as a mapped network drive that's cloud-synced with OneDrive for Business, and I also have that NAS doing Hyper Backup to both Google Drive and a locally plugged-in external HD.
There was a sync problem that I had to address, but before that I went to check whether I could download the backup from Google Drive (this is very slow for larger backups) and open it with Hyper Backup Explorer to restore all the files, at least to my computer, so I could provide end users with what they needed.
Once the .zip file completed, and the many parts were downloaded and extracted, I went to open the backup file with Hyper Backup Explorer. Everything looked good, but of course I needed to test a true restore, so I wanted to see if I could save a PDF and open it.
"Partial file restored" - guess what, it couldn't open.
That sent me into a panic. Nothing was actually lost or down, because the cloud sync was the only thing having issues, so everyone could still work and properly access the NAS, but now I'm thinking "great, my backup wasn't a backup because it's useless."
I'm currently in the midst of trying to figure out what to do now, the external works but I wanted the offsite hyper backup to be my savior in case of a flood/fire or external HD failure.
Re: CrashPlan: I recently learned that CrashPlan will silently fail to back up most of your data over 1TB. The only fix is allocating it more RAM via a console command. None of this is made known up front; I didn't notice until I tried to restore files that weren't there.
Re: Arq: It used to have issues with missing files. Has anyone restored from an Arq/Google backup recently that can speak to the reliability?
More seriously, on a personal level, I run Deja Dup on my Mint laptop to a USB disk that's LUKS-encrypted. Of course, that's not enough, so I have a script running on my home DHCP server - when a DHCP lease is granted, if the MAC matches my laptop's ethernet adapter, it runs rsync over SSH to my user folder (on a RAID1) on the server, doing a one-way sync. From there, I have an LTO3 tape drive that I got cheap on eBay, and I dump the folder to tape with tar weekly (cycling through the tapes, of course).
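If anyone wants to replicate the lease-triggered sync, a sketch using dnsmasq's dhcp-script hook is below (ISC dhcpd has comparable hooks); the MAC, paths, and tape device are placeholders:

    #!/bin/sh
    # /usr/local/bin/lease-hook.sh, wired up with dhcp-script= in dnsmasq.conf;
    # dnsmasq calls it as: lease-hook.sh add|old|del <MAC> <IP> [hostname]
    [ "$1" = "add" ] || [ "$1" = "old" ] || exit 0
    [ "$2" = "aa:bb:cc:dd:ee:ff" ] || exit 0        # only react to the laptop
    rsync -a --delete "me@$3:/home/me/" /srv/raid1/laptop-home/

    # weekly tape dump, from root's crontab (LTO drive on /dev/st0):
    # 0 2 * * 0  tar -cf /dev/st0 /srv/raid1/laptop-home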
Anything irreplaceable, I keep in Dropbox, mirrored to every machine I have access to. If I were to manually delete the folder by accident, I've got 7 days to restore it on the Free tier. And if Dropbox itself does a GitLab, chances are very high that I have one of my machines with a recent sync powered off, so booting that up without network will get me a reasonably up-to-date folder.
It's a lot of moving parts, but everything is usually nicely in sync.
I recently reinstalled my Mint laptop and restored from the Deja Dup backup, so I'm reasonably confident it would work in a DR scenario.
I used to have a subscription to CrashPlan, but that wasn't flexible (or cheap) enough when you try to back up multiple machines/phones.
Now I have a raspberry pi with an encrypted USB drive attached where I sync all files from laptops/desktops/phones/truecrypt-drives (I have an instance of pydio-cloud running too).
Then, once a week (or once a day depending on the folder) I sync everything to rsync.net.
Mac: Carbon Copy Cloner and Time Machine on separate usb disks.
I use the system scheduler to wake the machine at night, mount the disks, start both backups, unmount and put the MacBook back to sleep. Rock solid; it has run every night for years. Even swapping the hard drive is a matter of 30 minutes to restore the latest CCC clone.
I have to find a similar backup solution now also for my Linux based Thinkpad. I am looking into Mondo rescue, because it promises to create a bootable image on an external drive (just like Carbon Copy cloner).
For me, it still fails, but this is Linux. Needs more time and research.
This is a personal backup of one computer only. I have bad experiences with centralised backup solutions. In every case you need to reinstall the operating system at least before you can access the backup. I also forgot my password once, because access to the backup is not frequently needed and well meaning admins constructed crazy pw rules. So even though I had a backup, it was not accessible any more.
I would suggest always having one of the three disks in a separate location. Never have the three disks physically close together, not even when swapping them out.
Also, have an offline backup that is not connected to power. Power surge at night and all your data is gone.
Well, yes and no.
People who take their backup drives away from the computer tend to have - a completely outdated backup.
There is no 100% safety. If power fails while the backup is running, well, there is still the original disk. How high is the chance that it fails at the same moment?
I'm talking about a power spike. If lightning strikes, you could toast your computer and the connected backup drives. That is why offline storage is important.
At home I use Dropbox for some files and Resilio Sync for others
At work we make heavy use of version controlled configuration management where we can recreate any machine from just rerunning the ansible playbook and duply backup for databases and other storage.
While duply was trivial to set up, nice to work with, and much more stable than any other solution we were using previously, if I were to do it again with more than a handful of machines I would likely have looked into reversing the flow with a pull-based backup, just to have a better overview, since I don't trust `duply verify` monitoring to catch all possible issues.
Cloud backup is managed by a server fetching the data and then backing it up with duply.
We also run an rsync of all disk image snapshots from one DC to another (and vice versa), but that is more of a disaster recovery measure in case our regular backups fail or were not properly set up for vital data, since it would take more effort to recover from those backups.
I purchased a large safe that has ethernet and power pass-thru. Stuck a NAS with RAID 5 inside and use it as a Time Machine target for all of our laptops.
Additionally everything in the house runs BackBlaze for offsite backups.
Once a year I restore a machine from backups to test (usually I'll copy the drive to an external first just in case).
There's not really a lot of detail in your question so I've no idea what sort of solution(s) you're interested in. One suggestion, however - if it's a linux/unix based box you're backing up, you're looking for a hosted solution and you care about security, tarsnap is excellent.
CrashPlan. Backs up my personal laptop's user directory. I've excluded a few directories like ~/Library. The speeds are really bad, and their Java app sometimes makes me bang my head against the wall and almost always sets my laptop on fire. I've thought of moving to Backblaze many times, but their version retention policy just doesn't click for me.
Some of this backed-up data is also kept in my Dropbox folder (and some of the personal/crucial data there is encrypted). And everything that goes into CrashPlan is encrypted on my machine. And yes, I've restored from CrashPlan, and once in a while I test some random folders and files to see whether they are actually up there in the cloud or not. I guess I should do a full restore some day (but given their speed and my geographic location it may take weeks or months).
I use SuperDuper! to clone my laptop's user folder to a 256GB portable hard disk (my laptop has 128GB) every 2-3 months or so and have instructed it not to remove deleted files but to add new ones. I also copy my docs, pics, videos, and music to my 2TB portable hard disk regularly (and I keep testing it).
(Edit: I've recently disabled auto-upload to Google Photos. Now I do it only for those photos that I want to upload, from its Mac uploader app.)
Work:
Code goes to our own GitLab setup, the rest of the stuff to my company's Google Drive. Sadly we don't have any holistic backup setup at work. It's a startup. I dropped an email to both the IT and Engineering heads. They replied saying it was an interesting idea and they would look into it; I knew they wouldn't.
Going forward, I want to have my own BorgBackup or something like it (client-side encryption, de-duplicated, compressed, kind of a fire-and-forget solution) hosted on a small/mid-sized VPS, either in place of CrashPlan/Backblaze or alongside these ready-made cloud solutions. Something with a GUI would have been nice, though. Something lightweight, minimal, but solid (Backblaze's interface is awesome).
Whatever you end up going with, you have to actually regularly restore the data and simulate a disaster recovery. While it makes sense to have automatic checks in place, IMO it's always worth doing the recovery manually. Prove it all works; it sets expectations and shows up issues.
Personal back up - way more complicated than it needs to be!
(1) ChronoSync and ArqBackup are installed on a server. (2) Each client machine has the ChronoSync agent installed. (3) The agent backs up specific files and folders (according to a schedule) to the server. (4) ChronoSync on the server then backs up to a second hard drive on the same server. (5) ArqBackup then backs up the files on this second hard drive to Amazon AWS (in encrypted form).
Separately, I have independent Time Machine backups on external hard drives as well. Some of the core client machines also back up to SpiderOak.
I have done minimal restore tests, but part of the reason why I back up the way I do is because I expect one or more of the backups to fail when I need to restore.
Cloud + https://github.com/duplicati/duplicati
Encrypts and compresses before sending data to the cloud and lets me restore files from specific days if needed.
One honking big Supermicro SC836 chassis with a Supermicro low power board in it.
I stuck FreeNAS on it and back up everything to it using nightly rsync and ZFS replication where possible. It has 48TB of storage (16x 3TB).
Critical bits get synced to Amazon Cloud Drive (which took an age).
For backing up my ESXi VMware box I use Nakivo - it's an amazing piece of software - never once had an issue with it and I've used it many times to revert a broken virtual machine.
I've had a lot of experience with hardware failing in my IT life. Been close but very lucky that I've never lost data from a disk or corruption failure. Finally bit the bullet and bought all that kit just for backups. Well worth it.
- Full computer in Crashplan with user-specified key.
- Non-sensitive documents I care about on Dropbox.
- Code I care about (have time invested in) with git remotes on Github, Bitbucket, or a personal Gitlab server.
- For "continuity" I carry personal property insurance that can replace my laptop.
I don't bother with external drives or NAS devices because the scenarios I feel are most likely are burglary followed by natural disaster; I don't want to rely on something in my home to protect something else in my home.
After hard drive crashes I am usually grateful for a clean slate, and at most pluck one or two files out of backup when the need arises.
NAS and lots of custom scripts and programs: completely switched off and unused. Too much hassle for something that should just work.
Instead: Cloud backup to CrashPlan for pennies for 10 machines. Already saved my butt several times.
So my only tip - don't do anything yourself. Doing your own backup is like writing your own crypto. It will bite you.
A reasonable compromise is to use your own backup in addition to a service. However, use them independently - don't back up your backups to the cloud, backup your machines to both places. Otherwise your custom setup can still be the weakest link.
I have a Mac Mini, my main work machine and an MBP, used when I travel. I backup both my Mini and my MBP to Lacie Thunderbolt SSD drives using Carbon Copy Cloner which kicks off the backup procedure every night.
I also backup my photos, documents to both iCloud and DropBox.
I don't use iTunes on my MBP, my Mini is connected to another external SSD which serves as my iTunes disk that is also backed up.
Whenever I travel, I just sync my MBP to be up to date with my Mac Mini.
I am also looking at using iDrive backup [1], but have not done so.
I have a large (24TB) RAID6 at home and back up all my files there. It's large so I have room for all my DVDs, BluRays, and developer VMs. I have a smaller (6TB) RAID1 in another state at my parents' house for off-site backup of important files. Both are running mdadm and set up to email me with any events. I have a cron job that runs rsync once a week and emails me the result. Both systems are on a UPS. I have tested to make sure they are working as expected. All my systems are running Linux, so I can access them with sshfs or sftp using ssh keys.
I want total control, so a Synology NAS box setup with two disks in a mirror, 1 SSD as cache, and one hot failover. All laptops backup to it. It backs up to amazon s3 and to a second synology NAS.
I use CrashPlan (as does my family) and I keep an archive encryption key so it's encrypted on my side. I've found this fantastic (for example when my sister's laptop died as we needed to retrieve her dissertation). It's quite cheap and has unlimited storage. I don't back up everything on here, only the important stuff.
I also have a Time Machine drive that sits on my desk for quick access / just to have another option (although it is not encrypted so I might wipe it and find a way to use TM with encryption).
Most random docs, todo lists, invoice scans, etc. are in Dropbox or Google Docs.
Home pics, music, videos: CrashPlan central. I also set up a local CrashPlan archive to a local NAS, but OS X can't keep the IP address resolved.
Work: all projects are in source control. Configs and working trees are backed up a few times per day to JungleDisk. JungleDisk has saved me several times when I accidentally deleted a folder or overwrote a file before checking it in. It's also handy for using the restore function to copy configs to a clean dev machine.
Wow. No one using Bacula? It seemed a little cumbersome to get set up, but once I did it, I could more or less forget about it.
I run a mixed bag of Linux, OSX, Windows machines and each one of them gets incrementally backed up each night, and a full baseline once a month to a machine on the home network. Nothing fancy.
Then about once a month or when I think about it I copy the backups to an external drive and take it off site.
Worst case loss is a month. Seems cool to me.
And yes, I quite often ensure I can restore files - usually by stupidly deleting them at the source.
Interesting how you broadcast on your blog that "it’s extremely unlikely that you’ll ever need a backup".
Have you ever had to look a parent in the eye and say, "I'm sorry all your photos of your kids growing up are gone" ?
Backup now, backup always, backup often. You can't buy that stuff back.
For the past few years, I have been using a mix of rsync against an in-house and an external server, plus encrypted USB drives[0]. The key used to encrypt the external drives is derived with a simple algorithm from the serial number of the drive and a very long string stored in a Yubikey.
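The derivation itself can be tiny. A purely illustrative sketch (the exact algorithm isn't spelled out above, and the Yubikey-provided string is just a shell variable here):

    #!/bin/sh
    set -eu
    DEV=/dev/sdX                                    # the external drive
    SERIAL=$(udevadm info --query=property --name="$DEV" | sed -n 's/^ID_SERIAL_SHORT=//p')
    SECRET="the-very-long-string-from-the-yubikey"  # however it gets retrieved

    # derive a per-drive passphrase and open the LUKS volume with it
    printf '%s%s' "$SERIAL" "$SECRET" | sha256sum | awk '{print $1}' \
        | cryptsetup open "$DEV" "backup-$SERIAL" --key-file -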
I separate everything by year. The current year gets synced every week via rsync with a Drobo (the Drobo duplicates the data across all its drives).
I also have a disc in another location that gets synced once a year at christmas with the archive.
It was a bit of an investment but then it's pretty cheap to run.
I know it's not perfect. If I delete something without noticing and sync it afterwards it will be lost forever, but I'm running this for the last 10 years and never really had a problem.
I've been thinking of playing with bup [1][2] for personal stuff, so I was hoping I'd see that someone here had played with it. I don't see any mention of it yet, so if anyone has used it and could share any thoughts, I'd love to hear them!
In the external case, it's just a Nextcloud instance, constantly syncing the most important files.
In the local one, there's an external hard drive connected to a Raspberry Pi and cron jobs that scp into it.
So, three constant copies of everything of any importance. I "test" the backups regularly because I'm playing files from a backup on a Raspberry Pi connected to my sound system and constantly downloading files on my phone from Nextcloud.
Do you version your files or does it just overwrite the existing ones?
What happens if you do accidentally overwrite your local copy with zeroes, does the sync process sync the now broken file to Nextcloud making recovery impossible?
Not sure if Arq has this, since I'm only running the trial version at the moment, but CCC actually clones the drive to the destination disk and I should be able to boot from the backup drive in case of emergency.
I run linux for work and windows for gaming between a laptop and desktop. For the files I use frequently I unison those between machines. For backups I send everything to an encrypted removable HD on my home network using some rsync scripts I wrote. For the cloud you can't trust any of them with your data privacy. But I still send off some stuff to amazon cloud drive (encrypted of course) using rclone.
1TB backup drive which is partitioned. Half is data and the other half is time machine. The data partition is mirrored to Google Drive. Then, I use Arq Backup to mirror time machine backups to Google Cloud Storage. In other words, there is always a true local and remote copy of everything. Very cheap and works well.
For my personal dev machine I simply use GitHub and Dropbox. I'm sure there are more complete ways of storing my full system, but I've actually never needed it... knock on wood.
That being said, I can re-create my system from scratch in 3 hours, so if I spend more time than that on backups I think it's a waste.
I used to use duplicity but recently switched over to borg. Not having to do full backups is nice, I can mount any backup at any time and delete intermediate backups in any order.
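For anyone who hasn't tried it, that flexibility looks roughly like this (repo path and archive names are made up):

    # browse any archive through FUSE
    borg mount /path/to/repo::laptop-2017-06-01 /mnt/borg
    ls /mnt/borg
    borg umount /mnt/borg

    # delete an intermediate archive without touching the ones around it
    borg delete /path/to/repo::laptop-2017-05-15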
While a large part of my backup system currently consists of manually mirroring pictures to various solid state media (that occasionally are moved to separate fire zones) - and a good helping of prayer/good luck - the part which is set up uses duplicity driven by backupninja (from Debian repos - upstream is):
https://0xacab.org/riseuplabs/backupninja
It was complicated to set up separate signing and encryption keys such that the server sending backups could not decrypt them (assuming the symmetric session key wasn't somehow kept around). But once it was set up, the only worry was making sure the backup server didn't run out of space.
I use rsnapshot to aggregate a bunch of machines on my NAS.
I've been intending for a few months O:-) to then save this aggregated backup somewhere on the internet. Not sure yet whether to use e.g. tarsnap, or a minimal vserver with rsnapshot or rsync yet again.
I've had one very large project back in the day that wanted source on CD and printed copies of the source code. I'd never thought of a printed page as a backup, but I guess it's 'a' method.
Apple Time Machine. It's pretty great in the "set it and forget it" world of backup solutions. All my actual code also lives on Github, so there's always a remote copy.
For people using rsync and the like - Does anyone have data on the amount of wear caused by reading the entire HDD (modulo OS/libs) over and over again to compare against the backup?
I use a simple NAS + external HDD + rsync to Backblaze.
The MacBook uses the NAS for Time Machine backups to an external HDD; the external HDD is backed up with rclone to Backblaze once an hour, every hour, unless it's already executing a backup.
Any iPhone backups are rsynced to the NAS/external HDD and then rcloned as well.
iPhone photos are kept in iCloud, including any added to the MacBook's Photos app.
About $0.75 per month for 200GB so far for Backblaze and
$1.49 per month for 50GB of iCloud.
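The "unless it's already executing a backup" guard is easy to get with flock; a crontab sketch with a hypothetical remote and paths:

    # hourly; flock -n skips the run if the previous one is still going
    0 * * * *  flock -n /tmp/rclone-backup.lock rclone sync /mnt/external-hdd b2:my-backup-bucket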
I use Backblaze also, for my virtual machine backups as it would be time-consuming to recreate them (I have several for various clients, server set-ups, etc). The seriously important stuff such as source code is in Bitbucket, and I also have a nightly AWS batch job that backs them up to S3.
I haven't seen anyone mention password managers yet. I use lastpass day-to-day but for my recovery codes I have a Keepass vault.
I have backups of that dotted around but I have also asked a couple of my friends to keep their own copies of this vault in the event I somehow lose all my computers and phone. I think that resolves the chicken-and-egg problem quite nicely.
For pictures, videos and other stuff I have a 1TB drive in my desktop and a 1TB USB drive which I normally use with my laptop. From time to time I plug the USB drive into the desktop and sync them with Unison.
I built a small veneer on top of ZFS and rsync that I've been running for well over a decade. It has worked flawlessly, mostly because it is so simple.
I use it almost exclusively with Linux systems, but it should work with anything that rsync does a good job with.
The hardware is mostly commodity rackmount boxes with 8-12 drives running ZFS (zfs+fuse or ZFSonLinux more recently). Deduplication takes a shockingly large amount of RAM, so mostly I disable it.
The job of tummy-backup is to schedule backups, prune older backups, and complain if something is wrong. There is also a web interface for creating backups, manually running them and managing them, and exploring and recovering files (via a downloaded tar file).
BACKSTORY
I was running a small dedicated hosting business, and we had hundreds of machines to back up. We started off with the old hardlink + rsync trick, but it had two problems: Append-only files would cause huge growth (log files, ZODB), and managing creating and deleting the hard links would take tons of time.
We tried backuppc for a while and liked it, but it still had the problem with append only files growing, and lots of our backups were taking more than 24 hours to run.
So I took my old rsync+hardlink script, which had proven itself really robust; it lives on in this file:
I started using it on Nexenta when they had their free release. That was ok, but about once every month or two the boxes would fall over and have to be rebooted. I realized in retrospect this was probably due to not having enough RAM for deduplication or just not having enough RAM period. Or maybe bugs in ZFS.
But Nexenta wasn't something our staff had experience with. So I started testing it with fuse+zfs. This also had bugs, but the developers worked with me to find bugs, and I created stress tests that I would run, sometimes for months, to report problems to them. Eventually, this was pretty reliable.
Now I am running it with ZFSOnLinux, and that has been very stable.
I'd love to try it with HAMMER to get deduplication, but I just haven't had the cycles. btrfs was also on my radar, but at the time I was really looking for an alternative to ZFS, and btrfs had been getting more and more unusable (I ran it for a year on my laptop, but then experienced data corruption every time I tried it for the next 2-3 years).
Recently I've been playing with borgbackup on my laptop. I was hoping I could use it as an engine to get deduplication, but it really is only designed for single-system use. For a single system it seems good.
I've noticed that the actual files on my laptop get fewer and fewer every year: I use iCloud Photos for my ~50GB of photos, and they are downloaded only when you try to open them. I have my Documents and Desktop files on iCloud (downloaded on demand as well), and I use Google Photos as an Apple Photos backup. All of my more recent projects are on GitHub. I guess I only keep non-vital files on my laptop.
Having said that, I have an old 1TB Time Capsule at home (where I work from), and let macOS do its incremental backups automatically every hour. In addition, I usually launch a manual backup whenever I make some big or important change.
I transfer my data from the most recent backup whenever I buy a new laptop and I'm usually ready to go in an hour or so; they work wonderfully.
So you're just an account glitch away from being locked out of a bunch of your documents and/or code?
To me, backing up means being isolated from failures by a single point - be that a drive or a whole company. One of the reasons I often avoid cloud services is because they're hard to backup.
No, the worst that could happen is if I got locked out of my iCloud account. I would lose some files in ~/Documents/ and ~/Desktop/ (a lot of the stuff there I copied to an external hard drive before uploading to iCloud), which I don't really need.
Everything else is backed up automatically every hour, and all my work is on GitHub.
> I use iCloud Photos for my ~50GB of photos, and they are downloaded only when you try to open them. I have documents and desktop files on iCloud (get downloaded on demand as well), and I use Google Photos as an Apple Photos backup.
Are all your photos "backed up" on Google Photos via the iOS app?
What happens if iCloud fails, or if you get locked out of your account?
I don't think the iOS app can back up to Google Photos, but in any case, no.
All my photos are on both iCloud, and Google Photos. That's every single picture, including the ones I snap now that get automatically synced on both iCloud Photos and Google Photos.
For home I rotate 2 hard disks in 2 locations and backup using EaseUS Todo.
I also put some stuff on Dropbox for a quick backup if I don't want to wait until the next time I do a disk backup. Dropbox + zipped folder =~ private single-user GitHub repo :-)
My backup strategy is top-secret, I don't want anybody to know where my files are located and how it is recoverable, especially not everybody on the internet.