A performance comparison of Duplicacy, restic, Attic, and duplicity (github.com/gilbertchen)
46 points by acrosync on July 18, 2017 | 49 comments


The results don't look very complete seeing as how attic was abandoned over two years ago. The master attic branch[0] has 600 commits. The fork of attic, borg, has over 4000 commits [1] suggesting a significant amount of work has been done to improve it.

It seems odd for the author to compare it to something abandoned (and thankfully reborn as borg) and ignore what has happened in two years.

Would love to see similar tests run against borg.

[0] https://github.com/jborg/attic

[1] https://github.com/borgbackup/borg


Author here. The experiments actually ran BorgBackup 1.1.0b6, as you can see from the Setup section. We liked to call it Attic out of respect for the original Attic author.


You realize the name Borg comes from the Attic author, Jonas Borgström, right? No one else calls it Attic as the two are different projects.


I noticed that, but didn't know if Borg has another meaning. I can understand why they forked the project, but in my opinion a name that makes the origin more obvious would have been better.


"Borg" was chosen, because it emphasizes collaborative development — and because someone is a Star Trek fan ;)


The version row for attic says "BorgBackup 1.1.0b6"


Is anyone using such tools as a backup for their NAS (and then using their NAS for time machine?).

That would beat having to install something like Backblaze on every family member's machine. Cloud backup is great, but it's always better to have a local (LAN) copy and then an off-site copy.


I use borg (attic fork) backing up to rsync.net for my home server. All my machines back up locally to that machine (mostly using SyncThing), then it backs itself up every hour or so. It's not perfect but it does work really rather well.

Borg is really nice, and rsync.net is the kind of service that's always my favourite: it does one thing very well.

Also they offer a discount if you use borg or attic (possibly others) as they turn off their ZFS snapshot system and assume your software handles that.


How much data and what's your monthly cost? I have a 2TB storage VPS for under $10/month. rsync.net looks very good but is possibly total overkill for my needs. Definitely don't need >1 snapshot/day as that's what my hourly local backup is for.


It's a bit more expensive than most options but the price does vary by how much of your data is duplicated. At the mo I have 1TB of data being backed up but it squashes down to under 550GB with borg. Monthly price for me is about $17 (paid yearly though).

I'm happy with the extra price for my use, but yeah it's not the best option if you have a lot of unduplicated data.

If you're interested, here's the link to the borg/attic pricing page: http://www.rsync.net/products/attic.html


Can I ask where you got the VPS? That's a really great price for 2TB!


At present I'm using a provider in Lithuania called time4vps. Overall the service is good (assuming you are connecting from Europe) but to use their website I have to disable my ad-blocking & privacy add-ons, which I don't have to do on other providers' sites. Not sure why that is.

I'll probably try Delimiter once they start offering service in London, as they also have some similar low cost + high storage plans.


> Is anyone using such tools as a backup for their NAS (and then using their NAS for time machine?).

Do you mean backup to the NAS or for backing up the NAS itself to a third location?

Either way, there's no need for an either/or approach. Just do both. I've tried multiple backup applications and many support local and remote backup options as standard. I'm using Borg and Back In Time. With Borg it's just a second cronjob with a nearly identical script for the off-site backup over SSH. With Back In Time I was using the AWS CLI to push the backups to S3, but I found a better deal with cheaper storage. I have about 200GB of data in total but want a 1TB archive available online for older backup sets.
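A minimal sketch of that two-cronjob setup (repository path, off-site host, source paths and schedule are all placeholders):

    # local backup to a repo on the machine itself
    0 * * * *  borg create --stats /srv/backups/borg::'{hostname}-{now}' /home /etc
    # nearly identical job for the off-site copy over SSH
    30 3 * * * borg create --stats ssh://user@offsite.example.com/./borg::'{hostname}-{now}' /home /etc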

Side note - OP's review is helpful but Borg Backup is oddly listed under 'Attic', which it was forked from years ago. Don't bother looking for Attic if you are comparing backup tools available today.


> Is anyone using such tools as a backup for their NAS (and then using their NAS for time machine?).

I use borgbackup to back up my stuff locally, with rclone to mirror the borg repository in the cloud (personal Google Drive in my case), and have also experimented with running borgbackup to an offsite Raspberry Pi.
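The mirroring step can be a one-liner along these lines (the "gdrive:" remote name and paths are placeholders, set up beforehand with rclone config):

    # push the whole borg repository directory to the cloud remote
    rclone sync /srv/backups/borg gdrive:borg-mirror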


Duplicity is pretty easy to use with Backblaze B2 as a cloud storage backend - that's what I use to back up my NAS.
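For reference, duplicity takes a B2 target URL directly; roughly like this, with the account ID, application key, bucket and paths as placeholders:

    duplicity /srv/nas-data b2://ACCOUNT_ID:APPLICATION_KEY@my-bucket/nas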


> duplicity has a serious flaw in its incremental model -- the user has to decide whether to perform a full backup or an incremental backup on each run. That is because while an incremental backup saves a lot of storage space, it is also dependent on previous backups due to the design of duplicity, making it impossible to delete any single backup on a long chain of dependent backups. So there is always a dilemma of how often to perform a full backup for duplicity users.

Yes and no. Duplicity has the "--full-if-older-than" option so you can do incrementals normally, but if your previous full is older than whatever interval you define, it will do a full backup, without changing the command line. So it can run from e.g. a cron job.
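For illustration, a crontab entry along those lines (the destination URL, source path, and intervals are made up):

    # incremental most days, but start a fresh full chain once the last full is a month old
    0 2 * * * duplicity --full-if-older-than 1M /home/user sftp://user@backuphost//backups/home
    # and delete chains that are entirely older than two months
    0 3 * * 0 duplicity remove-older-than 2M --force sftp://user@backuphost//backups/home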


Classic source control had this problem.

The clever trick is to reencode the previous most-recent backup as a delta from the current state, and do a full-backup of the current state, rather than encoding each new backup as a delta from the previous state (which becomes slower and slower to compute, the more previous states you have).
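To make the idea concrete, here's a rough sketch of reverse deltas using librsync's rdiff - purely illustrative, and not how duplicity stores data today:

    # 1. take a fresh full copy of the current state
    tar -cf current-full.tar /data
    # 2. re-encode the previous full as a delta *against* the new full
    rdiff signature current-full.tar current.sig
    rdiff delta current.sig previous-full.tar previous.delta
    rm previous-full.tar   # keep only the newest full plus small reverse deltas
    # restoring the previous state later:
    #   rdiff patch current-full.tar previous.delta previous-full.tar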

Problem solved :)


That's really expensive when your previous backup is on cloud storage


To compute a delta, you need the previous version. If this previous version is computed from a single file - the previous snapshot - then that's actually less data and effort than if it's computed by taking an old snapshot and replaying all the deltas up to the current time.


It's a shame Duplicacy is not free software.


Author of Duplicacy here. To personal users, it is free software.


It doesn't meet the free software foundation's definition of free software, which I believe was the original point.

It lacks the freedom to run the program as you wish, for any purpose (freedom 0).

https://en.wikipedia.org/wiki/The_Free_Software_Definition#T...

https://github.com/gilbertchen/duplicacy/blob/master/LICENSE...


The FSF's definition of "Free Software" doesn't necessarily match the English language's definition of "free software". Since we're speaking English, I think it's reasonable to assume the latter meaning, like most reasonable people who haven't encountered the FSF would.

I really wish people would capitalize things like this. "This is not Free Software" would make the sentence unambiguous. You can't just go around redefining the English language willy nilly and expect people to play along.

Languages do evolve naturally but this isn't natural. This is an organization trying to influence the language to advance an agenda (though I believe it to be a worthy agenda, it's still an agenda).


"It's a shame Duplicacy is not free software."

If the commenter meant no cost it's more reasonable they'd have said "It's a shame Duplicacy is not free". Or "It's a shame Duplicacy costs money".

Capitalising would have made it clearer yes, but I think given the language and the context (hacker news) I was justified. But the author, @acrosync, was also justified in assuming it meant no cost.


Free or not free aside, my question is: does it matter to personal users whether they get this free-for-personal-use license or one of the more permissive licenses like MIT, BSD, or GPL?


It matters to me as a personal user because my use of duplicacy might change at some point and suddenly I'd lose rights to use it (unless I pay). I'd lose rights to any development contributions I might have made unless I pay.

And as a personal user, I can't use any code from Duplicacy in any other project. I can't even, say, create a package for it and get it included in Debian.

And aside from some of these practical issues, I'm a personal user who supports software freedom so I don't want to use something encumbered in this way.

And as a commercial user, any development contributions I make are no longer my own and I have to pay to make use of them.

But the worst part of it is, your license isn't very well defined. As it stands, you may at any point stop accepting license payments from a commercial user and they'd lose the right to use it entirely - they'd lose access to their backups (unless they used the software without a license).

You of course have the right to choose any license you like! I just wouldn't use duplicacy myself under the terms of that license.


Thanks for your feedback. The reason I don't like open-source licenses is that I don't want for-profit companies to use my software without paying. The ideal license would be the one that requires them to pay while being appealing to personal users like you. I don't think these two goals are irreconcilable, but unfortunately such a license doesn't exist yet.


I did wonder if being fully free might encourage more users who would fund you in other ways, but Borg Backup isn't making very much that way, so perhaps not: https://www.bountysource.com/teams/borgbackup

The AGPL license might be a step in the right direction (for your requirements). It aims to at least ensure that if companies use the code to provide a service to other users, they have to release their changes. You can sell those companies a different license if they don't want to accept the AGPL (you'd have to have a contributor agreement to assign copyright to you though, to allow you to relicense code at your discretion like that).

Or there is the open core model (like nginx-plus), where you release the code under an open source license but offer some additional "enterprise" features (like your vmware stuff) only to those who pay. I'm not a fan but it seems to work for some.

Anyway, duplicacy sounds like a great design. All the best with it!


It's stopped me from using it for personal use.

From a practical standpoint, it makes it more difficult for me to trust that it will be maintained in the long-term, or that I can extend its functionality if I see a need.

Ideologically, I'm somewhat uncomfortable using duplicacy when fully-free alternatives exist. I'm not a free software purist by any definition (I use Steam. I have a Netflix subscription. My Android has Google apps on it.), but this is an area where compelling free software solutions do exist.

The license also keeps it from being packaged in most Linux distributions, which makes it a nuisance.


It is an issue of trust for something as important as a backup tool.

Duplicacy sounds great but it has 6 [0] contributors and Borg has 107 [1]. It's obvious which one has more eyes on it.

Plus I can apt-get install borgbackup / apt-get upgrade which adds another level of trust.

[0]: https://github.com/gilbertchen/duplicacy

[1]: https://github.com/borgbackup/borg


We just released the source code less than 2 months ago. The difference in contributor counts may not be this big a year or two from now.


I hate switching backup software, and with your current license, you become a single point of failure--will you be supporting this thing in 10 years when I need to back my FireflyBSD 128-bit RiscV machine up to the walmart cloud (or whatever random os/hardware/cloud is common in a decade)?

If this were BSL licensed, the community could fork it if that became an issue:

http://monty-says.blogspot.com/2016/08/applying-business-sou...

I'm curious to see whether any BSL software manages to build a third party dev community (inclusion in debian non-free, third party patches and bug reports, etc)

While I have your attention: It'd be great to measure how many bytes the solutions read and write, as well as I/O counts. There are tools for this in Linux, and probably MacOS. Alternatively, network bandwidth would be a good proxy for these measurements.
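On Linux a couple of quick-and-dirty ways to get those numbers would be the following (the repository path is a placeholder, and the /proc counters are per-process, so they have to be read before the process exits):

    # peak memory plus filesystem inputs/outputs for a single run
    /usr/bin/time -v restic -r /srv/restic-repo backup /data
    # byte and syscall counters for a backup that is still running
    cat /proc/$(pgrep -x restic)/io   # rchar, wchar, read_bytes, write_bytes, syscr, syscw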


The unambiguous term would be "gratis" or "free of charge".

Your software is not "free software" in the "libre" or "free to modify and share" sense.


The term "free software" has a very specific meaning [1] and there is no such thing as "free software, but only for personal users".

[1] https://www.gnu.org/philosophy/free-sw.en.html


Performance bothers me an order of magnitude less than the potential for obscure bugs which could lose me data.


Best thing about restic is not having to spend 45 minutes downloading and manually bullshitting around with python dependencies and libraries. That's what Go is good at, IMO.


`attic` and `borg` are just one `sudo apt install attic` or `sudo apt install borgbackup` away. No "bullshitting around with python dependencies" required. It looks like they also have packages for other platforms: https://github.com/borgbackup/borg/releases


I was talking mainly about duplicity which has always been like pulling teeth.


My two (possibly biased, much like the author's) cents.

- No network-based tests; e.g. a typical fast internet connection (say 100/40 or 50/20 MBit/s) with a few dozen ms of latency to some server or cloud service. This is of course difficult because such tests tend to be hard to make reproducible. For a network-based test, not only time is interesting, but total RX/TX as well (a rough way to capture that is sketched after this list).

- I'm really surprised at restic's performance. It uses far more CPU than Borg in almost all tests... and Borg is already notoriously inefficient in its CPU usage when looking at object throughput (restic: "fast, efficient"?). I don't mean to bash, I'm just surprised.

- restic's deduplication performance might hint at Rabin Fingerprints being worse than Buzhash, but there might be other issue(s) leading to this result.

- Besides CPU time, memory (peak) usage would be interesting.
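For the RX/TX part, interface counters before and after a run give a rough whole-machine figure (interface name, remote and paths are placeholders, and any other traffic on the box gets counted too):

    rx0=$(cat /sys/class/net/eth0/statistics/rx_bytes)
    tx0=$(cat /sys/class/net/eth0/statistics/tx_bytes)
    borg create ssh://user@remote.example.com/./repo::'{now}' /data
    echo "RX: $(( $(cat /sys/class/net/eth0/statistics/rx_bytes) - rx0 )) bytes"
    echo "TX: $(( $(cat /sys/class/net/eth0/statistics/tx_bytes) - tx0 )) bytes"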

> For instance, file hashes enable users to quickly identify which files in existing backups are changed. They also allow third-party tools to compare files on disks to those in the backups.

To be fair, Borg can calculate a variety of file hashes (MD5, SHA1, SHA2, ...) on the fly with "borg list". There are "borg diff" (to compare two archives) and "borg mount -o versions" as well, though the latter is generally impractical for looking at a large number of archives.
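For example (exact format keys vary by borg version, so treat the syntax as illustrative; repo paths and archive names are placeholders):

    # file hashes computed on the fly while listing an archive
    borg list --format '{sha256} {path}{NL}' /path/to/repo::archive-2017-07-18
    # compare two archives
    borg diff /path/to/repo::archive-2017-07-17 archive-2017-07-18
    # browse every stored version of every file through a FUSE mount
    borg mount -o versions /path/to/repo /mnt/borg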

> Again, by not computing the file hash helped improve the performance, but at the risk of possible undetected data corruption.

I can't deduce how the last part follows (", but..."). Care to explain?


I think bup does support concurrent access; the concurrent deduplication granularity is a backup set, so if two identical computers are backed up for the first time at exactly the same time you will not get deduplication - but that's inherently a dining philosophers kind of problem.

Also, recent bup versions allow delete. No encryption IIRC, but you can examine it with git tools, which is a feature on its own.
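As a sketch of what that looks like (backup name and source path are placeholders; bup's repository defaults to ~/.bup):

    bup index /data
    bup save -n mydata /data
    # the repository is a plain git repo, so git tools can inspect it
    git --git-dir="$HOME/.bup" log mydata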


It's a little odd to not benchmark backup over the network - a backup taken to the same physical disk as the source of the data isn't very useful - for that use-case taking a filesystem snapshot[s] would probably be faster and more useful. Perhaps in combination with a checksumming tool, like [c], or with a filesystem like ZFS.
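For the same-disk case, the snapshot route can be as simple as the following (pool/dataset names and the remote host are placeholders):

    # local point-in-time copy
    zfs snapshot tank/home@2017-07-18
    # later, replicate it off-site incrementally
    zfs send -i tank/home@2017-07-17 tank/home@2017-07-18 | ssh backuphost zfs receive tank/home-backup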

Also, it can be difficult in a lot of environments to sustain more than 100mbps write to a remote, off-site system - halving the stored data can be a much bigger win then.

All that said, it's interesting to see that a) duplicity seems slow, and b) very consistent in terms of speed. I wonder if there's some low-hanging fruit for optimization there.

Personally I've had some luck using backupninja[b] in combination with duplicity. It's one of the few Free alternatives that allow the backup-system to encrypt "one-way" - so that compromising the backup-system doesn't immediately give read access to encrypted backups. It's a bit complicated to set it up for separate encrypt-to and signing keys though :/
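The one-way part boils down to the backup system only holding the public half of the encryption key plus a separate signing key; the underlying duplicity call is roughly like this (key IDs and URL are placeholders, and backupninja wraps it in its own config):

    duplicity --encrypt-key 0xAAAAAAAA --sign-key 0xBBBBBBBB /home/user sftp://user@backuphost//backups/home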

[s] Today I would probably recommend ZFS - but I've always wanted to give NILFS2 a real test, especially on solid-state disks: http://nilfs.osdn.jp/en/

[c] https://github.com/Tripwire/tripwire-open-source

http://aide.sourceforge.net/

https://github.com/integrit/integrit (Speaking of projects that might be fun/useful to redo in a safe language like rust or go, this would appear to be a prime example, btw. On the whole, moving integrity to the fs, as with zfs, might be the better option, though).

[b] https://0xacab.org/riseuplabs/backupninja


> All that said, it's interesting to see that a) duplicity seems slow, and b) very consistent in terms of speed. I wonder if there's some low-hanging fruit for optimization there.

Duplicity is classic delta-backup. It always reads all files and calculates a delta to a different version of the file, hence the fairly consistent performance. Performance of deduplicating archivers is more difficult to predict.


I would've been curious to see how BackinTime stacks up.

Duplicity/deja-dup (a GNOME frontend for duplicity) is pre-installed on most GNOME-based DEs, which makes it convenient for end-users, but I found only being able to configure a single backup destination too limiting.

By contrast, BiT supported multiple destinations and profiles, meaning I could have one local, one off-site, one "Personal Data" backup, one "System" backup to fall back on if an OS update fails, etc. Its configuration options were much more attractive.


If you are using duplicity, use this moment to check if it works. There is a serious bug which seems to affect systems with a lot of data: https://bugs.launchpad.net/duplicity/+bug/896728


Could obnam be added? I use it and with a few tweaks it's very respectable in terms of performance. Also has some good integrity checks.


Has anyone run these on their computer? As they noted, they are the author of Duplicacy.


I used to use attic to back up my personal Debian systems, but since upgrading to the new stable (Stretch) release it is no longer available, so I switched to borg.

I had to juggle a few things around, but it works well. As does obnam, which I use in a couple of other places too.


Author of Duplicacy here. I would like to see others' results too.


I wonder how rdedup would compare.



