Deduplicating Archiver with Compression and Encryption (borgbackup.org)
134 points by jiehong on July 26, 2021 | 71 comments



After years of looking at it, I've been using borg for almost a year now, and I've already replaced rsnapshot with it pretty much everywhere. I'm very happy with the deduplication feature (that's my main motivation for switching), but the performance and ease of use are very appreciable too. Actually, performance might be as much of a game changer for me as deduplication (though a less expected one), as I find myself doing and automating backups way more often than before. I also enjoy the ability to do append-only backups, which mitigates the risk of losing already-saved data to a bug or malware.
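For reference, append-only mode can be enforced server-side with an SSH forced command; a minimal sketch for the backup server's authorized_keys (paths and key are placeholders):

```shell
# On the backup server, in ~/.ssh/authorized_keys:
# force this key to only ever run borg in append-only mode,
# restricted to a single repository path, so a compromised
# client cannot delete or overwrite existing archives.
command="borg serve --append-only --restrict-to-path /srv/backups/myrepo",restrict ssh-ed25519 AAAA... client@example
```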

I've been expecting more and more from it, and at this point, there are only two things I wish were better supported:

- deduplication across several machines (you can already back up several machines to the same repository, but it's not efficient);

- built-in redundancy to deal with bitrot.

I'm still using rsnapshot for secondary backups, as I can't afford a bug in backup software, but I'm considering switching to restic for that, as it provides deduplication as well and doesn't share code with borg.


Last time I checked (I think it was a few months ago), Borg had a couple of significant limitations:

1. no realtime incremental backups (that is, using filesystem observers in order to be able to perform incremental backups at short intervals)

2. no multithreading

I didn't find any companion tool that could handle #1, so for large systems, I think backups are very resource-intensive and can't be scheduled frequently.
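If you want to approximate #1 yourself, one rough sketch is to pair borg with a filesystem watcher such as inotifywait from inotify-tools (repo path, watched path, and debounce interval are placeholders):

```shell
#!/bin/sh
# Debounced watch-and-backup loop: block until anything changes
# under /data, let the burst of changes settle, then take a backup.
REPO=/backups/myrepo   # placeholder repository
while inotifywait -r -e modify,create,delete,move /data; do
    sleep 30   # debounce: batch bursts of changes into one backup
    borg create --stats "$REPO::auto-{now:%Y-%m-%d_%H%M%S}" /data
done
```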


1. As borg always does FULL backups, it needs to access all files' metadata anyway. That, together with borg's "files cache", enables borg to quickly detect unchanged files and back them up cheaply. The created archive always contains ALL files.

2. For many users this is not that important for the daily backups, because they are quick anyway. For the first backup, or for users with huge daily changes, it would be nice to have, though; that's why it is on the long-term TODO.

If you want to put more load on the CPU, you can partition your input data and feed each partition to a separate borg process (and separate borg repo): multiprocessing instead of multithreading.
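A minimal sketch of that partitioning approach, with hypothetical repo and path names:

```shell
#!/bin/sh
# Poor man's parallelism: one borg process (and one repository)
# per data partition, run concurrently.
borg create /backups/repo-home::'home-{now}' /home &
borg create /backups/repo-var::'var-{now}'   /var  &
wait   # block until both backups have finished
```

Note the cost: separate repos mean no deduplication across partitions.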


WRT 2., I had never paid attention to that, but it makes sense as borg is written in Python. That's probably not an issue for many personal use cases where backups are going to be I/O bound anyway, but I guess it's a significant limitation if you try to do near-realtime backups or have enterprise-grade storage.



I haven't tried any backup solution for my desktop other than copying to some external SSDs a few times (yeah, I know). How does borg compare to Kopia for a home user who has maybe a few TB? Or, in that regard, to Time Machine, which I've never used?


I'm a happy borg backup user, but this looks really promising. Thanks for sharing. Have the developers made any commitments to storage format stability?



If you want to (also) back up remotely (so the server doesn't have access to the unencrypted data), I would recommend some borg-specific hosts:

- Rsync.net: https://www.rsync.net/

- BorgBase: https://www.borgbase.com/

- Lima-Labs: https://storage.lima-labs.com/

- others?

(I'm not affiliated in any way, I'm just a client of one of them.)


Hetzner's "storage box" product also supports Borg backups:

https://www.hetzner.com/storage/storage-box


Thanks! I had no idea, this is going to simplify my backups a lot.


I use Time4vps storage boxes. They are the best deal I could find. I have a 1TB one that I've used for a couple years.


https://www.time4vps.com (no affiliate link)


Mhm, BorgBase looks a bit fishy.

It is based in Malta, but why? For "tax efficiency"? They share the office address with [1] and the CEO of the parent company PeakFord[2] seems to be a CEO-for-hire[3].

Their GDPR page[4] mentions that they are not based in the EU and want to use the German privacy regulator. Malta is very much part of the EU.

[1]: https://opes.com.mt/corporate-services/

[2]: https://www.peakford.com/about/

[3]: https://offshoreleaks.icij.org/nodes/56060344

[4]: https://www.borgbase.com/gdpr


"Mhm, BorgBase looks a bit fishy."

I don't think it is - it has been around for several years and appears to be completely legit.

CEO is based in Hong Kong (I think) and appears to be more "international" than perhaps you or I are. He is relatively active here and on reddit.

I hope to call him up and meet for a drink next time I am building things in HK ...


Wow! Thanks for the kind words and reference!! Much appreciated. :-)) Will PM for drinks and to link up. I still love how friendly everyone in IT and around backups specifically is.

More international for sure. But with Covid last year and offspring on the way this year, we are looking to spend more time in Europe closer to family. Hence the change of incorporation earlier this year.


You're awesome for defending one of your competitors like this.


BorgBase founder here. Everything you found is correct. I started the company from the US via YC Startup School in 2018, but later decided to place myself in Europe again. Hence moving the incorporation to the EU made sense. My director is handling the commercial aspects, like VAT, taxes and banking. Myself, I maintain our open source projects like Vorta, our Ansible role and macOS packaging for Borg.

Hope this helps to clear up your reservations. I should update our GDPR page with regard to the regulator; it's outdated by now.


Thanks for the reply. I guess a tech person not wanting to have anything to do with bureaucracy just looks similar to an attempt at tax evasion from the outside.


Do you have any advice for a technical 1-dev-army who wants to expand out to a small product-focused team and outsource (most of) the commercial and sales aspects to a partner?

Thanks for being a class act. Based just on your presence here, I might go Borg for a backup solution in the near future.


Such a big question... Here, from least to most involved, based on my experience:

If you only need incorporation, accounting and taxes solved, you can just pay ~hourly. Price will depend on time spent. Budget more in the beginning to find a good setup that will save time later. Accountants generally know very little about automation opportunities, so you need to find them yourself.

For sales positions, I'd always go for results-based compensation (like awarding equity as results are reached).

If the person does more than pure sales, you need a non-tech cofounder with some equity and the task to build commercial processes, do sales, decide features.

What you need will depend very much on your own skills and the product. Some stories: For my first startup, a physical consumer product, I had an equal partner with complementary skills who was very involved in product dev and sales. This worked well.

Later, in two other startups, I had technical cofounders and there was nobody to do sales and collect feedback. Worked less well obviously.

Hope this helps a bit and good luck with your idea!


BorgBase customer of several years here. I can assure you the service and the support are very good, and Manu does a good job. Would recommend.


Same here. I've been using them for 6 months and I'm very happy with the service; support is great as well :)


Multi year user here, nothing but good experiences with them.


I've been using borg now for a couple of years. It has been great.

I previously used duplicity. It would do a full backup once every week or month, resulting in GBs of data being sent over the line.

Borg chunks the data and uses deduplication. The result is just one huge backup at the beginning and incremental changes afterwards.

The abstractions used are also much friendlier. I never took the time to retrieve specific point-in-time backups from duplicity. With Borg one can mount the whole backup repository as a FUSE mount, with each backup being a distinct directory.

rsync.net offers cheap online storage for borg backups.

One caveat, borg is not yet multiprocess.


There is also Borgmatic, which makes it easy to manage backup settings, do monitoring, directly dump databases and prune old archives with a single YAML file. https://torsion.org/borgmatic/

We also maintain an Ansible role to set it up quickly on new servers. https://github.com/borgbase/ansible-role-borgbackup

And last, if you need a place to put your backups, try my https://borgbase.com offering. It's purpose-built for Borg and offers handy features, like separating each repository and monitoring for stale backups.


Thanks for the shout out, Manu! I'll also plug an all-in-one Borg + borgmatic Docker image (https://hub.docker.com/r/b3vis/borgmatic/) that I've used to back up to both BorgBase and rsync.net.


And the contender is restic

https://restic.net/


borg maintainer here:

Yeah, restic is a great backup tool also.

OTOH it is a pity that we could not join forces (Python vs. Go, and also some other differences), but it means more choice for everybody wanting a nice deduplicating backup tool.


It is awesome. I like it better.

It only deduplicates though. Does not compress.


I guess I'll ask here since I'm in the process of updating my backup system.

I have a variety of machines I'd like to back up (desktops/servers/laptops/phones) running a variety of OSes.

I have a ZFS NAS where I'd like to host a copy of the data from each machine.

From the NAS, I'd like to back up to a cloud host (e.g. rsync.net).

What I'm unsure about is where to introduce Borg in this scheme. I see a few permutations:

1) Borg from each machine to the NAS, then rsync to rsync.net

2) Rsync/ZFS send/etc. from each machine to the NAS, then Borg to a different location on the NAS, then rsync the Borg repo to rsync.net

3) Rsync/ZFS send/etc. from each machine to the NAS, then Borg directly to rsync.net

I'm leaning towards #3 personally. Thoughts?


Personally, I would choose option 3.

If you control the NAS, physically, then you can reduce some complexity by having unencrypted (and easily browsable) backups there ... and save the encryption for the rsync.net side of things ...

ALSO, if the NAS side of things is unencrypted, then you can establish a nice zfs snapshot schedule on the NAS and have those quickly and easily browsable as well. If you have borg backups on the NAS then even the simplest of restores becomes a full blown "restore" operation with decryption and keys, etc.
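A sketch of that NAS-side split, with hypothetical pool, repo, and account names:

```shell
# On the NAS: keep plaintext data plus cheap, instantly browsable
# ZFS snapshots locally, and push only the encrypted borg archive
# offsite. Pool, repo, and account names are placeholders.
zfs snapshot tank/backups@$(date +%Y-%m-%d)     # local restore point
borg create --stats \
    user@rsync.net:myrepo::'nas-{now:%Y-%m-%d}' \
    /tank/backups                               # encrypted offsite copy
```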


If you have ZFS clients and server, what benefit does inserting borg provide on top of a continuous ZFS send from client -> server and server -> rsync.net? ZFS send can already do encryption, and ZFS supports compression. And snapshots automatically "dedup" files that haven't changed between snapshot points, which I'd guess saves more space than deduping literal duplicate files, which are not very common IME.

I guess I just don't see what value borg provides if you're already using ZFS throughout.


I guess my concern is that I'd like the data to be encrypted at rsync.net, which means I'd need to use raw ZFS send. Would it be possible to mount/browse those raw datasets remotely to verify my backups? Or would I need to recv it all back to be able to test them?


Just wanted to point to the FAQ section inside the borg docs.

https://borgbackup.readthedocs.io/en/stable/faq.html#can-i-c...


I'd say this depends on where you want to introduce the encryption and deduplication.

I'd lean towards 1, so each machine has a standalone encrypted backup, but 3 would provide easier access without the borg client for local backups, and better de-duplication if files are shared across machines.


I'd prefer 1, as it allows each machine to easily and quickly restore itself directly from a local backup. This way the ZFS and rsync.net backup components are only used in case of catastrophic failure.


Very content with Borg for my backups, using it via the Vorta GUI client [1]. Zero issues for me unlike Duplicati and Duplicacy.

[1] https://github.com/borgbase/vorta


I'm curious what your issues were with Duplicacy. I looked at Borg, Duplicati and Duplicacy among others when I was evaluating a backup solution and eventually settled on Duplicacy and have been pretty happy with it. The only issue I've had is that it's a bit memory hungry during backups.


1) The web-app GUI is bad: you can't reorder saved options/backups/etc., each of these options uses up too much screen space, and there's no compact mode or way to minimise them. It could be more user-friendly, especially by letting you add arguments through the GUI rather than as text.

2) It's proprietary with source-available but not open-source. This makes me hesitate to rely on it in the long-term as I don't know if it'll remain supported, especially considering that:

3) Development speed was very slow when I dropped Duplicacy about 6 months ago, and seeing Borg have many contributors makes me think it's more likely to stick around in the long-term.

4) GUI doesn't give much insight into the status of the backup other than progress %.

5) Restore operations are easier and quicker with Borg/Vorta because you can mount the backup via FUSE.


I've been using borg recently; it's awesome.

I also like how it is possible to navigate into archives by just mounting them in the filesystem: https://borgbackup.readthedocs.io/en/stable/usage/mount.html
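For example (repository path, archive name, and mountpoint are placeholders):

```shell
# Mount the whole repository via FUSE; each archive appears
# as a directory named after the archive.
borg mount /backups/myrepo /mnt/borg
ls /mnt/borg                       # one subdirectory per archive
cp /mnt/borg/monday/etc/hosts .    # restore single files with plain cp
borg umount /mnt/borg
```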


Love borg with the desktop client Vorta. Gives me Time Machine-like time travel :D

Even wrote a blog post about it: https://simon-frey.com/blog/borgvorta-is-finally-a-usable-ba...


For the lazy like me, there's this: https://vorta.borgbase.com/


Since I moved my personal laptop from Mac to Linux I have been using borg. It works really well so far.

One thing that’s confusing me is the prune strategy. It seems prune removes the whole backup set at once. Let’s say I have daily backups and weekly backups. At some point I am pruning the daily backups. It seems this means that if the weekly backup ran on Sunday but a file was created on Monday and deleted on Friday, the file would be completely deleted during prune and there would be no trace of it.

I would much prefer if the pruning was done on per file basis and not by pruning whole backup sets.


> I would much prefer if the pruning was done on per file basis and not by pruning whole backup sets.

Then just don't prune! Unchanged files take no space. There's no reason you can't keep many months of daily backups, and set your prune to only do: --keep-within=365d

For those who aren't familiar with Borg, there is no such thing as a weekly backup, daily backup, monthly backup, etc. Every backup is a quick full backup. It's only the prune (which deletes old backups) option that gives you the choice of bracketing your backups into monthly/weekly/daily sets, so it'll save the last backup of each month for the last X months, the last backup of each week for X weeks, etc.
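A sketch of both pruning styles, with a placeholder repo path:

```shell
# Keep a year of everything: unchanged data is deduplicated,
# so keeping many archives is cheap.
borg prune --keep-within=365d /backups/myrepo

# Or bracket backups into daily/weekly/monthly sets instead:
borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 /backups/myrepo
```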


You are right. Not pruning may be the solution.


Maybe have a look at borgbackup docs to use the same terminology, otherwise it might get confusing.

When you create a backup ARCHIVE in a borg REPOSITORY, the archive contains all input files you gave to borg.

Each archive has a name and usually the name is something like machinename-setname-date-time.

You can create such backups rather often (that is cheap, due to deduplication), but you do not want to keep tons of archives long term (some borg commands take O(archive count) time).

Thus, one runs borg prune to "thin out" archives and only keep some following some prune policy, like keeping 60 daily, 12 monthly, 10 yearly or so.

It is important to use --prefix if one creates multiple different archive sequences in the same repo, e.g. if you create pc-home-date-time as well as pc-system-date-time; then you need to run prune with --prefix=pc-home- and again with --prefix=pc-system-.
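Sketched with hypothetical archive prefixes and repo path:

```shell
# One prune per archive sequence, so the policies don't interfere.
borg prune --prefix=pc-home-   --keep-daily=60 --keep-monthly=12 --keep-yearly=10 /backups/myrepo
borg prune --prefix=pc-system- --keep-daily=60 --keep-monthly=12 --keep-yearly=10 /backups/myrepo
```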

Usually, archives are not modified after their creation (the only way to do that is to carefully use borg recreate command).


And yet again: A list of alternatives.

attic (python) - https://github.com/jborg/attic

borg (c) - https://github.com/borgbackup/borg

bupstash (rust) - https://github.com/andrewchambers/bupstash

duplicacy (go) - https://github.com/gilbertchen/duplicacy

duplicati (c#) - https://github.com/duplicati/duplicati

duplicity (python) - https://github.com/henrysher/duplicity

kopia (go) - https://github.com/kopia/kopia

nfreezer (python) - https://github.com/josephernest/nfreezer

rdedup (rust) - https://github.com/dpc/rdedup

restic (go) - https://github.com/restic/restic

rclone (go) - https://github.com/rclone/rclone

rsnapshot (perl) - https://github.com/rsnapshot/rsnapshot

snebu (c) - https://github.com/derekp7/snebu

tarsnap (c) - https://github.com/Tarsnap/tarsnap

I think there are many more out there (https://github.com/restic/others). I personally use restic, while technology-wise (speed, only restore needs a password) I would prefer rdedup, which is an impressive piece of software but unfortunately without a file iterator... :-)


borg maintainer here:

attic should not be on the list; it has been unmaintained since 2015.

borg came into life as a fork of attic for related reasons, so it should be borg (Python+Cython+C).


would casync [1] fit in this list?

[1] https://github.com/systemd/casync/


What about Borg vs Restic? Which is better?


Restic doesn't support compression and probably won't. https://github.com/restic/restic/issues/21

Borg has a "lackluster" encryption scheme. https://borgbackup.readthedocs.io/en/stable/internals/securi...

Restic can't pull backups. https://github.com/restic/restic/issues/299

Borg can't either, but it has "borg import-tar", which might be good enough.
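A sketch of that pull-style workaround, assuming borg 1.2's import-tar accepts a tar stream on stdin (host, source paths, and repo are placeholders):

```shell
# Pull a backup: stream a tar from the remote machine
# straight into a local borg repository.
ssh root@remotehost 'tar cf - /etc /home' \
    | borg import-tar /backups/remote-repo::'remote-{now}' -
```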


> Restic can't pull backups

There are ways to pull backups. You'd use ssh forwarding and/or the REST backend for append-only backups. See this comment:

https://github.com/restic/restic/issues/299#issuecomment-456...


Could you explain why you find their encryption scheme 'lackluster'? I'm not seeing that conclusion in the linked document.


Borg uses static AES keys and requires a way to store nonces because of this. In a multi-client scenario there is no way to do that without trusting the server to some extent, so the encryption is vulnerable to nonce reuse.

> When the above attack model is extended to include multiple clients independently updating the same repository, then Borg fails to provide confidentiality (i.e. guarantees 3) and 4) do not apply any more).


Yeah, that's an issue which can be avoided by using 1 repo per client.

There are some ideas on the issue tracker for fixing this long term (like random nonces, session keys, ...), but that stuff will have to wait until after borg 1.2 (which soon goes into release candidate phase).


Restic runs natively on Windows too, which is useful in heterogeneous environments.

Filippo Valsorda thinks[1] the encryption implementation looks sane.

[1] https://blog.filippo.io/restic-cryptography/


A major difference is that Borg needs server-side code running as well, unlike restic.

Restic can take up a lot of RAM, unlike Borg.

Borg offers the option of no encryption. This is useful when backing up to a local drive that is already encrypted, for example with LUKS.

But, they are generally similar.
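For reference, Borg's encryption mode is fixed when the repository is created; a minimal sketch (paths are placeholders):

```shell
# Encryption is chosen once, at repository creation time:
borg init --encryption=repokey-blake2 /backups/encrypted-repo  # encrypted repo
borg init --encryption=none /mnt/luks-disk/plain-repo          # rely on LUKS instead
```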


Borg doesn’t require server side support. You can mount locally using, say, SSHFS.

However having server side support reduces backup times.


I think you are both not quite right ...

Which is to say, you don't need the borg executable server-side if you are happy to run it over SFTP transport which limits some functionality.

Yes, of course you could talk over an sshfs mount point but I can't think of why you would do that over the more basic SFTP transport.

The accepted, and full-featured, way to run borg is with the executable on the server side and I will point out that the borg project distributes a "frozen" version of the tool that allows you to run it without having python in your environment. I believe they use py2exe to pack it up.

This is important for us[1] as we have no interpreters of any kind in our environment (no shell, no python, no perl) so we can only run binary executables ...

[1] rsync.net


In the FAQ [1] SSHFS is mentioned a few times and here is the citation where I got this idea from when I read it way back when:

“When Borg is writing to a repo on a locally mounted remote file system, e.g. SSHFS, the Borg client only can do file system operations and has no agent running on the remote side, so every operation needs to go over the network, which is slower.”

SSHFS would be tool agnostic/transparent though of course would result in operations going over the network as it’s not a local repo, but pseudo-local.

I’m curious, how do you use plain SFTP transport with borg using rsync.net?

[1] https://borgbackup.readthedocs.io/en/stable/faq.html


hmmm ... it appears that I might be the one that is not quite right ... I was positive that there was a less-featured usage mode over plain old SFTP.

Either way, having the borg executable on the server side is the "correct" way to do things.


borg either uses a directory as the repo (no matter whether local or mounted via a network fs) or works client/server via pipes, usually over ssh (in that case, it needs a remote borg process to talk to). Instead of ssh, one could also use anything rsh-like, but of course ssh is the best option here.

no sftp inside borg (nor anything else).


Restic doesn't implement compression and most probably never will (since compression combined with encryption has been known to make the encryption "weaker").

Borg's "encryption" doesn't instill confidence in me.

Restic ships as a static binary, so you can stash away the version you used to create the backup and be able to restore it forever (assuming we still run x64 machines).

Borg requires a server-side component, so it cannot natively back up to cloud object stores. Restic can.

Borg OTOH is very lean and doesn't take a lot of RAM compared to Restic.

You cannot go wrong with either. I tried both and found Restic much easier to run and manage and the memory usage wasn't an issue for me (~4TB backups).


I don't think the reason Restic won't support compression is because of some security concerns (Restic encrypts blobs separately, and you'd compress things blob-wise as well), but because there simply wasn't any space designed into their file formats to signal compression, so it's essentially impossible to implement it without breaking compatibility.

Borg's story is actually similar but the opposite. Attic only used to support zlib, other methods were added later. This was possible because the zlib header uses only a few values for the first two bytes, so there is enough room to indicate various compression formats.


We found that restic (at least in its newest version) uses about the same amount of RAM as Borg, or even less on some systems. But no big differences there.


I use restic as well, was just interested to see what's different about Borg. But I think restic was the right choice, already saved me once from data loss.


They’re both a fine choice really. And being open source there’s no harm (other than time) in trying both and seeing which suits your use case best


I won't comment in terms of features, but in my experience Borg is slower than Restic. This might be worth considering for low-end servers and for large amounts of data. Restic used to have an extremely slow prune (purging unused old data), but this has improved dramatically in one of the latest releases.


If you need Windows or GUI support there is Kopia https://news.ycombinator.com/item?id=27471945


Is it comparable to exdupe?

http://www.quicklz.com/exdupe/


I don't know exdupe, but from a quick glance at their homepage it looks like a full + differential style backup.

With borg, every backup is logically a full backup (it just is faster due to deduplication). Much easier and less time consuming to deal with.



