After years of looking at it, I've been using borg for almost a year now, and I've already replaced rsnapshot with it pretty much everywhere. I'm very happy with the deduplication feature (that's my main motivation for switching), but the performance and ease of use are very appreciable too. Actually, performance might be as much of a game changer for me as deduplication (though a less expected one), as I find myself doing and automating backups way more often than before. I also enjoy the ability to do append-only backups, which mitigates the risk of losing already-saved data because of a bug or malware.
I've been expecting more and more from it, and at this point, there are only two things I wish would be better supported:
- deduplication across several machines (you can already back up several machines to the same repository, but it's not efficient);
- builtin redundancy to deal with bitrot.
I'm still using rsnapshot for secondary backups, as I can't afford a bug in backup software, but I'm considering switching to restic for that, as it provides deduplication as well and doesn't share code with borg.
Last time I checked (I think it was a few months ago), Borg had a couple of significant limitations:
1. no realtime incremental backups (that is, using filesystem observers to be able to perform incremental backups at short intervals)
2. no multithreading
I didn't find any companion tool which could handle #1, so for large systems I think backups are still very resource-intensive and can't be scheduled frequently.
1. as borg always does FULL backups, it needs to access all files' metadata anyway. that, together with borg's "files cache", lets it quickly detect unchanged files and just reference their already-stored chunks. the created archive always contains ALL files.
2. for many users this is not that important for the daily backups, because they are quick anyway. for the first backup, or for users with huge daily changes, it would be nice to have though, which is why it is on the long-term TODO.
if you want to put more load on the CPU, you can partition your input data and feed each partition to a separate borg process (and a separate borg repo): multiprocessing instead of multithreading.
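a rough sketch of that idea (repo paths and source directories are made up; each repo needs its one-time borg init first):

    # two partitions of the data, two borg processes, two repos
    borg create /backup/repo-home::home-{now} /home &
    borg create /backup/repo-srv::srv-{now}   /srv  &
    wait    # runs chunking/compression for both partitions in parallel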
WRT 2., I had never paid attention to that, but it makes sense as borg is written in Python. That's probably not an issue for many personal use cases where backups are going to be I/O bound anyway, but I guess it's a significant limitation if you try to do near-realtime backups or have enterprise-grade storage.
I haven't tried any backup solution for my desktop other than some external SSDs I copied to a few times (yeah, I know). How does borg compare to Kopia for a home user who has maybe a few TB? Or, in that regard, to Time Machine, which I've never used?
i'm a happy borg backup user, but this looks really promising. thanks for sharing. have the developers made any commitments to storage format stability?
It is based in Malta, but why? For "tax efficiency"? They share the office address with [1] and the CEO of the parent company PeakFord[2] seems to be a CEO-for-hire[3].
Their GDPR page[4] mentions that they are not based in the EU and want to use the German privacy regulator. Malta is very much part of the EU.
Wow! Thanks for the kind words and reference!! Much appreciated. :-)) Will PM for drinks and to link up. I still love how friendly everyone in IT and around backups specifically is.
More international for sure. But with Covid last year and offspring on the way this year, we are looking to spend more time in Europe closer to family. Hence the change of incorporation earlier this year.
BorgBase founder here. Everything you found is correct. I started the company from the US via YC startup school in 2018, but later decided to place myself in Europe again. Hence moving the incorporation to the EU made sense. My director is handling the commercial aspect, like VAT, taxes and banking. Myself, I maintain our open source projects like Vorta, our Ansible role and macOS packaging for Borg.
Hope this helps to clear your reservations. I should update our GDPR page with regard to the regulator. That's outdated by now.
Thanks for the reply. I guess a tech person not wanting to have anything to do with bureaucracy just looks similar to an attempt at tax evasion from the outside.
Do you have any advice for a technical 1-dev-army who wants to expand out to a small product-focused team and outsource (most of) the commercial and sales aspects to a partner?
Thanks for being a class act. Based just on your presence here, I might go with Borg for a backup solution in the near future.
Such a big question.. Here from least to most involved based on my experience:
If you only need incorporation, accounting and taxes solved, you can just pay ~hourly. Price will depend on time spent. Budget more in the beginning to find a good setup that will save time later. Accountants generally know very little about automation opportunities, so you need to find them yourself.
For sales positions, I'd always go for results-based compensation (like awarding equity as results are reached).
If the person does more than pure sales, you need a non-tech cofounder with some equity and the task to build commercial processes, do sales, decide features.
What you need will depend very much on your own skills and the product. Some stories: For my first startup, a physical consumer product, I had an equal partner with complementary skills who was very involved in product dev and sales. This worked well.
Later, in two other startups, I had technical cofounders and there was nobody to do sales and collect feedback. Worked less well obviously.
Hope this helps a bit and good luck with your idea!
I've been using borg now for a couple of years. It has been great.
I previously used duplicity. It would do a full backup once every week or month, resulting in GBs of data being sent over the line.
Borg chunks the data and uses deduplication. The result is just one huge backup at the beginning and incremental changes afterwards.
The abstractions used are also much friendlier. I never took the time to retrieve specific point-in-time backups from duplicity. With Borg one can mount the whole backup repository as a FUSE mount, with each backup being a distinct directory.
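For example, roughly (repo path, mount point and archive name are just placeholders):

    # expose every archive in the repo as a directory, browse it, then unmount
    borg mount /path/to/repo /mnt/borg
    ls /mnt/borg                                # one subdirectory per backup
    cp /mnt/borg/laptop-2021-07-01/home/user/notes.txt /tmp/restored-notes.txt
    borg umount /mnt/borg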
rsync.net offers cheap online storage for borg backups.
There is also Borgmatic, which makes it easy to manage backup settings, do monitoring, directly dump databases and prune old archives with a single YAML file. https://torsion.org/borgmatic/
And last, if you need a place to put your backups, try my https://borgbase.com offering. It's purpose-built for Borg and offers handy features, like separating each repository and monitoring for stale backups.
Thanks for the shout out, Manu! I'll also plug an all-in-one Borg + borgmatic Docker image (https://hub.docker.com/r/b3vis/borgmatic/) that I've used to back up to both BorgBase and rsync.net.
OTOH it is a pity that we could not join forces (Python vs. Go and also some other differences), but it also means more choice for everybody wanting a nice deduplicating backup tool.
I guess I'll ask here since I'm in the process of updating my backup system.
I have a variety of machines I'd like to back up (desktops/servers/laptops/phones) running a variety of OSes.
I have a ZFS NAS on which I'd like to host a copy of all the data from each machine.
From the NAS, I'd like to backup to a cloud host (e.g. rsync.net).
What I'm unsure about is where to introduce Borg in this scheme. I see a few permutations:
1) Borg from each machine to the NAS, then rsync to rsync.net
2) Rsync/ZFS send/etc. from each machine to the NAS, then Borg to a different location on the NAS, then rsync the Borg repo to rsync.net
3) Rsync/ZFS send/etc. from each machine to the NAS, then Borg directly to rsync.net
If you control the NAS, physically, then you can reduce some complexity by having unencrypted (and easily browsable) backups there ... and save the encryption for the rsync.net side of things ...
ALSO, if the NAS side of things is unencrypted, then you can establish a nice zfs snapshot schedule on the NAS and have those quickly and easily browsable as well. If you have borg backups on the NAS then even the simplest of restores becomes a full blown "restore" operation with decryption and keys, etc.
If you have ZFS clients and server, what benefit does inserting borg provide on top of a continuous ZFS send from client -> server and server -> rsync.net? ZFS send can already do encryption, and ZFS supports compression. And snapshots automatically "dedup" files that haven't changed between snapshot points, which I'd guess saves more space than deduplicating literal duplicate files, which are not very common IME.
I guess I just don't see what value borg provides if you're already using ZFS throughout.
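For reference, sending an encrypted dataset raw looks roughly like this (dataset and host names are made up), and the receiving side never needs the keys:

    # snapshot, then ship the dataset as-is (--raw keeps it encrypted in transit and at rest)
    zfs snapshot tank/data@2021-07-01
    zfs send --raw tank/data@2021-07-01 | ssh backup@nas zfs receive backup/tank-data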
I guess my concern is that I'd like the data to be encrypted at rsync.net, which means I'd need to use raw ZFS send. Would it be possible to mount/browse those raw datasets remotely to verify my backups? Or would I need to recv it all back to be able to test them?
I'd say this depends on where you want to introduce the encryption and de-duplication.
I'd lean towards 1, so each machine has a standalone encrypted backup, but 3 would provide easier access without the borg client for local backups, and better de-duplication if files are shared across machines.
I'd prefer 1, as it allows each machine to easily and quickly restore itself directly from a local backup. This way the ZFS and rsync.net backup components are only used in case of catastrophic failure.
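Roughly, option 1 could look like this (host names, users and paths are only examples):

    # one-time per client: create its repo on the NAS (encryption is chosen here)
    borg init --encryption=repokey ssh://borg@nas/srv/borg/laptop1
    # regular backups from the client to the NAS
    borg create ssh://borg@nas/srv/borg/laptop1::laptop1-{now} /home /etc
    # on the NAS: replicate the repos to rsync.net, ideally while no backup is writing
    rsync -a /srv/borg/ user@rsync.net:borg-repos/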
I'm curious what your issues were with Duplicacy. I looked at Borg, Duplicati and Duplicacy among others when I was evaluating a backup solution and eventually settled on Duplicacy and have been pretty happy with it. The only issue I've had is that it's a bit memory hungry during backups.
1) The web app GUI is bad: you can't reorder saved options/backups/etc., each of these options uses up too much screen space, and there's no compact mode or way to minimise them. It could be more user-friendly, especially by letting you add arguments through the GUI rather than as text.
2) It's proprietary with source-available but not open-source. This makes me hesitate to rely on it in the long-term as I don't know if it'll remain supported, especially considering that:
3) Development speed was very slow when I dropped Duplicacy about 6 months ago, and seeing Borg have many contributors makes me think it's more likely to stick around in the long-term.
4) GUI doesn't give much insight into the status of the backup other than progress %.
5) Restore operations are easier and quicker with Borg/Vorta because you can mount the backup via FUSE.
Since I moved my personal laptop from Mac to Linux I have been using borg. It works really well so far.
One thing that’s confusing me is the prune strategy. It seems prune removes the whole backup set at once. Let’s say I have daily backups and weekly backups. At some point I am pruning the daily backups. It seems this means that if the weekly backup ran on Sunday but a file was created on Monday and deleted on Friday, the file would be completely deleted during prune and there would be no trace of it.
I would much prefer if the pruning was done on per file basis and not by pruning whole backup sets.
> I would much prefer if the pruning was done on per file basis and not by pruning whole backup sets.
Then just don't prune! Unchanged files take no space. There's no reason you can't keep many months of daily backups, and set your prune to only do: --keep-within=365d
For those who aren't familiar with Borg, there is no such thing as a weekly backup, daily backup, monthly backup, etc. Every backup is a quick full backup. It's only the prune (which deletes old backups) option that gives you the choice of bracketing your backups into monthly/weekly/daily sets, so it'll save the last backup of each month for the last X months, the last backup of each week for X weeks, etc.
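For example (repo path and numbers are only illustrative):

    # keep all archives made in the last year; older ones become candidates for deletion
    borg prune --keep-within=365d /path/to/repo
    # or bracket into sets: last backup of each of the last 7 days, 4 weeks, 12 months
    borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=12 /path/to/repo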
Maybe have a look at borgbackup docs to use the same terminology, otherwise it might get confusing.
When you create a backup ARCHIVE in a borg REPOSITORY, the archive contains all input files you gave to borg.
Each archive has a name and usually the name is something like machinename-setname-date-time.
You can create such backups rather often (that is cheap, due to deduplication), but you do not want to keep tons of archives long term (some borg commands take O(archive count) time).
Thus, one runs borg prune to "thin out" archives and only keep some following some prune policy, like keeping 60 daily, 12 monthly, 10 yearly or so.
It is important to use --prefix if one creates multiple different archive sequences in the same repo, e.g. if you create pc-home-date-time as well as pc-system-date-time - then you need to run prune with --prefix=pc-home- and another prune with --prefix=pc-system- .
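For example (repo path and retention numbers are only illustrative):

    # two archive series in the same repo
    borg create /path/to/repo::pc-home-{now}   /home
    borg create /path/to/repo::pc-system-{now} /etc /usr
    # prune each series in its own run, so one series cannot thin out the other
    borg prune --prefix=pc-home-   --keep-daily=60 --keep-monthly=12 --keep-yearly=10 /path/to/repo
    borg prune --prefix=pc-system- --keep-daily=60 --keep-monthly=12 --keep-yearly=10 /path/to/repo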
Usually, archives are not modified after their creation (the only way to do that is to carefully use borg recreate command).
Borg uses static AES keys and requires a way to store nonces because of this. In a multi-client scenario there is no way to do that without trusting the server to some extent, so the encryption is vulnerable to nonce reuse.
> When the above attack model is extended to include multiple clients independently updating the same repository, then Borg fails to provide confidentiality (i.e. guarantees 3) and 4) do not apply any more).
Yeah, that's an issue which can be avoided by using 1 repo per client.
There are some ideas on the issue tracker for fixing this long term (like random nonces, session keys, ...), but that stuff will have to wait until after borg 1.2 (which soon goes into release candidate phase).
Which is to say, you don't need the borg executable server-side if you are happy to run it over SFTP transport, which limits some functionality.
Yes, of course you could talk over an sshfs mount point but I can't think of why you would do that over the more basic SFTP transport.
The accepted, and full-featured, way to run borg is with the executable on the server side, and I will point out that the borg project distributes a "frozen" version of the tool that allows you to run it without having python in your environment. I believe they use PyInstaller to pack it up.
This is important for us[1] as we have no interpreters of any kind in our environment (no shell, no python, no perl) so we can only run binary executables ...
In the FAQ [1] SSHFS is mentioned a few times and here is the citation where I got this idea from when I read it way back when:
“When Borg is writing to a repo on a locally mounted remote file system, e.g. SSHFS, the Borg client only can do file system operations and has no agent running on the remote side, so every operation needs to go over the network, which is slower.”
SSHFS would be tool-agnostic/transparent, though of course operations would still go over the network, as it's not a local repo, just a pseudo-local one.
I’m curious, how do you use plain SFTP transport with borg using rsync.net?
borg either uses a directory as repo (no matter whether local or mounted via a network fs) or works client/server via pipes, usually over ssh (in that case, it needs a remote borg process to talk to). instead of ssh, one could also use anything rsh-like, but of course ssh is the best option here.
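roughly (paths and host are just examples; only the last form needs borg installed on the remote side):

    borg init --encryption=repokey /mnt/usbdisk/repo        # plain directory repo
    borg init --encryption=repokey /mnt/sshfs/repo          # directory on a mounted network fs
    borg init --encryption=repokey ssh://user@host/./repo   # client/server via ssh pipe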
Restic doesn't implement compression and most probably never will (since compression when combined with encryption has been known to make the encryption "weaker").
Borg's "encryption" doesn't instill confidence in me.
Restic ships as a static binary, so you can stash away the version you used to create the backup and will be able to restore it forever (assuming we still run x64 machines).
Borg requires a server-side component, so it cannot natively back up to cloud object stores. Restic can.
Borg OTOH is very lean and doesn't take a lot of RAM compared to Restic.
You cannot go wrong with either. I tried both and found Restic much easier to run and manage and the memory usage wasn't an issue for me (~4TB backups).
I don't think the reason Restic won't support compression is because of some security concerns (Restic encrypts blobs separately, and you'd compress things blob-wise as well), but because there simply wasn't any space designed into their file formats to signal compression, so it's essentially impossible to implement it without breaking compatibility.
Borg's story is actually similar but the opposite. Attic only used to support zlib, other methods were added later. This was possible because the zlib header uses only a few values for the first two bytes, so there is enough room to indicate various compression formats.
We found that restic (at least in its newest version) uses ~ the same amount of RAM as Borg, or even less on some systems. But no big differences there.
I use restic as well, was just interested to see what's different about Borg. But I think restic was the right choice, already saved me once from data loss.
I won't comment in terms of features, but in my experience Borg is slower than Restic. This might be worth considering for low-end servers and for big amounts of data. Restic used to have an extremely slow prune (purge unused old data) feature, but this has improved dramatically in one of the latest releases.