http://www.expandrive.com operates with the same idea: up to a 10GB local cache, streaming out larger files. It supports a huge number of storage back ends (Dropbox, Google Drive/GCS, Amazon S3/Drive, OneDrive, SFTP, etc.) and can also pin files and trees into the cache.
Bought a license maybe 4-5 years ago because the website said that Linux support was coming very soon. It's been "coming" ever since. Tried to sign up for the beta but never even got an answer. :(
It means it was not actually "coming soon". They said that just to see how many people were interested in the feature. If enough people were interested, development of that feature would start; if not, I guess the text would just have been removed.
Makes sense?
Technically the lie really only extends to the "soon" part, since it is coming. And software regularly has a list of future features that often don't ship. Would love to chat if you were ever interested!
Because everyone and their dog would say they want it, or are interested in it, even if they don't actually want it. This puts up a (small) barrier that helps gauge true interest.
It's the same idea as setting up a landing page for a product that doesn't exist and seeing if people click "pricing" or "buy now" to determine if there is a market for it.
I am building Zero to store my personal pictures and videos, and I feel like a 10GB local cache means that only the last month of pictures and videos is local, which seems very limited.
Are there any technical reasons why the cache cannot be 500GB?
It used to be unlimited, but most users ultimately wanted to offload data and access it on demand. It should be configurable but currently isn't.
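For what it's worth, the size bound itself is the easy part: a size-bounded LRU cache over local files is all that's needed. A minimal sketch, with all names hypothetical and not how either product actually implements it:

```python
import os
from collections import OrderedDict

class BoundedFileCache:
    """Evict least-recently-used files once the cache exceeds a
    configurable byte limit. `fetch_from_cloud` stands in for
    whatever backend the real tool talks to."""

    def __init__(self, cache_dir, max_bytes=10 * 1024**3):  # default: 10GB
        os.makedirs(cache_dir, exist_ok=True)
        self.cache_dir = cache_dir
        self.max_bytes = max_bytes        # the knob users are asking for
        self.lru = OrderedDict()          # path -> size, oldest first
        self.total = 0

    def get(self, path, fetch_from_cloud):
        if path in self.lru:
            self.lru.move_to_end(path)    # mark as recently used
            return os.path.join(self.cache_dir, path.lstrip("/"))
        data = fetch_from_cloud(path)     # cache miss: download
        local = os.path.join(self.cache_dir, path.lstrip("/"))
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as f:
            f.write(data)
        self.lru[path] = len(data)
        self.total += len(data)
        self._evict()
        return local

    def _evict(self):
        while self.total > self.max_bytes and self.lru:
            victim, size = self.lru.popitem(last=False)  # oldest entry
            os.remove(os.path.join(self.cache_dir, victim.lstrip("/")))
            self.total -= size
```

Making `max_bytes` a config option is the whole feature; the eviction logic doesn't care whether the limit is 10GB or 500GB.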
“ExpanDrive runs unlicensed and fully featured for 7 days giving you a chance to try everything out. Once the trial period expires you’ll need to buy a license or you will be limited to 20 minutes of use.”
Interesting idea. But is the cloud version a complete image (perhaps out of sync)? If so, it's a performance disaster; if not, it's very fragile.
It seems to me what we really want is a cloud file system with a local cache (like Dropbox or iCloud, conceptually), so that if our local device is vaporized we still have a pretty much up-to-date logical store alive and well (and we can work on any number of machines). The word "swapping" seems to be based on the virtual-memory model, which means that if anything goes wrong you have two disconnected piles of crap.
At a file level you could theoretically have a giant file that is never wholly local, but how useful is this as a feature in real terms?
I think Borg and Tarsnap use the right approach here: a map of blocks, where updating a file updates only the changed block(s). It balances the efficiency of updates against the completeness of the copy. Sort of like a FAT filesystem, only with block-level deduplication built in.
Of course you don't get a nice mirror of your files in the cloud unless you run a separate server that reconstructs them and makes them available as traditional buckets.
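A minimal sketch of that map-of-blocks idea, assuming fixed-size blocks and content hashing (Borg and Tarsnap actually use content-defined chunking, which survives insertions better, but the dedup principle is the same):

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed-size blocks, for simplicity

def store_file(data, block_store):
    """Split a file into blocks, store only blocks not already present,
    and return the file's 'map of blocks' (a list of content hashes)."""
    block_map = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:      # dedup: identical blocks stored once
            block_store[digest] = block    # in reality: an upload to the backend
        block_map.append(digest)
    return block_map

def read_file(block_map, block_store):
    """Reassemble a file from its block map."""
    return b"".join(block_store[digest] for digest in block_map)
```

Changing one byte in a huge file re-uploads one block plus a new (tiny) block map, while the full logical copy remains reconstructible at any time.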
I use a Rubrik appliance, which does block-level dedupe and extends to the cloud. I was able to instantiate a multi-TB database from backup to a physical server in minutes. Extremely impressed.
I decided against a block-level system with Zero because I'm trying to make predictions about which files will be needed next locally, and that's hard to do at the block level, I think.
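I don't know what Zero's actual heuristic will end up looking like, but a hypothetical file-level scorer along these lines shows why the prediction is more natural per file than per block: a file has a directory and an access history to exploit, while a block in the middle of a file has neither.

```python
import os, time

def prefetch_candidates(accessed_path, recent_accesses, catalog, limit=10):
    """Guess which files will be opened after `accessed_path`.
    Entirely hypothetical scoring: siblings in the same directory rank
    high (photo albums get browsed in order), and recently touched
    files rank by recency."""
    directory = os.path.dirname(accessed_path)
    now = time.time()
    scores = {}
    for path in catalog:                    # all remote files we know about
        score = 0.0
        if os.path.dirname(path) == directory:
            score += 1.0                    # directory locality
        last = recent_accesses.get(path)    # dict: path -> last access time
        if last is not None:
            score += 1.0 / (1.0 + (now - last) / 3600.0)  # decay by hours
        if score > 0:
            scores[path] = score
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```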
Yes, the cloud version is a complete image (without the file names, though) that should be eventually consistent.
And yes, performance is a disaster right now, simply because the code is not optimized at all. But the sync to the cloud happens in the background, so it should not affect your performance unless you have a "cache miss".
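So writes land in the local cache and a background worker drains them to the backend; only reads that miss the cache ever touch the network. A rough sketch of that shape (hypothetical store objects, not Zero's actual code):

```python
import queue, threading

class WriteBackCache:
    """Writes return as soon as the local copy is written; a background
    worker uploads dirty files later, which is exactly why the cloud
    copy is only eventually consistent."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote  # hypothetical key/value stores
        self.dirty = queue.Queue()
        threading.Thread(target=self._uploader, daemon=True).start()

    def write(self, path, data):
        self.local.put(path, data)   # fast path: local disk only
        self.dirty.put(path)         # the upload happens in the background

    def read(self, path):
        data = self.local.get(path)
        if data is None:             # cache miss: the only slow path
            data = self.remote.get(path)
            self.local.put(path, data)
        return data

    def _uploader(self):
        while True:
            path = self.dirty.get()
            self.remote.put(path, self.local.get(path))
```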
What about often-locally-changed data that are part of a coherent set, the classic case being a file used by a database engine to store data? We nearly always need to mirror/back up a consistent version of it (just after a successful top-level transaction; in the SQL world, the outermost "COMMIT"), but AFAIK, for the time being, HSM+backup software cannot detect such a state. One could try trapping the existing system calls (fsync and co.) in order to copy data to the remote storage in a synced state, but this is not robust, because their semantics is not "upon return of this call the whole dataset (in all files) is consistent".
Moreover, if the application using the DB engine is not perfect, such inconsistency may reside at the application level: after a COMMIT the file is consistent for the DB engine, but not for the application.
I wonder if some users of such HSM+backup software have felt major disappointment after restoring an inconsistent version of such a file. Even a minor loss (a garbled index) may be hard to detect and can lead to a "fork" of the data.
A dedicated system function, called to signal "in my set of open files the data are consistent", would be useful but is AFAIK missing; and even if someone adds it to some libc/kernel, it will only be useful once application code actually calls it.
The kludge is a procedure: "order the engine to sync the data; throttle the engine into no-write mode; create a read-only snapshot; back up the snapshot; unthrottle the engine; delete the snapshot", which is not exactly "transparent".
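For the concrete case of MySQL on LVM, the kludge looks roughly like this (device names, sizes, and credentials are made up for the example; note you can unlock as soon as the snapshot exists, since the snapshot is frozen from that moment):

```python
import subprocess

def consistent_backup():
    """Sync + throttle via a global read lock, snapshot, back up the
    frozen view, clean up. Assumes MySQL data on the LVM volume vg0/data."""
    import mysql.connector                      # assumed available
    conn = mysql.connector.connect(user="backup")
    cur = conn.cursor()
    cur.execute("FLUSH TABLES WITH READ LOCK")  # sync data, block all writes
    try:
        subprocess.run(["lvcreate", "-s", "-n", "dbsnap",   # read-only view
                        "-L", "10G", "/dev/vg0/data"], check=True)
    finally:
        cur.execute("UNLOCK TABLES")            # unthrottle immediately;
        conn.close()                            # the snapshot stays frozen
    subprocess.run(["mount", "-o", "ro",
                    "/dev/vg0/dbsnap", "/mnt/dbsnap"], check=True)
    subprocess.run(["rsync", "-a", "/mnt/dbsnap/",
                    "backup-host:/backups/db/"], check=True)  # back it up
    subprocess.run(["umount", "/mnt/dbsnap"], check=True)
    subprocess.run(["lvremove", "-f", "/dev/vg0/dbsnap"], check=True)
```

Even spelled out it only guarantees DB-engine consistency, not the application-level consistency described above.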
I've been using SyncThing [1] recently, which does a similar thing but between your own devices (anything from Android to desktop to servers in the cloud). I've been using it on Linux and Windows, and it seems pretty good.
I don't think Syncthing is the same. As far as I recall, it stores the full sync folder to disk, similar to traditional sync services. This project only stores a cache on the disk.
You are correct. With SyncThing each device gets a full copy of whatever folders you've asked it to sync there. The FUSE aspect of Zero lets it do just-in-time file transfer and have no solid upper limit on storage, saving bandwidth and disk space at the cost of portability, latency, and redundancy.
Just wondering: would anybody want a block device backed by S3 or other object storage? A local cache, with snapshots that can be rolled back? Maybe giving you an exabyte of addressable storage?
This would be an actual block device, not a FUSE file system.
Yeah, like iSCSI backed by the cloud, but with a local cache for recently used data. And the ability to roll the entire device back to a state in the past (depending on how big you want your S3 bill to be).
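The mapping is straightforward if you chunk the block address space into fixed-size segments, one object each; segments are created lazily, which is what makes a huge sparse address space cheap. A sketch under those assumptions, where a dict-like `bucket` stands in for S3 and rollback would come from object versioning:

```python
class CloudBlockDevice:
    """Fixed-size segments of the block address space, one object each.
    Unwritten segments read as zeros, so the addressable space can be
    enormous while costing nothing. Rollback to time T would mean
    restoring every segment to its object version at T."""

    SEGMENT = 1024 * 1024  # 1 MiB per object

    def __init__(self, bucket):
        self.bucket = bucket   # remote object store: key -> bytes
        self.cache = {}        # local cache of hot segments

    def _segment(self, seg_no):
        seg = self.cache.get(seg_no)
        if seg is None:        # miss: one GET against object storage
            seg = self.bucket.get(f"seg{seg_no:012d}", bytes(self.SEGMENT))
            self.cache[seg_no] = seg
        return seg

    def read(self, offset, length):
        seg_no, start = divmod(offset, self.SEGMENT)
        return self._segment(seg_no)[start:start + length]  # within one segment

    def write(self, offset, data):
        seg_no, start = divmod(offset, self.SEGMENT)
        seg = bytearray(self._segment(seg_no))
        seg[start:start + len(data)] = data                  # within one segment
        self.cache[seg_no] = bytes(seg)
        self.bucket[f"seg{seg_no:012d}"] = bytes(seg)        # write-through PUT
```

A real implementation would batch and delay the PUTs, handle I/O that straddles segment boundaries, and expose the whole thing via NBD or iSCSI rather than Python calls.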
So, what all these filesystems do is keep a local cache (in RAM or on SSD for better performance). The successors of NFS in the Linux kernel have been doing that all along.
Not at all interesting, IMO. WebDAV and Nextcloud already work (just like rsync). What's interesting is applying some kind of encryption on top. For that, I use Cryptomator [1], which also works on mobile devices.
I built something similar years ago when I had a laptop with a really tiny hard drive. I would move a bunch of files to a cloud server and leave stub files with no data on the hard drive. Then I wrote a file system filter driver that would watch for those files being opened, catch the open request, and quickly download the contents.
It was better than walking around with an external drive. But it was super slow. After I upgraded the laptop, I had no further use for it.
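The same trick is easy to reproduce today with FUSE instead of a Windows filter driver. A minimal read-only sketch using the fusepy package, where `download(path)` is a hypothetical stand-in for the "quickly download" step:

```python
import errno, stat
from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

class DownloadOnOpen(Operations):
    """Show remote files as stubs; fetch the bytes on first open()."""

    def __init__(self, listing, download):
        self.listing = listing     # dict: path -> remote file size
        self.download = download   # hypothetical: path -> bytes
        self.data = {}             # locally materialized files

    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path not in self.listing:
            raise FuseOSError(errno.ENOENT)
        return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                    st_size=self.listing[path])

    def readdir(self, path, fh):
        return [".", ".."] + [p.lstrip("/") for p in self.listing]

    def open(self, path, flags):
        if path not in self.data:            # first open: fetch from the server
            self.data[path] = self.download(path)
        return 0

    def read(self, path, size, offset, fh):
        return self.data[path][offset:offset + size]

# Usage: FUSE(DownloadOnOpen(listing, download), "/mnt/stub", foreground=True)
```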
Nice work! I also made a similar file system, Zbox: https://github.com/zboxfs/zbox. The difference is that Zbox is an in-app file system focused on privacy, so FUSE is intentionally not supported. It already supports a key-value store, and I am currently trying to extend its capability to cloud storage.
Zbox looks pretty interesting, thanks for linking it!
Have you done any performance testing to compare Zbox vs xfs/ext4/zfs/whatever on the same system? I saw the benchmark in the README, but that doesn't necessarily show how much overhead or loss of performance there is compared to the native filesystem.
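A crude first-order comparison is easy to run against two mounted paths yourself; fio would be the serious tool, and this only measures sequential throughput (paths are assumed, and reads may be served from the page cache, so drop caches between runs for honest numbers):

```python
import os, time

def throughput(dirpath, size_mb=256):
    """Crude sequential write+read benchmark for one mounted filesystem.
    Returns (write MB/s, read MB/s)."""
    chunk = os.urandom(1024 * 1024)
    target = os.path.join(dirpath, "bench.tmp")
    t0 = time.perf_counter()
    with open(target, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())        # make sure the bytes hit the device
    write_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    with open(target, "rb") as f:
        while f.read(1024 * 1024):  # note: may hit the page cache
            pass
    read_s = time.perf_counter() - t0
    os.remove(target)
    return size_mb / write_s, size_mb / read_s

# e.g. compare throughput("/mnt/zbox") against throughput("/home/user/ext4dir")
```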
Awesome idea! Recently I've been looking for a solution to automatically back up ~GBs of scraped data that is updated daily. Is this solution trustworthy enough? I was burned by OneDrive silently deleting data on a previous attempt.
The README states "Do not use in production" so I wouldn't trust it yet.
For ~GBs of data per day is it necessary to use something that avoids having a full local copy? I'd have thought you could have a full local mirror backed up with Dropbox, Backblaze, rsync, rclone, etc.
Pricing and reliability are my two main problems. I have access to 1 TB of OneDrive, but after it silently deleted 40 GB of irretrievable data I will never trust it again. Pricing-wise, Dropbox, GDrive, and others seem too expensive. Backblaze is the current frontrunner for sure, particularly with its deduplication facilities.
Zero has a cache where it keeps the most recently accessed files. For example, your latest 100GB of raw video recordings.
Does s3ql do that? I skimmed over their documentation and could not see it but maybe I didn't look long enough?
Worst case, you could layer encfs or an equivalent on top. Be very careful to understand the exact threat model that covers (for starters, it leaves you painfully exposed to metadata issues), but it would work easily enough.
Very cool idea. I will be keeping an eye on this one. 1 TB of practical storage for 5 bucks a month means you could have a petabyte of storage for $5,000 a month!