Show HN: Zero – Local file system transparently swapping to the cloud (github.com/konstantinschubert)
252 points by konschubert on Sept 9, 2018 | 96 comments



Very cool project! Close to my heart. I write ExpanDrive

http://www.expandrive.com

which operates on the same idea: it keeps up to a 10GB local cache and streams out larger files. It supports a huge number of storage backends (Dropbox, Google Drive/GCS, Amazon S3/Drive, OneDrive, SFTP, etc.) and can also pin files and trees into the cache.


Bought a license maybe 4-5 years ago because the website said that Linux support was coming very soon. It's been coming ever since. I tried to sign up for the beta but didn't even get an answer. :(


Rclone can mount 20+ backends and has caching support, so I guess you can do something similar with it. Plus it's FOSS.


Originally said "coming soon" to see if anyone was interested. Turns out they are!


Can you explain a little more what you mean by this?


It means that it was not actually "coming soon". The site said that just to see whether many people were interested in the feature. If enough people were interested, development of that feature would start; if not, I guess the text would just have been removed. Makes sense?


Pretty convoluted to me...


No, it's simple, it's just a lie.

(Sorry, don't mean to attack anyone or put too much blame, but let's call it what it is.)


Technically the lie really only extends to the "soon" part, since it is coming. And software regularly has a list of future features that often don't ship. Would love to chat if you were ever interested!


Hi, I am always up for a chat, my email is mail@konstantinschubert.com. :)

To get back to our argument: at the time you said "coming soon", you didn't know whether it was coming at all, so I would call that a bit of a lie.


But at least they've been working on it for a good minute:

https://twitter.com/expandrive/status/1000012200612950017


We set up a beta list to gauge interest. No sense in doing all the work if nobody cares.


Yeah, but why not just ask whether people were interested, instead of saying something that wasn't true?


Because everyone and their dog would say they want it, or are interested in it, even if they don't actually want it. This puts up a (small) barrier that helps gauge true interest.

It's the same idea as setting up a landing page for a product that doesn't exist and seeing if people click "pricing" or "buy now" to determine if there is a market for it.


Anywhere I can sign up to be notified when the Linux version is out?


linux@expandrive.com


Why is the local cache limited to 10 GB?

I am building Zero to store my personal pictures and videos and I feel like 10GB local cache means that only the last month of pictures and videos is local, which seems very limited.

Are there any technical reasons why the cache cannot be 500GB?


It used to be unlimited, but most users ultimately wanted to offload data and access it on demand. It should be configurable but currently isn't.


Very pedantic but I think "Built with in Boston." is intended to be "Built in Boston." in the footer?


There is a red heart after "with" - or at least there is on every machine I've tried.


Script blocker bit me there, you are right :). Looks like a very interesting project!


Based on "I write ExpanDrive", I thought the author was Chinese.


A bit rude. I like that phrasing.


I use expandrive everyday. Thanks man.


Is that similar to the Drive FS client then (except that it's obviously multi-cloud and multi-backend)?

https://support.google.com/a/answer/7491144?hl=en


Any progress on the Linux version?



Does everything get stored in the cloud - even files that are in the 10GB local cache?


Yes, unless you're offline - in which case it stays in the cache until a connection is available.


Happy customer of Expandrive for quite a while. Got a lifetime license.


I should also switch to the lifetime license sometime, Expandrive is indeed really an awesome product!


Thanks!


Sounds good. At what stage would I need to buy the license? I could not find any info on that.


“ExpanDrive runs unlicensed and fully featured for 7 days giving you a chance to try everything out. Once the trial period expires you’ll need to buy a license or you will be limited to 20 minutes of use.”


Author here. Very glad to see that nobody has noticed how slow it is, and nobody has made a comment about how messy the code is in some places :D

Working on both issues.

But first I will add some instructions on how to run it.


I think you should try funding your project by encouraging users to follow an affiliate link to buy storage from Backblaze.


Interesting idea. But is the cloud version a complete image (perhaps out of sync)? If so, it's a performance disaster; if not, it's very fragile.

It seems to me what we really want is a cloud file system with a local cache (like Dropbox or iCloud, conceptually), so that if our local device is vaporized we have a pretty much up-to-date logical store alive and well (and we can work on any number of machines). The word "swapping" seems to be based on the virtual memory model, which means that if anything goes wrong you have two disconnected piles of crap.

At a file level you could theoretically have a giant file that is never wholly local, but how useful is this as a feature in real terms?


I think Borg or Tarsnap use the right approach here: a map of blocks, where updating a file updates only the changed block(s). It balances the efficiency of updates against the completeness of the copy. Sort of like the FAT filesystem, only with block-level deduplication built in.
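A toy sketch of that block-map idea in Python (the chunk size and hash are arbitrary choices for illustration, not Borg's or Tarsnap's actual format):

    import hashlib

    BLOCK_SIZE = 4096
    store = {}   # content-addressed block store: hash -> bytes

    def put_file(data):
        # Split a file into blocks and return its "map" (a list of
        # block hashes). Unchanged blocks hash to keys that already
        # exist, so they are stored only once.
        blockmap = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            store.setdefault(h, block)   # dedup: skip known blocks
            blockmap.append(h)
        return blockmap

    def get_file(blockmap):
        return b"".join(store[h] for h in blockmap)

Editing one block of a big file adds only one new entry to the store, and files that share most of their content automatically share most of their blocks.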

Of course you don't get a nice mirror of your files right in the cloud, unless you run a separate server that reconstructs it and makes it available as traditional buckets.


restic and duplicacy are the newer implementations of block-level deduplicated, encrypted backup.

From what I tested, restic has friendlier command-line options, but duplicacy is technically superior at this point (restore works way faster).


Restic's restore isn't parallelized at all, whereas its backup is. It should be straightforward to improve the restore performance.

https://github.com/restic/restic/pull/1719


I use a Rubrik appliance, which does block-level dedupe and extends to the cloud. I was able to instantiate a multi-TB DB from the backup to a physical server in minutes. Extremely impressed.


I decided against a block-level system with Zero because I'm trying to make predictions about which files will be needed next locally, and that's hard on a block level, I think.


I am wondering if there is a backup solution that works that way but without requiring a manual, time-consuming invocation.

Something like inotify to record changed files, plus a worker in the background that syncs immediately. Like Dropbox.
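That kind of watcher is easy to prototype with the Python watchdog library, which uses inotify on Linux (the upload function here is a hypothetical stand-in for the actual sync):

    import time
    from watchdog.observers import Observer            # pip install watchdog
    from watchdog.events import FileSystemEventHandler

    def upload(path):
        # Hypothetical stand-in: push one changed file to the backend.
        print("syncing", path)

    class SyncHandler(FileSystemEventHandler):
        # Called from the observer thread whenever a watched file changes.
        def on_modified(self, event):
            if not event.is_directory:
                upload(event.src_path)

    observer = Observer()
    observer.schedule(SyncHandler(), "/home/me/data", recursive=True)
    observer.start()                # background worker, Dropbox-style
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()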



Yes, the cloud version is a complete image (without the file names though) that should be eventually consistent.

And yes, performance is a disaster right now simply because the code is not optimized at all. But the sync to the cloud happens in the background so it should not affect your performance unless you have a "cache miss".


Isn't it a fusion of HSM https://en.wikipedia.org/wiki/Hierarchical_storage_management and continuous backup?

What about often-locally-changed data that is part of a coherent set, the classic case being a file used by a database engine to store data? We nearly always need to mirror/backup a consistent version of it (just after a successful outermost transaction, in the SQL world the top-level "COMMIT"), but AFAIK, for the time being, HSM+backup software cannot detect such a state. One could trap existing system calls (fsync and co.) in order to copy data to the remote storage in a synced state, but this is not robust, because their semantics is not "upon return of this call, the whole dataset (in all files) is consistent".

Moreover, if the application using the DB engine is not perfect, such inconsistency may reside at the application level: after a COMMIT the file is consistent for the DB engine, but not for the application.

I wonder if some users of such HSM+backup software have felt some major disappointment after restoring an inconsistent version of such a file. Even a minor loss (a garbled index) may be hard to detect and can lead to a "fork" of the data.

A dedicated system function called to signal "in my set of opened files, the data are consistent" would be useful but is AFAIK missing, and even if someone adds it to some libc/kernel, it will only help once application code actually calls it.

The kludge is a procedure: "order the engine to sync its data; throttle the engine into a no-write mode; create a read-only snapshot; back up the snapshot; unthrottle the engine; delete the snapshot", which is not exactly "transparent".
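A minimal sketch of that kludge in Python, assuming a MySQL engine and an LVM-backed data volume (the volume names, sizes and paths are hypothetical):

    import subprocess

    # Keep one client session open: FLUSH TABLES WITH READ LOCK only
    # holds for as long as the session that took it.
    mysql = subprocess.Popen(["mysql", "-u", "root", "--batch"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True)

    # 1. Sync the data and block all writes.
    mysql.stdin.write("FLUSH TABLES WITH READ LOCK; SELECT 'locked';\n")
    mysql.stdin.flush()
    mysql.stdout.readline()            # wait until the lock is held

    # 2. Create a snapshot while writes are frozen.
    subprocess.run(["lvcreate", "-s", "-L", "5G", "-n", "dbdata_snap",
                    "/dev/vg0/dbdata"], check=True)

    # 3. Unthrottle the engine.
    mysql.communicate("UNLOCK TABLES;\n")

    # 4. Back up the frozen snapshot, then delete it.
    subprocess.run(["mount", "-o", "ro", "/dev/vg0/dbdata_snap",
                    "/mnt/snap"], check=True)
    subprocess.run(["rsync", "-a", "/mnt/snap/", "/backups/db/"],
                   check=True)
    subprocess.run(["umount", "/mnt/snap"], check=True)
    subprocess.run(["lvremove", "-f", "/dev/vg0/dbdata_snap"], check=True)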


In such a case, you’re better off with a database engine that streams its journal or transaction log to an object store.

Don’t perform data operations at the wrong layer.


Indeed, and this is my point: such tools cannot be generic ("works with any file") and also transparent ("plug & play").


Yes, but those are the preconditions to user adoption.


Author here. Thanks for the Wikipedia link. I think that the software is trying to implement HSM but I didn't know that this is what it's called.

With Zero, all local data is eventually synced to the cloud, but usually this only happens after the local file has been idle for a while.
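A sketch of that idle-then-upload policy (the threshold and the upload function are hypothetical stand-ins, not Zero's actual code):

    import os, time

    IDLE_SECONDS = 600                 # hypothetical idle threshold

    def upload(path):
        # Stand-in for the real cloud upload.
        print("uploading", path)

    def sync_idle_files(cache_dir):
        # Upload every cached file that hasn't been written for a while.
        now = time.time()
        for root, _dirs, files in os.walk(cache_dir):
            for name in files:
                path = os.path.join(root, name)
                if now - os.path.getmtime(path) > IDLE_SECONDS:
                    upload(path)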


I've been using SyncThing [1] recently, which does a similar thing but between your own devices (anything from Android to desktop to servers in the cloud). I've been using it on Linux and Windows, and it seems pretty good.

[1] https://syncthing.net


I don't think Syncthing is the same. As far as I recall, it stores the full sync folder to disk, similar to traditional sync services. This project only stores a cache on the disk.


You are correct. With SyncThing each device gets a full copy of whatever folders you've asked it to sync there. The FUSE aspect of Zero lets it do just-in-time file transfer and have no solid upper limit on storage, saving bandwidth and disk space at the cost of portability, latency, and redundancy.
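For a feel of the just-in-time part, here is a minimal read-through FUSE sketch with fusepy (the cache path and remote_fetch are hypothetical stand-ins, not Zero's actual code):

    import os
    from fuse import FUSE, Operations   # pip install fusepy

    CACHE = "/var/cache/zerofs"         # hypothetical local cache dir

    def remote_fetch(path):
        # Hypothetical stand-in: download file contents from the cloud.
        raise NotImplementedError

    class ReadThroughFS(Operations):
        # Reads are served from a local cache; a cache miss triggers a
        # one-time fetch from the backend. Writes, directory listing,
        # eviction and proper metadata handling are omitted for brevity.

        def _local(self, path):
            return os.path.join(CACHE, path.lstrip("/"))

        def getattr(self, path, fh=None):
            # Real code would answer this from a metadata store
            # instead of assuming the file is already cached.
            st = os.lstat(self._local(path))
            return {k: getattr(st, k) for k in
                    ("st_mode", "st_nlink", "st_size", "st_mtime")}

        def read(self, path, size, offset, fh):
            local = self._local(path)
            if not os.path.exists(local):    # cache miss: fetch once
                with open(local, "wb") as f:
                    f.write(remote_fetch(path))
            with open(local, "rb") as f:
                f.seek(offset)
                return f.read(size)

    if __name__ == "__main__":
        FUSE(ReadThroughFS(), "/mnt/zero", foreground=True)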


I built syncthingfuse to do partial syncs. Currently unmaintained, though.

https://github.com/burkemw3/syncthingfuse


Gcloud also has a FUSE adapter for cloud storage buckets

https://cloud.google.com/storage/docs/gcs-fuse


Just wondering: would anybody want a block device backed by S3 or other object storage? A local cache, with snapshots that can be rolled back? Maybe giving you an exabyte of addressable storage?

This would be an actual block device, not a FUSE file system.


I was actually thinking of doing this as a block device instead of a file system and I agree that it may be the more "natural" solution.

However, the software tries to predict which files will be accessed next, and doing this at the block level would be much harder.

Plus, I had to learn FUSE to write this, and I thought I'd start off easy :D
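As an illustration of a file-level heuristic that would be hard to express in blocks (purely hypothetical, not Zero's actual predictor): when a file is read, prefetch the siblings that sort right after it, since pictures in an album tend to be viewed in sequence.

    import os

    def prefetch_candidates(accessed_path, n=5):
        # Files likely to be read next: the directory siblings that
        # sort directly after the file just accessed
        # (e.g. IMG_0042.jpg -> IMG_0043.jpg, IMG_0044.jpg, ...).
        folder = os.path.dirname(accessed_path)
        siblings = sorted(os.listdir(folder))
        i = siblings.index(os.path.basename(accessed_path))
        return [os.path.join(folder, s)
                for s in siblings[i + 1:i + 1 + n]]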


You mean like iSCSI in the cloud?


Yeah, like iSCSI backed by the cloud, but with a local cache for recently used data. And the ability to roll the entire device back to a state in the past (depending on how big you want your S3 bill to be).
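A rough sketch of the storage side of such a device, using boto3 (the bucket name and key scheme are hypothetical): each fixed-size block is written under an immutable per-generation key, so rolling back just means pointing the block map at older generations.

    import boto3

    BLOCK_SIZE = 4096
    BUCKET = "my-block-device"         # hypothetical bucket
    s3 = boto3.client("s3")

    def write_block(index, generation, data):
        # Immutable key per (block, generation): old generations stay
        # around, which is what makes point-in-time rollback possible
        # (and what drives the S3 bill).
        s3.put_object(Bucket=BUCKET,
                      Key=f"blocks/{index}/{generation}",
                      Body=data)

    def read_block(index, generation):
        obj = s3.get_object(Bucket=BUCKET,
                            Key=f"blocks/{index}/{generation}")
        return obj["Body"].read()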


So, what all these filesystems do is keep a local cache (in RAM or on SSD for better performance). The successors of NFS in the Linux kernel have been doing that all along.

Not at all interesting, IMO. WebDAV and Nextcloud already work (just like rsync). What's interesting is applying some kind of encryption on top of it. For that, I use Cryptomator [1], which also works on mobile devices.

[1] https://cryptomator.org


I built something similar years ago when I had a laptop with a really tiny hard drive. I would move a bunch of files to a cloud server and leave a bunch of empty placeholder files on the hard drive. Then I wrote a file system filter driver which would watch for those files being opened, catch the open request, and quickly download the contents.

It was better than walking around with an external drive. But it was super slow. After I upgraded the laptop, I had no further use for it.


This is one of the reasons I use pCloud. It adds a huge disk device that is actually remote, and only the recent files are cached locally.

The other reason is Linux support.


Lifetime prices for online services? Sounds really sketchy.


Lifetime of the service perhaps.


Nice work! I also made a similar file system, Zbox: https://github.com/zboxfs/zbox. The difference is that Zbox is an in-app file system focused on privacy, so FUSE is intentionally not supported. It already supports a key-value store, and I am currently trying to extend it to cloud storage.


Zbox looks pretty interesting, thanks for linking it!

Have you done any performance testing to compare Zbox vs xfs/ext4/zfs/whatever on the same system? I saw the benchmark in the readme, but that doesn't necessarily show how much overhead or performance loss there is compared to the native filesystem.


No, I didn't. And I don't think it is necessary, as Zbox is much more like an "application-level" fs, which obviously can't match a system-level fs.


Awesome idea! Recently I've been looking for a solution to automatically back up ~GBs of scraped data that is updated daily. Is this solution trustworthy enough? I was burned by OneDrive silently deleting data on a previous attempt.


The README states "Do not use in production" so I wouldn't trust it yet.

For ~GBs of data per day, is it necessary to use something that avoids having a full local copy? I'd have thought you could keep a full local mirror backed up with Dropbox, Backblaze, rsync, rclone, etc.


Pricing and reliability are my two main problems. I have access to 1TB of OneDrive, but after it silently deleted 40GB of irretrievable data I will never trust it again. Pricing-wise, Dropbox, GDrive and others seem too expensive. Backblaze is the current frontrunner for sure, particularly with its deduplication facilities.


Please don't use it in prod, this is work in progress.


I would recommend Borg (which has deduplication) + rclone.


Reminds me of https://github.com/Azure/azure-storage-fuse

The code is good, maybe you'll find inspiration there.


Why would one prefer this over s3ql (https://bitbucket.org/nikratio/s3ql/)?


Zero has a cache where it keeps the most recently accessed files. For example, your latest 100GB of raw video recordings. Does s3ql do that? I skimmed over their documentation and could not see it but maybe I didn't look long enough?


"S3QL splits file contents into smaller blocks and caches blocks locally." -- http://www.rath.org/s3ql-docs/about.html#features

It allows you to configure an arbitrary cache size, I've been using it with 60GB local cache.


But are writes and reads really 100% local, or do they require synchronous networking?


I can't see how this is used. Can we get a "here's how to set up the file system" guide somewhere? I see the bit about the config file. Then what?

I'd primarily like to use this to back up a couple of Proxmox hosts.


Yes, I'll add a guide and more description but please don't use it yet, it's work in progress.


I'm not seeing it, but does it support encrypting files/folders on cloud storage?


Worst case, you could layer it with encfs or equivalent. Be very careful to understand the exact threat model that covers (for starters, it leaves you painfully exposed to metadata issues), but it would work easily enough.


My plan is to use it with fusecrypt for now and eventually include encryption directly.


Keybase does something similar with their file system kbfs (which is encrypted); they use FUSE too.

Local caching is limited to 10% of your disk space (if I remember correctly).

Cool project though. Will definitely keep it on my radar.


Does Keybase sell cloud storage packages?

Btw, I really love the idea of Keybase, I hope they take off.


No, they have 250GB per user. IMO that is good for backups for now, until they maybe start selling more storage?

Same here, they're doing some really good work.


Been using rclone mount for something similar; curious how this compares.


I think with rclone you need as much local space as you take up in the cloud?


Rclone has a cache backend in the latest version which works exactly like OP's project.


Cool! Do you have a link?


Very cool idea. I will be keeping an eye on this one. 1 TB of practical storage for 5 bucks a month; at that rate you could have a petabyte of storage for $5,000 a month!


rclone.org is a great tool for syncing to/from cloud storage.


Are there any plans to support S3?


Hi, the backends are in principle pluggable, so I'm very happy to incorporate a PR for an S3 backend. I may also do it myself eventually.

I just find S3 very expensive for long-term storage of personal data.


They should support providers such as Wasabi, which has unlimited egress, so you can feel safe with fixed per-GB pricing.


MinIO can do disk caching.


NFS again?



