Introducing the Infinit file system

deprave · on May 1, 2016

This looks really great and I love the direction. Hopefully they're working on making it more consumer-friendly. ;)

Questions:

1. They say that "While most solutions (Dropbox, GlusterFS, OwnCloud etc.) store your files unprotected in the cloud or on a specific server, we took a diametrically opposed direction by relying heavily on encryption. Whenever a file is stored in Infinit, it is cut into chunks, every chunk is encrypted with a unique key (AES-256) and stored, providing both encryption at rest and in transit."

Isn't that actually a lot like Dropbox? (https://www.dropbox.com/en/help/27)

2. How stable is FUSE for OS X? The last time I used it (a couple of years ago) was for SSHFS, and it was unstable and crashed the machine every now and then.

philjohn · on May 1, 2016

Dropbox deduplicate content ... so a file that several people have uploaded is only actually stored once.

They also allow you to access your files over a web interface, and no decryption happens "in browser". So yes, Dropbox most certainly have the decryption keys. Your data is protected at rest, and on the wire using TLS, but it's not end-to-end encryption.
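To make the deduplication point concrete, here is a minimal sketch of a content-addressed store in which identical uploads are kept only once. This is an illustration only, not Dropbox's actual design; the `DedupStore` class and its method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> stored bytes
        self.files = {}  # (user, filename) -> digest of that file's content

    def upload(self, user, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.files[(user, filename)] = digest
        return digest

    def download(self, user, filename):
        return self.blobs[self.files[(user, filename)]]


store = DedupStore()
store.upload("alice", "movie.mkv", b"same bytes")
store.upload("bob", "copy.mkv", b"same bytes")
print(len(store.files), "files,", len(store.blobs), "stored blob")
```

Note that the store has to be able to tell that two uploads are identical, which is easy when the service can see the content and harder when every user encrypts with keys only they hold.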

astrodust · on May 2, 2016

Do you have any reference to their de-duplication system?

michaelmior · on May 2, 2016

There is a blog post[0] which references deduplication. But apparently this may have been disabled[1].

[0] https://blogs.dropbox.com/dropbox/2011/07/changes-to-our-pol...

[1] http://webapps.stackexchange.com/questions/54633/does-dropbo...

hlieberman · on May 1, 2016

I think the big difference they're implying (though I haven't been able to actually /verify/ the implication yet) is that Dropbox holds the keys to decrypt that content. Infinit doesn't.

JadeNB · on May 1, 2016

> though I haven't been able to actually /verify/ the implication yet

Honest question: how would you verify such a thing? Even if you audit the code, once they open-source it, how do you know that what you see is the same as what they run on their end? (Or do you mean 'verify' just in the informal sense of having them state that that is what they mean, rather than in some more formal trust-free sense?)

wtbob · on May 1, 2016

> Even if you audit the code, once they open-source it, how do you know that what you see is the same as what they run on their end?

If it's actually secure, then the only thing which matters is what I run on my end, since the key would be generated on my end and the chunks would be encrypted on my end. If I send a server my data, then the system cannot be secure.

hlieberman · on May 1, 2016

The first seems more important (no sense going through all the work of verifying something is true before making sure the authors are actually stating it!), but, both, really.

In the latter case, you'd look at the software that /you/ run to validate that:

1. It's using some sane form of key generation.

2. It's actually using those keys to perform encryption on the files prior to them being transmitted.

3. The original files, as well as the key material, aren't read anywhere else in the code base not necessary to do the above two.

Then, only run that version of the software. It's non-trivial, and generally expensive. This is why a lot of people prefer tools which call out to other trusted programs to actually handle the key management and encryption. Less complicated code to audit that way. (cf tarsnap)

deprave · on May 1, 2016

I see, that wasn't clear to me, thank you!

ccrone · on May 2, 2016

Sorry that I'm late to this thread!

The first question appears to have been answered by others but I will add that even though Infinit is currently closed source, you can build your infrastructure without using the Hub. The Hub is just in place to make it easy to fetch user public keys, network and volume descriptors and endpoints at runtime. You can do this manually using the CLI (--export/--import instead of --push/--fetch).

We're using the 3.x branch of FUSE for OS X and haven't had an issue with it yet.

notacoward · on May 2, 2016

The part about GlusterFS storing files unencrypted is untrue, BTW. We've had full at-rest encryption using client-only keys for years before they came along. Makes me wonder how many of their other "original ideas" are just failure to do basic research before jumping in to make a buck.

nickpsecurity · on May 2, 2016

I was going to agree but decided to load up web site to be sure:

https://www.gluster.org/

I don't see a list of features at all here or an obvious link to them. Most sites have something like this (see "Why Sector/Sphere?"):

http://sector.sourceforge.net/

Or this:

http://lustre.org/about/

Wait, that one almost sucks as much as GlusterFS site. It could also use a page detailing exactly what it can and can't do with obvious link on front-page. At least it had obvious link and basic description, though.

So, hard to judge writer of recent post on what features were in GlusterFS, etc if their page doesn't go through the trouble to list them without us having to dig. I'd have to point two fingers. Fortunately, Wikipedia to rescue doing what site didn't do:

https://en.wikipedia.org/wiki/GlusterFS

Note: Oh shit, it still doesn't mention encryption. Not in the comparison or list of distributed filesystems either. Still undocumented for casual researcher.

notacoward · on May 2, 2016

Let me Google "GlusterFS encryption" for you.

http://www.gluster.org/community/documentation/index.php/Fea...

https://www.gluster.org/community/documentation/images/e/e2/...

...and more. I think that still falls under "casual research" ... besides which, implementers and promoters should be doing more than casual research anyway. The fact that the Gluster documentation sucks doesn't change that. Plenty of others have implemented such end-to-end encryption as well, and the trend is already increasing. (I find that quite gratifying BTW since I felt like a lone voice in the wilderness when I was writing about the importance of keeping keys on the clients five-plus years ago.) What they present as a differentiator is very much a me-too nowadays so, again, it calls their other claims of originality or uniqueness into question.

nickpsecurity · on May 2, 2016

Why would I Google GlusterFS encryption? You think a casual researcher should type every feature of importance into Google with an app's name to confirm if it has that feature? That's ridiculous. Its features or benefits should be easy to find on the homepage. Documentation, esp "What This Does," is critical to OSS success.

Here's what Gluster doc menu said: one that goes to installation page; more detailed installation; admin guide; developer guide; upgrade guide. Geez, I don't need any of that. Just want to know what the hell it does with what features.

They need to fix it.

notacoward · on May 2, 2016

If somebody is going to make a specific claim, in a published document, about another project having or not having a feature, then damn right they should Google for the combination. A lot of OSS projects end up having pieces of information scattered all over presentations and blog posts and who the hell knows what else. All of your goalpost-moving about what should be on the Gluster website doesn't change the fact that Infinit's characterization was inaccurate and trivially revealed as such by a single obvious Google search.

nickpsecurity · on May 2, 2016

You're semi-right here. People saying why their offering is better than competitors better know what the competition offers. Marketing 101 says dig deep to find that plus differentiators. That much I agree with.

That said, your excuse for GlusterFS site being screwed up is that other projects are screwups, too. Makes no sense. Let's put it into perspective: Gluster people could spend under 5 minutes typing up and posting that page. Instead, they expect all potential users or contributors to spend 10m-1hr digging through docs for same information. Meanwhile, many FOSS pages state clearly what their software does.

So, no excuses. It's just laziness and foolish on top of that given they want more adoption of a tool they won't describe haha. The responsibility is on them to present their work in a clear way given that goal. They're failing on that right now.

notacoward · on May 2, 2016

It's easy to criticize when you've never worked on the kind of infrastructure that Gluster has. Yeah, for a project I'm doing all by myself it would take all of five minutes to update the web page with that sort of information. Hey look, that's exactly how it happened when this was part of a project I was doing myself.

http://pl.atyp.us/hekafs.org/

Now, on a huge project with the website and docs with separate owners subject to their own detailed standards and review processes, it's not that easy. Even as one of the Gluster project architects, it would take me more than a few minutes to make even a trivial change, and that would be time taken away from other tasks for which my personal expertise is even more necessary. Yes, it's a problem - one shared to varying degrees by most projects of this scale. Here's the fantastically informative website for the Linux kernel.

https://www.kernel.org/

They don't have links to specific features on the front page, either. Good luck finding the docs about XFS project quota, starting from there. Would it be responsible for me to claim it doesn't exist, based only on what I can get by clicking around on kernel.org? I could come up with similar examples for many features on Apache or Mozilla or OpenStack projects as well. The fact is that technical debt exists for documentation as well as for code, and keeping it all coherent becomes exponentially more difficult as the project grows.

But none of that has anything to do with the original point. What's the point of complaining about Gluster here? How could you possibly believe that's helpful, unless you believe people should be grateful for every moment of your attention? This story is about Infinit. Let's keep the focus on them, and their claims, and whether those claims are accurate.

nickpsecurity · on May 3, 2016

"It's easy to criticize when you've never worked on the kind of infrastructure that Gluster has. "

It's easy to criticize if I've ever done one essay, research paper, marketing piece, commercial project... many things where I took the time to document and explain what I was presenting. Often in an ad hoc way as it was a secondary focus. I don't have to work on a specific product to understand a common skill or requirement.

"Here's the fantastically informative website for the Linux kernel."

"based only on what I can get by clicking around on kernel.org? I could come up with similar examples for many features on Apache or Mozilla or OpenStack projects as well. "

You cite an example from a group that famously doesn't give a damn what people think and which also expects people to know what their project is already. Also an outlier in general. Then some others whose sites at least explain what they do with key features. Unlike Gluster. You grasped at straws more than I expected.

"Hey look, that's exactly how it happened when this was part of a project I was doing myself."

You did exactly what I'm asking them to do. More actually. Your site supports my position albeit half the links in the paper/slides set didn't work. Maybe it's NoScript but one or two told me what I needed to know. How about some examples of how easy it is on big, collaborative projects:

https://www.freebsd.org/about.html

http://www.xtreemfs.org/all_features.php

http://toolkit.globus.org/toolkit/about.html

https://owncloud.org/features/

(OwnCloud since the OP blog post mentioned it as a competitor.)

"But none of that has anything to do with the original point. "

Your original point was that the author should've been responsible enough to do research, identify key features of Gluster, and consider them before making claims in the post. You also argued against Gluster site author(s) needing to be responsible enough to identify the key features and list them on the website for people like author doing research. You then made excuses for them despite that being quite easy and considered good practice. So, I'm addressing both the poor research by Infinit author and poor docs by Gluster that contributes to it, along with your double standard on the topic. I'm adding counter-examples to sites in and outside their domain with varying organizations and team sizes to show how they could improve.

Had they had good materials to draw on, I'd have never argued with your claim that author should've known what features they do or don't have. Or maybe if you slammed them for irresponsibility as you did the author. Seemed to be a bias, though, along with OSS documentation problem worth noting.

fezz · on May 2, 2016

LTFS is based on FUSE and has support from all the major tape vendors. It's been pretty stable so far but does have some speed limitations.

kefka · on May 1, 2016

     1. Protocol level filesystems featured by a company (single point of failure)
     2. Uses existing services in ways those other companies may not like. (Serf in someone else's walled garden)
     3. Closed-source, with maybe promises of open sourcing later
     4. Hard to search, given the name is -e from a real word
     5. Relies on extensive network bandwidth, especially in cases upload is scarce

To think of a few.... As converse, I'm looking at IPFS. It works great, now.

fiatjaf · on May 1, 2016

IPFS serves a totally different purpose and can't be used in any reasonable way to achieve what Infinit seems to be doing.

bfung · on May 2, 2016

Can you elaborate, please? My initial reaction was the same as grandparent post - maybe IPFS doesn't have all the security things stated up front, but these things seem similar.

rakoo · on May 2, 2016

2 differences:

* infinit provides a virtual filesystem that you can use with any other application. At the moment IPFS only has helpers and an api

* infinit's goal is to use the existing infrastructure (your disk, your S3 account, your google drive account, ...) to form a big drive. IPFS lives in its own world, which is great when everybody uses IPFS, but we're not there (yet ?)

kefka · on May 2, 2016

"ipfs mount" is certainly a command, and it mounts the ipfs filesystem over /ipfs and /ipns . And I've already used those in conjunction with programs. The ipfs mount directive has been there since 0.3.1

I don't view smearing your data across multiple fragile services to be a "feature". Worse yet, if any of those companies believe you're breaking ToS, you're out of luck.

Whereas ipfs does live in its own world, it integrates well with our workflows. And http://ipfs.io runs a public ipfs gateway, so that anyone can resolve data from the network without running the peer software.

mycure · on May 2, 2016

Infinit's goal is not to compete against IPFS which is focusing on providing a protocol for distributing content.

IPFS does not focus on providing redundancy, fault tolerance, rebalancing or file-level functionalities such as access control, versioning etc. This is what Infinit is doing.

Two very different solutions even though they may share some technical similarities.

kefka · on May 2, 2016

I'm not sure what IPFS you're reading about....

redundancy - Any node in the IPFS network can provide the requested data. That's because the identity of the data is with the name, and not the server you got the data from.

fault tolerance - Absolutely does provide fault tolerance. The filesystem is a SHA256 hash in what they call a multihash. Because everything has a hash-name, the file system is a self-certifying filesystem.

Rebalancing - Doesn't make sense. You need more capacity or bandwidth? Add more machines, and pin the data you need.

Access control - file-level encryption is part of the protocol, but not implemented yet. The idea is that GPG can serve as the go-between until ipfs encryption subroutines are implemented.

Versioning - similar, in the spec but not added yet. There will be 2 types of versioning; blockchain and git-style
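The self-certifying idea above (data is named by its hash, so any node can serve it) can be sketched roughly as follows. This is a simplification under stated assumptions: real IPFS uses multihash-encoded identifiers and chunked objects, whereas this sketch uses a plain SHA-256 hex digest, and the peers are just hypothetical in-memory dicts.

```python
import hashlib


def address_of(data: bytes) -> str:
    """Name a piece of content by its SHA-256 digest (hex, for simplicity)."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(address: str, peers: list) -> bytes:
    """Ask each peer in turn; accept only bytes whose hash matches the address.

    Each peer is modelled as a dict mapping address -> bytes.
    """
    for peer in peers:
        data = peer.get(address)
        if data is not None and address_of(data) == address:
            return data
    raise LookupError("no peer returned valid data for " + address)


original = b"hello ipfs"
addr = address_of(original)

honest = {addr: original}
tampering = {addr: b"evil bytes"}  # serves the wrong content for this name
offline = {}                       # holds nothing

print(fetch_verified(addr, [offline, tampering, honest]) == original)
```

The hash check means it does not matter which peer answers, only that the bytes match the name. It also shows the limit of the scheme: if no reachable peer holds the content, the lookup simply fails.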

mycure · on May 2, 2016

I'm reading what I can find :) Sorry if I missed information, it was not my intention to say that IPFS was not good, just to say that the purpose was different from Infinit's.

Redundancy: https://github.com/ipfs/ipfs/issues/50. One of IPFS' core developers clearly states that there is no redundancy in IPFS. I took my information from there because it was the only one I could find with Google. It is not because one node can retrieve the data that the data is actually stored multiple times and that the storage servers coordinate to maintain consistency. From what I understand, IPFS content-hashes information (as Infinit does) but does not replicate it. As such, if a server goes down, its hosted content becomes unavailable; very much like the Web. This is not the case of Infinit (if you defined a replication factor above 1). Again, different purposes, my goal is not to say one is better; redundancy obviously has a cost.

Fault tolerance: Yes, nodes can go down in IPFS, but the system will not function as before since some data may be unavailable, in the worst-case scenario permanently, should the failure be fatal. Sorry, I should have made myself clear. IPFS is fault-tolerant but does not ensure availability. Bittorrent as well. Some files may become so rare that you cannot access them anymore. Infinit ensures that all the files are available at any time. Self-certifying is another concept altogether.

Rebalancing: I don't understand exactly what you mean but if you want to provide a POSIX-compliant reliable (available/durable) file system, you need fault tolerance and rebalancing, which is to recreate missing replicas and possibly move data around as servers are added or fail. If you can't do that, then again you have the problem of potentially having unavailable pieces of information. It is fine for some systems such as the Web (not critical if you can't access some rare pages), but it is super critical for an enterprise file system in production. Again, different purposes I believe as IPFS (from what I understand, sorry if I missed something) is trying to provide a HTTP replacement: a new peer-to-peer hypermedia protocol (from http://ipfs.io).

Access control: File-level encryption is one thing but access control something else entirely: how to allow other users to read/write files and how to manage the keys. What about removing files? What about groups? Subgroups? And what about integrating into an enterprise directory (LDAP)? These compose a complete set of access control capabilities from my point of view. File-level encryption is obviously the basis for access control but is not enough. As you stated, it is planned in the protocol.

Versioning: Good to know that IPFS will be providing versioning, I didn't know that. Thanks for pointing that out.

To conclude, both projects seem to rely on some similar tech such as content hashing. IPFS seems to be going the way of a protocol for accessing data in a decentralized way (like the Web in a more modern way) but without providing redundancy, consistency and availability guarantees. Storj is going this way for instance, providing a peer-to-peer object store (S3-like). Infinit however seems to be more focused on providing a POSIX-compliant peer-to-peer, reliable, secure and fault-tolerant file system for Ops and DevOps.

bfung · on May 2, 2016

Thanks for the in-depth analysis, it's super helpful.

My understanding is that IPFS is bittorrent + blockchain mashed together (I've read the tech specs, it's more complicated but this will have to suffice as an explanatory device here). The bittorrent part certainly has redundancy when the data is widely accessed, it's just not guaranteed to be.

From a high level, it seems that Infinit has just more stated features at the moment, but it does look like the two technologies are in the same space and can solve similar problems of file storage in a large, distributed manner.

disposeofnick9 · on May 2, 2016

How would this be better/different than https://tahoe-lafs.org/trac/tahoe-lafs ?

cookrn · on May 2, 2016

Here is their comparison with Tahoe: https://infinit.sh/documentation/comparison/tahoe-lafs

If you flip on the "Compare" switch there, the two differences it shows are:

1) Infinit is redundant using replication while Tahoe uses erasure codes

2) Infinit supports heterogenous storage backends, while Tahoe is homogenous
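The trade-off behind the first difference can be put into rough numbers. The sketch below compares storage overhead and tolerated losses for plain replication and for k-of-n erasure coding. The function names are hypothetical and the parameters (3 copies, 3-of-10 shares) are chosen purely for illustration, not taken from either project's documentation.

```python
def replication(copies: int):
    """Return (storage overhead, losses tolerated) for full-copy replication."""
    return float(copies), copies - 1


def erasure(needed: int, total: int):
    """Return (storage overhead, losses tolerated) for k-of-n erasure coding.

    The data is split so that any `needed` of the `total` shares rebuild it.
    """
    return total / needed, total - needed


rep_overhead, rep_losses = replication(3)
ec_overhead, ec_losses = erasure(3, 10)
print(f"replication x3 : {rep_overhead:.2f}x storage, survives {rep_losses} losses")
print(f"erasure 3-of-10: {ec_overhead:.2f}x storage, survives {ec_losses} losses")
```

With these illustrative numbers, erasure coding tolerates more losses for a similar amount of storage, at the price of having to decode from several shares instead of reading one full copy.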

meesterdude · on May 1, 2016

This looks awesome!

I can't find anything in the docs on data resiliency - if i add a storage node and it later becomes unavailable, is data distributed across other nodes? how many nodes can fail?

cadeuh · on May 2, 2016

Hey, it all depends on the replication factor you set when creating your infrastructure. If you have 3 nodes, a replication factor of 2 and one of the nodes used for storing data becomes unavailable, then data will be distributed to the third node for example.
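The 3-node, replication-factor-2 scenario above can be sketched as a toy model. This is not Infinit's actual placement or rebalancing algorithm; `place` and `repair` are hypothetical helpers that only illustrate the idea of recreating a missing replica on a surviving node.

```python
def place(blocks, nodes, factor):
    """Assign each block to `factor` distinct nodes, round-robin.

    Assumes factor <= len(nodes).
    """
    return {
        block: {nodes[(i + j) % len(nodes)] for j in range(factor)}
        for i, block in enumerate(blocks)
    }


def repair(placement, alive, factor):
    """Recreate missing replicas on live nodes that do not hold the block yet."""
    repaired = {}
    for block, holders in placement.items():
        survivors = holders & alive
        if not survivors:
            raise RuntimeError(f"block {block!r} lost: no surviving replica")
        spares = sorted(alive - survivors)
        missing = max(0, factor - len(survivors))
        repaired[block] = survivors | set(spares[:missing])
    return repaired


nodes = ["n1", "n2", "n3"]
placement = place(["a", "b", "c"], nodes, factor=2)

# Node n2 becomes unavailable; every block should end up with 2 live replicas again.
after = repair(placement, alive={"n1", "n3"}, factor=2)
for block in sorted(after):
    print(block, sorted(after[block]))
```

With a factor of 2, losing one node never loses data in this model, while losing both holders of a block does, which is why the number of tolerated failures follows directly from the replication factor.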

Feel free to join our Slack channel to chat with us directly!

haarts · on May 1, 2016

The roadmap seems to suggest at least some of your questions are answered: https://infinit.sh/documentation/roadmap

fiatjaf · on May 1, 2016

See also: https://bazil.org/, https://git-annex.branchable.com/

mitchty · on May 1, 2016

Thought this looked familiar, this is from February and they still have the source as "coming soon (tm)".

So until that coming soon becomes released I'm not really all that bothered by this thing.

_pfxa · on May 1, 2016

It's the 2nd of May today, man.

mitchty · on May 2, 2016

Sure, still doesn't give me any hope as to when/if it will get released.

cookrn · on May 2, 2016

Note: Infinit is not yet OSS. They plan to open-source the code, but have not yet done so: https://github.com/infinit/infinit/issues/1

jcoffland · on May 2, 2016

This sounds really cool but I will wait until it is fully open source and I can build it myself.

yashinm92 · on May 2, 2016

I did create something similar a while back: https://github.com/sp3ctr3/arcanum-server https://github.com/sp3ctr3/arcanum-client

WorldMaker · on May 2, 2016

This seems like an interesting project to watch and I'd be interested more in it as A) Windows support gets better, and B) source is opened.

I'm definitely comparing to my usage of BitTorrent Sync today. I see in the FAQ a comparison for BT Sync a few things of interest.

BT Sync does have a more filesystem-like mode. (It's a part of BT Sync "Pro", if that makes a difference.) It's also scalable to available resources, although some of that is through manual management of which devices are connected to a BT Sync share and which have which copies of which files.

Also, BT Sync Pro does support At-Rest security with some management. A UX for encrypted shares was added in recent versions of the software and it was supported through some command line effort in previous versions. I've seen tutorials for setting up BT Sync "know nothing peers" on, for instance, EC2 storing encrypted blocks to S3.

TheIronYuppie · on May 1, 2016

Really cool! Do you know how it compares to the research project Farsite from Microsoft? http://research.microsoft.com/apps/mobile/showpage.aspx?page...

Sounds really similar (but productized).

Filligree · on May 1, 2016

Really cool, but I use NixOS for everything. You mentioned it's open source so--want to help me package it?

It's probably not very hard.

ccrone · on May 2, 2016

It's not open source just yet. We want to open it in a responsible way (i.e.: not a massive code dump) so it will take a bit of time.

You can post a request for packages here in the meantime if you would like: http://infinit-sh.uservoice.com

acd · on May 1, 2016

Thanks for creating Infinit! I think it's great that you have end user usability in mind when building the filesystem.

tingol · on May 1, 2016

Is there any information on how well it works in the real world from people who have used it?

wazoox · on May 1, 2016

Works well from the limited experimentations I've made. Very well documented, nice, professional job overall.

sigmonsays · on May 1, 2016

my bandwidth kills things like this. I have so little upload in the US that I can't host anything and still have a usable network connection

chockablock · on May 1, 2016

If upload interferes with download, you may be experiencing bufferbloat. http://www.dslreports.com/faq/17883

toomuchtodo · on May 1, 2016

You could get a virtual machine somewhere cheap for $5-10/month and use that as your canonical storage reference.

toomuchtodo · on May 1, 2016

Reply to myself since I cannot edit: Depending on how much storage you need, it might be cheaper to get a dedicated server or colo your own versus getting a virtual machine. Feel free to reply if you have questions on this.

DecoPerson · on May 1, 2016

Any advice on getting either of those in Australia for less than an arm and a leg?

fapjacks · on May 2, 2016

Ramnode has a bunch of locations and is hands-down the best VPS provider I've ever used, and I've used literally hundreds over the last fifteen or so years. The vast majority are a flash in the pan, and very few actually don't suck. Ramnode has always been super solid for me, and very reasonably priced. Take every other recommendation with a grain of salt. VPS is one of those industries that attracts the scummiest human beings on the planet.

viraptor · on May 2, 2016

I don't know about colos, but https://www.vultr.com/ has presence in Sydney. They offer both VMs and block storage at reasonable prices.

toomuchtodo · on May 2, 2016

Start here: http://www.webhostingtalk.com.au/forumdisplay.php?f=36

cyphar · on May 2, 2016

Move to a building that has the NBN. ;)

lootsauce · on May 1, 2016

Looks cool but I get a bad gateway error when I try to set up an account :-(

cadeuh · on May 2, 2016

Hey! Can you send us an email at contact@infinit.sh so we can have a look? We fixed an issue similar to this just a few hours ago, so it might be working now.

bicatali · on May 1, 2016

onedata.org is another one of these unified file views, used in science.

matreyes · on May 1, 2016

Looks great! Is there any option to do some Map Reduce jobs on Infinit ?