How Sandstorm Works: Containerize data, not services (sandstorm.io)
112 points by paulproteus on Jan 20, 2016 | 21 comments



This is very interesting. It combines a number of older ideas. Even the core idea behind their service, IIRC, existed in commercial products and academic research at various times. The security model looks like how MILS Architecture systems were described for servers, combined with capability work. I also like that they've heard of and use Powerboxes. :)

Worth watching or following up on later maybe.


Thanks!

Indeed, there have been a bunch of promising capability research projects that never quite made it to production. Sandstorm's Cap'n Proto is based on Mark Miller's E language and CapTP protocol, while the Powerbox concept derives from Marc Stiegler's CapDesk (although of course many production systems contain narrow-purpose powerboxes "by accident"). Both MarkM and MarcS are friends of the project and have provided review and advice.

Sandstorm, though, is not a research system. I like to think we have been able to make capabilities practical by being willing to step away from the purist philosophy when it makes sense. E.g. CapDesk required software to be written in E (IIRC), meaning the world had to be rewritten from scratch, which simply wasn't going to happen. Sandstorm compromises by allowing legacy native code to run in fine-grained containers. As a result we are able to deliver real, useful applications to thousands of users today. :)


"Indeed, there have been a bunch of promising capability research projects that never quite made it to production. Sandstorm's Cap'n Proto is based on Mark Miller's E language and CapTP protocol, while the Powerbox concept derives from Marc Stiegler's CapDesk (although of course many production systems contain narrow-purpose powerboxes "by accident"). Both MarkM and MarcS are friends of the project and have provided review and advice."

Oh I'm more than aware of these projects and people. That you're getting advice and approval from the masters of this sort of thing is a hell of a differentiator in a space where most security claims are almost totally BS.

"Sandstorm compromises by allowing legacy native code to run in fine-grained containers."

What's your hypervisor and container model? I mean, how do you do the sandboxing and execution of native code? That's first. Then, which virtualization platform, so I can get a guesstimate of TCB size and difficulty to secure?


Linux namespaces with aggressive seccomp and other attack surface reduction.

https://docs.sandstorm.io/en/latest/using/security-practices...

Linux has CVEs all the time, but our approach seems to mitigate most of them:

https://docs.sandstorm.io/en/latest/using/security-non-event...

I suspect you'll find those two pages answer many of your questions. :)


Appreciate it. I didn't realize who you were or that Sandstorm was a crowd-funded FOSS project until now. Just read through your site a bit, including advisory response to Ben Laurie. Nice touch, there. ;)

Anyway, overall results give me a good, default impression of this work. I'm semi-retired from high-assurance security but am considering taking on some OSS projects or contributions. I'll add this one to list of possibilities as you're one of the few doing quite a few things right. No promises as I'm often a procrastinator or trying to do too many R&D activities at once to commit to a codebase.

Except the sandboxing scheme and endpoint. The usual. (sighs) That's OK, though, for now to get adoption and testing of your software. Robust implementations on separation kernels or whatever can come later if it proves worthwhile. Just try to keep it portable, at least not too dependent on Linux model or toolchain specifics. That will help if someone decides to raise assurance level by porting it to high-security tech. It's damn near impossible for a lot of modern software that gets too clever with platform-specific stuff.

Not sure I could do it in C++, anyway, as I don't remember that language. Was too complex for high assurance. Main idea was to apply something like the Nizza Security Architecture and Softbound + CETS to it w/ extra attention to input validation. LANGSEC has a parsing system, too, so maybe someone could integrate their techniques with your middleware. Quite a few possibilities even with minimal modifications.

Note: Also, as you're already looking at syscalls, there is an old trick I used to use and which Poly^2 independently invented where you straight up rip sys call or optional functions out of the kernel code. Just put a return 0 or something similar in stuff you'll never use. Ditto with userland although you can just remove whole components or MAC them most of the time. Makes system leaner, too. I ran stuff on non-Intel, unpopular processors while removing evidence that I was doing that for added hacker frustration. ;)


If you'd like to get involved with Sandstorm, we have this fancy new page to help with that! :)

https://sandstorm.io/community

If you're in the Bay Area, come to our SF meetup next week. My teammate Drew will be talking about his work implementing the Powerbox UI in Sandstorm. (I'll be there too, of course.)

http://www.meetup.com/Sandstorm-SF-Bay-Area/events/227595644...

(FWIW we also now have meetups in NY, Boston, Zurich, and Berlin... http://sandstorm.meetup.com)


Hmm, it's hard to put faith into a system that spells its own name wrong on its home page: http://erights.org/elib/distrib/captp/index.html


Then put faith in the fact his prior deliverable, the DARPAbrowser, got favorable reviews during security evaluation sponsored by DARPA. Just a few minor fails with major wins throughout.

http://combex.com/


Mark Miller is not great at maintaining web pages. He's more of an academic paper kind of guy.

BTW that page is 15 years old.


The general idea is very interesting, but the drawback I see is that this architecture makes it impossible for apps to do work that accesses multiple grains.

Search would be the most obvious example. This was solved pragmatically by implementing it in the framework and not in the apps, but that approach doesn't seem to scale for me. What if certain types of grains require application-specific indexing? What if there are other tasks that cross grain boundaries but only make sense for a specific app?

Additionally, this limitation makes it critical to get the definition of what a grain is right from the very start, when you design your app. Once you realize you got the granularity wrong, I figure it would be very hard to split or merge existing grains to change it.

If I remember correctly, the Sandstorm documentation itself had examples for a word processor and for a photo editor app. However, while a grain for the word processor represents a single document, a grain for the photo editor is a photo gallery. So choosing granularity is not always trivial.


Yes, there are certainly some patterns that become challenging under the grain model (and some patterns that become easier).

Note that nothing is impossible. You can always connect grains to each other using the powerbox (when it's ready, which will be very soon). Of course, if nearly every grain of some app needs to talk to all the others, that will get tedious. So the next thing you can do is fall back to coarse-grained apps, or write an app that creates its own grains internally and therefore can talk to all of them if it needs to. In this case, you're giving up a lot of the advantages of the granular model, but you gotta do what you gotta do. What you end up with is no worse than the status quo, at least.
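To make the "connect grains using the powerbox" idea concrete, here is a toy sketch of a user-mediated capability grant. Everything in it (the `Powerbox`, `Calendar`, and `Notes` classes, the `request` method, the `user_picks` callback standing in for a picker UI) is a hypothetical illustration, not Sandstorm's actual API.

```python
class Powerbox:
    """Toy model of a user-mediated capability grant (not Sandstorm's API)."""

    def __init__(self, grains, user_picks):
        self._grains = grains          # grain id -> grain object
        self._user_picks = user_picks  # callback standing in for the picker UI

    def request(self, interface):
        """Offer matching grains to the user; return the chosen one or None."""
        matches = [gid for gid, grain in self._grains.items()
                   if callable(getattr(grain, interface, None))]
        choice = self._user_picks(matches)
        return self._grains[choice] if choice in matches else None


class Calendar:
    """Hypothetical grain that offers a list_events interface."""

    def list_events(self):
        return ["standup"]


class Notes:
    """Hypothetical grain that offers a read_text interface."""

    def read_text(self):
        return "todo"


# The requesting app never sees the full grain table, only what the user picks.
powerbox = Powerbox(
    {"cal": Calendar(), "notes": Notes()},
    user_picks=lambda options: options[0] if options else None,
)
capability = powerbox.request("list_events")
```

The requesting app ends up holding a reference to exactly one grain, chosen by the user, and a request that matches nothing yields `None` rather than a fallback grant.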

With that said, in practice we have found that aside from a small set of common features -- e.g. search and backup -- these kinds of problems really don't come up much. Most kinds of inter-grain communications do in fact fit nicely into the powerbox model. By adding platform features to cover the things that don't fit, we can cover, say, 90% of use cases without compromising the model, and that's a pretty big win.

And frankly, these features usually make far more sense as platform features than as app features anyway. A search index that covers all your apps is a lot more useful than having to search each app separately. A backup system that backs up all your apps is one you'll be much more likely to actually configure. Etc.

Also note that these kinds of systems that need access to "everything" are a security liability, and so probably not the kind of thing that you want every app implementing in their own special way. By moving them into the platform, we can make sure they are designed with restrictions that make them secure. For example, the backup system should only get access to encrypted copies of data. The search index should be prohibited from exfiltrating data in any way except as search results displayed to the user. Etc.

Anyway, the point is, yes, there are challenges, but with a pragmatic approach, they can be minimized, and the gains far outweigh the losses.


Do you see sandstorm as ever hosting medium-to-large scale, externally-facing websites (as opposed to personal or intranet-type sites)?


Some day, yes. However, for now we are focused on the use case where the infrastructure reports to the user rather than the developer. Developers ship packages, users choose where to run them (whether on their own machines or a cloud host). We feel we have a lot more value to provide in this use case than we would have in the SaaS infrastructure market.

Also, if we can make it just as easy to use apps on a user-controlled server as it is to use SaaS, then SaaS no longer makes sense -- its biggest selling point is that it's easy. I believe a shift back towards more decentralized infrastructure would be a very good thing, so that's what we're aiming to create.


Thank you for your reply.

> I believe a shift back towards more decentralized infrastructure would be a very good thing, so that's what we're aiming to create.

I definitely agree. Thank you for your leadership in this direction.


I'm not deep into Sandstorm and don't know how search is or will be implemented there, but these thoughts might help in understanding the alternatives.

It seems you see the search indexer as a process which accesses all data from the outside in order to index it.

What if each grain itself exposes its inner index via an interface that can be accessed by the global search indexer?

This way the search indexer only sees what it should see, not data meant to be hidden (like passwords). Also, because the index is exposed via a security-managed interface, not every process/grain gets access to every other grain's data.

This method surely complicates things a bit compared to file system access, but the minimal implementation, in which the grain presents its whole internal file tree to the search indexer, is no worse than the traditional way.


While having each grain implement a search API would be an elegant approach, it probably isn't practical. With the fine-grained model, we expect users to have hundreds, maybe thousands of grains, and possibly access to (and therefore searchability over) thousands more. Just replicating the query would probably be too expensive. Also keep in mind that a grain normally isn't running unless it is actively being used, and so such a search query would need to start up thousands of grains all at once.

On the other hand, having a single central search index is a pretty big security liability. A bug in search would potentially give you read access to the contents of all documents on the server. Moreover, fine-grained encryption breaks down: since everyone can use the search index, it would need to be encrypted using a key that everyone has access to.

Instead, each user should have a separate search index covering their own private data. This search index itself should run as a grain, to get the same protections / encryption as other grains. The search index grain should not be allowed to hold capabilities to any other grain. Communications between the search index and the grains being indexed should be one-way. The only two-way communication is the search query API, which should usually be accessible only to the user.

As an (important) optimization, grains that are public or accessible to a large number of users should be indexed in shared indexes, not per-user.
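A minimal sketch of the per-user index described above, assuming hypothetical names (`SearchIndexGrain`, `ingest`, `query`) and a naive whitespace tokenizer. It only illustrates the shape of the design: the index holds no references to other grains, indexed grains push data in one direction, and the query API is the only path that returns data.

```python
from collections import defaultdict


class SearchIndexGrain:
    """Hypothetical per-user index; it holds no references to other grains."""

    def __init__(self, owner):
        self.owner = owner
        self._postings = defaultdict(set)  # term -> set of grain ids

    def ingest(self, grain_id, text):
        """One-way channel: grains push text in, nothing is returned."""
        for term in text.lower().split():
            self._postings[term].add(grain_id)

    def query(self, user, term):
        """The only two-way API, restricted to the index owner."""
        if user != self.owner:
            raise PermissionError("only the owner may query this index")
        return sorted(self._postings.get(term.lower(), set()))


index = SearchIndexGrain(owner="alice")
index.ingest("doc-1", "Quarterly budget draft")
index.ingest("doc-2", "Budget photos")
hits = index.query("alice", "budget")
```

A shared index for widely accessible grains would be a second instance of the same idea with a different access check; that part is left out here.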


We were thinking about two or three different things that the grain could do:

1) My idea was that the grain exposes selected data to a search indexer.

2) Your idea was that the grain performs the search itself.

I see why 2) would not work very well in the current implementation. But 1) should work well and make it possible to only expose selected data. (Also with central/user specific search.)

Both ideas could be combined via per-grain search grains which only hold the local index and are accessed via a central search interface.

The problem, however, remains, that searchable data is exposed data. To me it seems useful that some applications can have fine-grained control about what to expose. (Think of List of Establishments where each has a List of Customers with Credit Card information. You might make one or two things searchable but not everything.)
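As a rough sketch of idea 1), a grain could publish only an allowlisted subset of its fields to the indexer. The class name `EstablishmentGrain`, the `SEARCHABLE` tuple, and the `export_for_index` method are all made up for illustration; nothing here reflects an existing Sandstorm interface.

```python
from dataclasses import dataclass, field


@dataclass
class EstablishmentGrain:
    """Hypothetical grain holding both searchable and sensitive fields."""

    name: str
    city: str
    customer_cards: list = field(default_factory=list)

    # Fields the grain chooses to publish to an indexer (an assumption).
    # No type annotation, so this is a class attribute, not a dataclass field.
    SEARCHABLE = ("name", "city")

    def export_for_index(self):
        """Return only the allowlisted fields; card data stays in the grain."""
        return {key: getattr(self, key) for key in self.SEARCHABLE}


grain = EstablishmentGrain("Blue Cafe", "Zurich", ["4111-XXXX"])
exported = grain.export_for_index()
```

The indexer only ever receives `exported`, so anything not listed in `SEARCHABLE` is never exposed through search.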


I love Sandstorm, but IMO, the requirement of a wildcard certificate is a small drawback in setting it up on my server. I know I can use sandcats.io but if I am using something like Sandstorm, I want complete control over my data, including domains. (I am now using sandcats though so there's that but I wish I could get a wildcard cert for free or from Let's Encrypt :)


I've talked to the Let's Encrypt people on several occasions about this. I think they will support wildcards eventually. The details are surprisingly hairy, though. In the meantime, we'll keep providing free certs under Sandcats.

I too wish we didn't have the wildcard requirement. Unfortunately, same-origin policy being what it is, there's really no way for us to get away from the wildcard requirement without losing most of our security gains.

You've probably seen this already but for others wondering about the details:

https://docs.sandstorm.io/en/latest/administering/wildcard/

And a sample of security problems that our security model (of which the wildcard is an essential part, since it enables fine-grained isolation) has helped protect against:

https://docs.sandstorm.io/en/latest/using/security-non-event...

Thanks for using Sandstorm!


Once the rate-limits for LE relax, what about on-demand renewal of a SAN certificate?


Sorry, that won't work. Sandstorm needs a new hostname every time you open a document (that's a lot of hostnames), and to provide any CSRF mitigation it needs to be a secret (whereas anything you list on the certificate immediately becomes public knowledge).

Be sure to read the FAQ in the doc:

https://docs.sandstorm.io/en/latest/administering/wildcard/
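To illustrate why this calls for a wildcard rather than a SAN list, here is a minimal sketch of minting an unguessable hostname per session. The function name `new_session_hostname`, the label format, and the `example.com` base domain are assumptions for illustration, not Sandstorm's actual scheme.

```python
import secrets


def new_session_hostname(base_domain, nbytes=16):
    """Return an unguessable hostname under a wildcard domain.

    The label format is an assumption for illustration only.
    """
    label = secrets.token_hex(nbytes)  # 2 * nbytes hex characters
    return f"{label}.{base_domain}"


first = new_session_hostname("example.com")
second = new_session_hostname("example.com")
```

Each call produces a fresh random label, so the hostnames cannot be enumerated ahead of time and listed on a certificate, while a single `*.example.com` wildcard covers all of them without revealing any.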



