My company has an internal bit of infrastructure that takes what I think is a somewhat novel approach: it lets us never have any secrets stored unencrypted on disk. There's a server (a set of servers, actually, for redundancy) called the secret server, and its only job is to run a daemon that owns all the secrets. When an app on another server is started up, it must be done from a shell (we use cap) which has an SSH agent forwarded to it. In order for the app to get its database passwords and various other secrets, it makes a request to the secret server (over a TLS-encrypted socket), which checks your SSH identity against an ACL (different identities can have access to different secrets) and does a signing challenge to verify the identity; if all passes muster, it hands the secrets back. The app process keeps the secrets in memory and your cap shell disconnects, leaving the app unable to fetch any more secrets on your behalf.
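To make the challenge/response concrete, here's roughly what the client side of that looks like. This isn't our actual code; the wire framing, hostname, port, and secret names are invented for illustration.

    # Rough sketch of the client side (framing and names invented): connect over
    # TLS, sign the server's nonce with a key from the forwarded agent, and get
    # back whatever secrets the ACL allows for that identity.
    import json, socket, ssl
    import paramiko  # reaches the forwarded agent via SSH_AUTH_SOCK

    def fetch_secrets(host="secretserver.internal", port=4433):
        ctx = ssl.create_default_context(cafile="/etc/ssl/internal-ca.pem")
        with socket.create_connection((host, port)) as raw, \
             ctx.wrap_socket(raw, server_hostname=host) as tls:
            nonce = tls.recv(32)                        # server-issued challenge
            key = paramiko.Agent().get_keys()[0]        # private key never leaves the workstation
            sig = key.sign_ssh_data(nonce)
            sig_bytes = sig if isinstance(sig, bytes) else sig.asbytes()
            tls.sendall(json.dumps({
                "pubkey": key.get_base64(),
                "signature": sig_bytes.hex(),
                "want": ["db_password"],
            }).encode())
            return json.loads(tls.recv(65536))          # kept in memory only, never written to disk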
The other kink is that the secret server itself reads its secrets from a symmetrically-encrypted file, and when it boots it doesn't actually know how to decrypt it. There's a master key for this that's stored GPG-encrypted so that a small number of people can retrieve it and use a client tool that sends the secret server an "unlock" command containing the master key. So any time a secret server reboots, someone with access needs to run:

    gpg --decrypt mastersecret | secret_server_unlock_command someserver
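On the server side, the unlock step amounts to holding the ciphertext until someone hands over the master key. A minimal sketch, assuming (purely for illustration) a Fernet-encrypted JSON file rather than whatever cipher and layout are really in use:

    # Illustrative only: assumes a Fernet-encrypted JSON blob on disk; the real
    # file format and symmetric cipher could be anything.
    import json
    from cryptography.fernet import Fernet

    class SecretStore:
        def __init__(self, path="/etc/secret_server/secrets.enc"):
            self.path = path
            self.secrets = None                 # stays locked until unlock() is called

        def unlock(self, master_key: bytes):
            with open(self.path, "rb") as f:
                ciphertext = f.read()
            # plaintext exists only in this process's memory, never on disk
            self.secrets = json.loads(Fernet(master_key).decrypt(ciphertext))

The unlock client is then little more than "read the key from stdin (i.e. from gpg --decrypt) and ship it over the same authenticated channel."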
There are some obvious drawbacks to this whole system (constraining pushes to require an SSH agent connection is a biggie and wouldn't fly in some places, and agent forwarding is not without its security implications) and some obvious problems it doesn't solve (secrets are obviously still in RAM), but on the whole it works very well for distributing secrets to a large number of apps, and we have written tools that have all but eliminated any individual's need to ever actually lay eyes on a secret (e.g. if you want to run any tool in the mysql family, there's a tool that fetches the secret for you and spawns the one you want with MYSQL_PWD temporarily set in the env, so you need not copy/paste the password or be tempted to stick it in a .my.cnf).
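That mysql wrapper is only a few lines of glue. A hypothetical version (fetch_secrets() is a stand-in for our client library, and the secret name is made up) looks something like:

    # Hypothetical wrapper: fetch the password, put it in the child's environment
    # only, and replace this process with the requested mysql-family tool.
    import os, sys
    from secret_client import fetch_secrets   # stand-in for the client sketched above

    def main():
        secrets = fetch_secrets()
        env = dict(os.environ, MYSQL_PWD=secrets["db_password"])
        tool, args = sys.argv[1], sys.argv[2:]            # e.g. "mysql", "mysqldump"
        os.execvpe(tool, [tool] + args, env)              # password never touches argv or a file

    if __name__ == "__main__":
        main()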
This reminds me of OpenStack Barbican (previously called CloudKeep... kinda), initially built by Rackspace. There's a good intro video at [1].
One of the interesting (and optional) things it does is provide an agent to run on your instances that require the secrets. The agent implements a FUSE filesystem, and access to this filesystem is controlled by policy. For example, a policy can say "Allow exactly 1 read of /secrets/AWS.json within 120 seconds of boot". Any out-of-policy access attempt can cause the instance to be blacklisted, preventing any future secret access, etc.
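Conceptually the policy check is just "N reads of a path within T seconds of boot". A toy sketch of that enforcement (the policy format here is made up, not Barbican's actual schema):

    # Toy sketch of a "1 read of /secrets/AWS.json within 120s of boot" rule;
    # the policy structure is invented, not Barbican's real schema.
    import time

    BOOT_TIME = time.monotonic()

    POLICY = {
        "/secrets/AWS.json": {"max_reads": 1, "within_seconds": 120},
    }
    read_counts = {}

    def allow_read(path):
        rule = POLICY.get(path)
        if rule is None:
            return False                                       # default deny
        if time.monotonic() - BOOT_TIME > rule["within_seconds"]:
            return False                                       # too long after boot
        read_counts[path] = read_counts.get(path, 0) + 1
        if read_counts[path] > rule["max_reads"]:
            return False                                       # out of policy -> candidate for blacklisting
        return True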
This looks really great. I watched the video and the rationale and tradeoffs they discussed sounded exactly like conversations we had back when building our system. The FUSE filesystem and agent panics are features that I wish I'd thought of.
The system sounds very well thought-out, though probably not applicable at my $work location.
> When an app on another server is started up, it must be done from a shell
That's a no-go for many setups. It doesn't integrate well with how Linux distros usually start services (systemd, upstart, sysv init, ...), and means you have to have another way to manage dependencies between your services.
> When an app on another server is started up, it must be done from a shell (we use cap) which has an SSH agent forwarded to it. In order for the app to get its database passwords and various other secrets, it makes a request to the secret server (over a TLS-encrypted socket), which checks your SSH identity against an ACL
At this point you could have used ssh right away, no? Any reason you used TLS + checking SSH agent instead?
> That's a no-go for many setups. It doesn't integrate well
> with how Linux distros usually start services (systemd,
> upstart, sysv init, ...)
Change the daemon config file to use a small wrapper script, which initializes the SSH environment and then execs the target binary. Assuming a reasonable setup, this should be trivial.
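Something like this as the init system's exec target, i.e. a wrapper that drops the agent socket into the environment and then execs the real binary (paths are placeholders):

    #!/usr/bin/env python3
    # Illustrative wrapper for an init system: point the service at an agent
    # socket, then replace ourselves with the real daemon. Paths are placeholders.
    import os, sys

    os.environ["SSH_AUTH_SOCK"] = "/run/deploy/agent.sock"   # wherever the forwarded agent socket lives
    os.execv("/usr/local/bin/the-real-app", ["/usr/local/bin/the-real-app"] + sys.argv[1:])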
> At this point you could have used ssh right away, no?
> Any reason you used TLS + checking SSH agent instead?
It sounds like they take an SSH identity certificate from the agent, send it via TLS, and then the remote process verifies it. This would have fewer potential security issues than trying to lock down a user's SSH login shell.
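i.e., on the receiving end the check is roughly "is this public key in the ACL, and does its signature over my nonce verify". A sketch, with the wire format again invented and an RSA key assumed for simplicity:

    # Server-side sketch: look the presented key up in an ACL and verify the
    # agent's signature over the nonce we issued. Framing is invented.
    import base64
    import paramiko

    ACL = {
        # "<base64 public key blob>": ["db_password"],   # populated from config
    }

    def verify_client(pubkey_b64, signature_blob, nonce):
        allowed = ACL.get(pubkey_b64)
        if allowed is None:
            return None                                  # identity not in the ACL
        key = paramiko.RSAKey(data=base64.b64decode(pubkey_b64))
        if not key.verify_ssh_sig(nonce, paramiko.Message(signature_blob)):
            return None                                  # bad signature
        return allowed                                   # names of secrets this identity may read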
> Change the daemon config file to use a small wrapper script, which initializes the SSH environment and then execs the target binary. Assuming a reasonable setup, this should be trivial.
Well, the point is that the SSH session needs to have an agent forwarded from somewhere else. If the host on which the service runs can initiate it itself, the whole security aspect is gone.
> This would have fewer potential security issues than trying to lock down a user's SSH login shell.
Locking down a login shell (usually by not running a shell in the first place) is a solved problem; gitolite, for example, uses it as the basis of its architecture. Yes, you have to be careful, but you must also be careful when manually validating certificates.
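For reference, the gitolite-style lockdown is a forced command plus option flags in authorized_keys, along the lines of (the command path and key are placeholders):

    command="/usr/local/bin/secret-fetch alice",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... alice@example.com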
> At this point you could have used ssh right away, no? Any reason you used TLS + checking SSH agent instead?
Yeah, using the SSH login method is actually quite slow for something you want to call at app startup on N instances during a push (at a minimum, your process responsible for whatever gatekeeping you do has to be respawned for every request, which necessarily puts a lower bound on the latency). I'm sure this could have been tracked down and optimized, but as jmillikin points out, another downside is that the additional per-user config can get kind of messy and error prone. Implementing logic like this at the .ssh/config level is (in my opinion) kind of easy to goof up and hard to test.
If anyone's interested in a somewhat out-of-the-box version of what's described above, using a Consul server/cluster to hold this information should give you basically everything ntucker listed. It's pretty trivial to set up, and configuring it to store its data on an encrypted partition is also pretty simple. It's got ACLs and can support TLS connections as well. It's also got a bunch of features that the above system doesn't have, like being distributed (redundancy isn't the same thing as consensus) and datacenter-aware (I'd prefer to have different secrets per datacenter, when possible).
We've been using it to store our application secrets for some time and have had no complaints.
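For the curious, reading a value back out of Consul is just an HTTP call against the KV endpoint with an ACL token. The key path, token, and addresses below are placeholders:

    # Minimal read from Consul's KV store over TLS with an ACL token.
    # The key path, token, and addresses are placeholders.
    import base64
    import requests

    def get_secret(key="secret/app/db_password", token="REPLACE_ME"):
        resp = requests.get(
            "https://127.0.0.1:8501/v1/kv/" + key,
            params={"token": token},                      # Consul ACL token
            verify="/etc/consul.d/ca.pem",                # internal CA for the TLS listener
        )
        resp.raise_for_status()
        return base64.b64decode(resp.json()[0]["Value"])  # KV values come back base64-encoded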
This sounds like a pretty standard bastion server configuration. The use of SSH is novel; usually I see the bastion address provided as a command-line option and a TLS certificate used to authenticate the client.