Hacker News
Environment Variables Considered Harmful for Your Secrets (movingfast.io)
193 points by pwim on Jan 2, 2015 | 109 comments



This article is misguided. Environment variables can be more secure than files. Furthermore, in the presented case there's no improvement in security by switching to a file.

To address my second claim first: file permissions work at the user or group level. ACLs / MAC likewise. SELinux can be configured to assist in this case, but it's not as trivial as it appears at first glance; it would be easier to use environment variables.

In the example case of spawning ImageMagick, it's running as the same user and therefore has the same level of access to the properties file. That is, it can access the secrets without negotiating any authorisation to do so.

Depending on how ImageMagick is launched and how the parent process handles the config file, it's possible that ImageMagick could inherit a file handle that is already open on the file.

Now to address my first claim: if the parent process is following best practice, it will sanitise the environment before exec'ing ImageMagick, which means launching the ImageMagick process with only the environment it needs.
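A minimal sketch of that practice in Python (the paths, variable names and ImageMagick invocation are just illustrative assumptions; the point is that the child gets an explicitly constructed environment rather than an inherited one):

    import subprocess

    # Build the child's environment from scratch instead of passing os.environ,
    # so secrets such as DATABASE_URL or API keys never reach the child.
    child_env = {
        "PATH": "/usr/bin:/bin",
        "LANG": "C",
    }

    # Spawn ImageMagick's convert with only the variables it actually needs.
    subprocess.run(
        ["/usr/bin/convert", "upload.png", "-resize", "200x200", "thumb.png"],
        env=child_env,
        check=True,
    )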

To give a concrete real-world example, the Postfix mail transfer agent is extraordinarily high-quality software. Its spawn process owns the responsibility of launching external processes, potentially sysadmin-supplied / external to Postfix. That case is very comparable to a web app invoking ImageMagick.

We can see that it explicitly handles this case as I've suggested is best practice: https://github.com/vdukhovni/postfix/blob/master/postfix/src...

EDIT: accidentally posted before finishing.

If the parent sanitised the child's environment, then the only way for the child to access the data would be to read the parent's memory. In practice this is quite easy (try the "ps auxe" command for a sample); however, this access can much more easily be controlled by SELinux policy than can file access.

Any obfuscation technique applicable to a config file can similarly be applied to an environment variable.


This article is not talking about _malicious_ applications running on your system, but unintentionally-security-weakening ones. Think of the bash RCE bug from a few months back: the problem was not that Bash was malicious, but a mismatch in expectations about how to trust the environment variables. Similarly, lots of bits of code not written with the glimmer of a notion that the environment is sensitive may end up accidentally leaking them.

This is an extension of the "don't put secrets in the query string" advice for HTTP: it's not that the POST body is inherently more secure (anything along the way that can read the query string can read the POST data just as easily); it's that more bits of software may inadvertently leak the query string but not the POST data (in logs, for instance).


Thanks. I'm mainly looking at this from the point of how your secrets could be accidentally exposed.

I applaud postfix for sanitising the ENV, and it's very good practice to do so. But are all the frameworks doing it correctly? Maybe some code is also just spawning new processes without sanitising? You could argue that's a bug (and I completely agree), but not all projects are run like postfix...


The basic system call, execve, requires that you specify the environment explicitly. The whole idea of an inherited environment is a construct of shells and historic libc functions, and the libc functions that do not take an explicit environment should be deprecated. The man page examples show constructing the environment from scratch, which is best practice: you should never refer to the whole environment, just access individual items from it, i.e. never reference environ, just use getenv etc.
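For illustration, the same idea at the exec level, sketched in Python (os.execve mirrors the execve(2) interface: the environment is an explicit argument, and individual values are pulled out with getenv-style lookups rather than by handing over environ wholesale):

    import os

    # Read the individual items you need; never pass on the whole environment.
    locale = os.environ.get("LANG", "C")

    # The environment for the new program is constructed explicitly.
    new_env = {"PATH": "/usr/bin:/bin", "LANG": locale}

    # execve-style call: program, argv, and an explicit environment.
    # /usr/bin/env simply prints its environment, so the output here is
    # only PATH and LANG -- nothing inherited by accident.
    os.execve("/usr/bin/env", ["env"], new_env)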


I doubt many projects take the same disciplined (akin to micro-services) approach that postfix does:

    ... mail delivery is done by daemon processes that have no parental
    relationship with user processes. This eliminates a large variety of
    potential security exploits with environment variables, signal handlers,
    and with other process attributes that UNIX passes on from parent to child.

 [edit] sorry for the spam, there was some problem with the submission form.


You are programming. You can do anything. But a guiding principle should be not to surprise other people. Environment variables contain information about the environment, search paths and the like. They don't contain secrets. That would surprise people.

Sure you can sanitize the environment, and make sure to wipe out sensitive data after reading it, but if that step fails for some reason, the only way you will notice is when you have a data leak. That's not a way to fail safe. You may be the one who never makes mistakes, but your coworkers aren't that perfect. That can only lead to security problems in the end.

When given the choice to store your secrets in regular config files or make up something with environment variables, choose the former. Doing it the expected way will buy you lots of goodness later on: you can use ACLs or policies to restrict access, you can allow just a limited number of reads, or pretty much anything else the VFS allows you to do.
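As a small illustration of "doing it the expected way", here is a sketch of creating and reading a secrets file that only the service user can access (the path, mode and format are just examples):

    import os

    CONF = "/etc/myapp/secrets.conf"  # hypothetical path

    # Create the file readable and writable only by its owner (mode 0600).
    fd = os.open(CONF, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("db_password=example\n")

    # Later, the application simply reads it; the kernel enforces access control.
    with open(CONF) as f:
        secrets = dict(line.strip().split("=", 1) for line in f if "=" in line)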

When you are not alone in your programming, do not inflict onto others what you wouldn't want inflicted on yourself. Avoid surprises.


How do you set env vars without a file?


My company has an internal bit of infrastructure that I think is a somewhat novel approach that allows us to never have any secrets stored unencrypted on disk. There's a server (a set of servers, actually, for redundancy) called the secret server, and its only job is to run a daemon that owns all the secrets. When an app on another server is started up, it must be done from a shell (we use cap) which has an SSH agent forwarded to it. In order for the app to get its database passwords and various other secrets, it makes a request to the secret server (over a TLS-encrypted socket), which checks your SSH identity against an ACL (different identities can have access to different secrets) and does a signing challenge to verify the identity, and if all passes muster, it hands the secrets back. The app process keeps the secrets in memory and your cap shell disconnects, leaving the app unable to fetch any more secrets on your behalf.

The other kink is that the secret server itself reads the secrets from a symmetrically-encrypted file and, when it boots, it doesn't actually know how to decrypt it. There's a master key for this that's stored GPG-encrypted so that a small number of people can retrieve it and use a client tool that sends the secret server an "unlock" command containing the master key. So any time a secret server reboots, someone with access needs to run: gpg --decrypt mastersecret | secret_server_unlock_command someserver

There are some obvious drawbacks to this whole system (constraining pushes to require an SSH agent connection is a biggie and wouldn't fly some places, and agent forwarding is not without its security implications) and some obvious problems it doesn't solve (secrets are obviously still in RAM), but on the whole it works very well for distributing secrets to a large number of apps, and we have written tools that have basically completely eliminated any individual's need to ever actually lay eyes on a secret (e.g. if you want to run any tool in the mysql family, there's a tool that fetches the secret for you and spawns the tool you want with MYSQL_PWD temporarily set in the env, so you need not copy/paste it or be tempted to stick it in a .my.cnf).
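The mysql wrapper described at the end might look roughly like this (a sketch only; fetch_secret stands in for whatever client code talks to the secret server):

    import os
    import subprocess
    import sys

    def fetch_secret(name):
        # Placeholder for the TLS request to the secret server described above.
        raise NotImplementedError

    def run_mysql_tool(tool, *args):
        # Copy the current environment and add the password only for this child.
        env = dict(os.environ)
        env["MYSQL_PWD"] = fetch_secret("mysql/app_password")
        # The operator never sees the password and nothing lands in .my.cnf.
        return subprocess.call([tool, *args], env=env)

    if __name__ == "__main__":
        sys.exit(run_mysql_tool("mysql", "-h", "db.example.internal", "appdb"))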


This reminds me of OpenStack Barbican (Previously called CloudKeep.. kinda..) initially built by Rackspace. A good intro video at [1].

One of the interesting (and optional) things it does is provide an agent to run on the instances that require the secrets. The agent implements a FUSE filesystem, and access to this filesystem is controlled by policy. For example, a policy can say "Allow exactly 1 read of /secrets/AWS.json within 120 seconds of boot". Any out-of-policy access attempt can cause the instance to be blacklisted, preventing any future secret access, etc.

[1]: https://www.openstack.org/summit/portland-2013/session-video...


This looks really great. I watched the video and the rationale and tradeoffs they discussed sounded exactly like conversations we had back when building our system. The FUSE filesystem and agent panics are features that I wish I'd thought of.


The system sounds very well thought-out, though probably not applicable at my $work location.

> When an app on another server is started up, it must be done from a shell

That's a no-go for many setups. It doesn't integrate well with how Linux distros usually start services (systemd, upstart, sysv init, ...), and means you have to have another way to manage dependencies between your services.

> When an app on another server is started up, it must be done from a shell (we use cap) which has an SSH agent forwarded to it. In order for the app to get its database passwords and various other secrets, it makes a request to the secret server (over a TLS-encrypted socket), which checks your SSH identity against an ACL

At this point you could have used ssh right away, no? Any reason you used TLS + checking SSH agent instead?


  > That's a no-go for many setups. It doesn't integrate well
  > with how Linux distros usually start services (systemd,
  > upstart, sysv init, ...)
Change the daemon config file to use a small wrapper script, which initializes the SSH environment and then execs the target binary. Assuming a reasonable setup, this should be trivial.
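Such a wrapper might be sketched like this (purely hypothetical names; how the forwarded agent socket ends up at a known location is deliberately left out here):

    #!/usr/bin/env python
    import os
    import sys

    TARGET = "/usr/local/bin/myapp"           # hypothetical service binary
    AGENT_SOCK = "/run/myapp/ssh-agent.sock"  # hypothetical agent socket path

    def main():
        # Point the service at the SSH agent socket so it can answer the
        # secret server's signing challenge, then replace ourselves with it.
        os.environ["SSH_AUTH_SOCK"] = AGENT_SOCK
        os.execv(TARGET, [TARGET] + sys.argv[1:])

    if __name__ == "__main__":
        main()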

  > At this point you could have used ssh right away, no?
  > Any reason you used TLS + checking SSH agent instead?
It sounds like they take an SSH identity certificate from the agent, send it via TLS, and then the remote process verifies it. This would have fewer potential security issues than trying to lock down a user's SSH login shell.


> Change the daemon config file to use a small wrapper script, which initializes the SSH environment and then execs the target binary. Assuming a reasonable setup, this should be trivial.

Well, the point is that the SSH agent needs to be forwarded from somewhere else. If the host on which the service runs can initiate that itself, the whole security aspect is gone.

> This would have fewer potential security issues than trying to lock down a user's SSH login shell.

Locking down a login shell (usually by not running a shell in the first place) is a solved problem; gitolite, for example, uses it as the base of its architecture. Yes, you have to be careful, but you must also be careful when manually validating certificates.


> At this point you could have used ssh right away, no? Any reason you used TLS + checking SSH agent instead?

Yeah, using the SSH login method is actually quite slow for something you want to call at app startup on N instances during a push (at a minimum, your process responsible for whatever gatekeeping you do has to be respawned for every request, which necessarily puts a lower bound on the latency). I'm sure this could have been tracked down and optimized, but as jmillikin points out, another downside is that the additional per-user config can get kind of messy and error prone. Implementing logic like this at the .ssh/config level is (in my opinion) kind of easy to goof up and hard to test.


If anyone's interested in a somewhat out-of-the-box version of what's described above, using a Consul server/cluster to hold this information should give you basically everything ntucker listed. It's pretty trivial to setup and configuring it to store its data on an encrypted partition is also pretty simple. It's got ACLs and can support TLS connections as well. It's also got a bunch of features that the above system doesn't have, like being distributed (redundancy isn't the same thing as consensus) and datacenter-aware (I'd prefer to have different secrets per-datacenter, when possible).

We've been using it to store our application secrets for some time and had no complaints.


This sounds like a pretty standard bastion server configuration. The use of SSH is novel, usually I see the bastion address provided as a command-line option and a TLS certificate used to authenticate the client.


There's a not-widely-publicized feature of Linux that allows programs to store secrets directly in the kernel: https://www.kernel.org/doc/Documentation/security/keys.txt That has some advantages, including the guarantee that it can't be swapped to disk. Kerberos can use it for secret storage; I haven't seen it used elsewhere, though.

It looks like process-private storage is one of its features.
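For a quick feel of the API, the keyutils package ships a keyctl(1) tool that fronts these syscalls. A hedged sketch of driving it from Python (assuming keyctl is installed; it stores a secret in the session keyring, reads it back, then removes it):

    import subprocess

    def keyctl(*args):
        # Thin wrapper around the keyctl(1) tool from the keyutils package.
        return subprocess.check_output(["keyctl", *args], text=True).strip()

    # Store a secret in the session keyring; the kernel holds the payload,
    # so it never sits in a world-readable file or in the environment.
    key_id = keyctl("add", "user", "myapp:db_password", "s3cr3t", "@s")

    # Read it back later in the same session.
    print(keyctl("print", key_id))

    # Remove it once it is no longer needed.
    keyctl("unlink", key_id, "@s")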


I haven't looked too hard at the docs yet, but this seems kind of awesome. Is it something that you have to build your own kernel for, or is it configurable in a prebuilt kernel?


Stock ubuntu here has a /proc/keys file, so I think it's generally available.


The main drawback of this (and it's a minor drawback in the larger scheme of things) is that it isn't a portable interface and isn't available on other POSIX-ish OS's. It looks like there are key agents available for FreeBSD, OS X and Illumos, though.

Thanks for pointing this out.


It seems like the security for that keyring is based on uid, which can cause problems in the world of containers.

https://news.ycombinator.com/item?id=8321210


Classic UNIX behavior was that environment variables were public (any user could see them with the right flags to "ps") so it was well-known not to put anything secret there.

Most (all?) of the current brand of UNIX variants locked this down quite a while ago, which is a good thing. There are still a few old boxes kicking around though, so if you're writing code that is meant to be widely deployed, please don't put stuff there. For example: https://github.com/keithw/mosh/issues/156

Even if you are sure that your code will only be running on modern machines, I think this article gives good advice. Unless you purge the secret environment variables when you launch, they'll get sent to all of your subprocesses, and it's quite possible that one of them won't consider their environment secret.


Classic UNIX behavior was also to have plaintext passwords in /etc/passwd, the shadow file was invented much later...

But why should we jump through hoops for something that was broken over 15 years ago?

Do you still filter ping packets on your router because back in the '90s large pings would crash[1] many operating systems?

[1] http://en.wikipedia.org/wiki/Ping_of_death


What, you don't run AIX 0.9?


This is the best argument for not doing this imo.

"Classic UNIX behavior was that environment variables were public (any user could see them with the right flags to "ps") so it was well-known not to put anything secret there."

The same logic applies to Windows environments and modern *nix too...

Storing private data in a public location is obviously a bad idea.


Ok, but that's classical behaviour. Modern *nix behaviour is treating the environment as private data.

I don't get the reason for all that disagreement. Both methods have clear practical advantages and problems, and both are nearly as secure as each other. Use whichever fits your problem better.


On Linux, at least, only root can view a process' environment via ps(1). So it's at least as secure as the filesystem from an inspection point of view.


It doesn't work that way anymore, though. Only root can see other processes' environments via ps(1) now.


It's not public. Only a given user or root can see the envvar. Same as a file.


I've always been uncomfortable with the "store config in the environment" part of the 12 factor app thing, since it does imply storing things like database passwords and such, and the argument is that those shouldn't be in files. But, filesystem permissions are reasonably flexible and are easy to reason about (unlike the potential visibility of ENV).

I also don't really buy the arguments for ENV storage of even non-sensitive data. There's just not really any good reason to do so; your config has to be put in place by some tool, even if it is in ENV; why not make your tool produce a file, with reasonable permissions in a well-defined default location? The 12 Factor App article seems to believe that config files live in your source tree, and are thus easily accidentally checked into revision control. That's not where my config files live. My config files live in /etc; or, if I want it to isolate a service to a user, I make a /home/user/etc directory.

One could say, "Don't store passwords in the revision control system alongside your source." And that would be reasonable. But, there's no reason to throw the baby out with the bath water.


> filesystem permissions are reasonably flexible

Env makes things really flexible. For instance, you can call a program with env variables directly, thus overriding the default ones, which simplifies configuring applications. You don't have to have a test setup, a production setup or a staging setup; just start a server or an app with different env variables on the command line.

You don't want your config or keys to depend on an OS or a language. Finally, env variables can be restricted to a set of users, so a third-party process started as a different one can't access them.

I believe env variables are better than other solutions.


Maybe I'm old, but I remember way too many ENV exposure bugs over the years to be particularly confident dumping sensitive information into the ENV. Admittedly, all software should be sanitizing the ENV for untrusted users, but there's just such a long history of people making mistakes. I'm one of those people, in fact. Sure, frameworks and more modern execution models make it easier to avoid those mistakes today, but I've never accidentally checked in a config file with passwords (except a couple times in my t/ test directory, and that only for test server instances). I've made real data exposing security mistakes with ENV.


>just start a server or an app with different env variables in the command line.

Just reference a different config file on the command line.

>You don't want your config or keys to depend on an OS,or a language.

A path to a config file isn't an OS or language. The ini format is pretty widely supported.

>Finally Env variables can be restricted to a set of users,so third party process started with a different one cant access them.

This is exactly what filesystem permissions do for files already.


    BEGIN {
        $API = new Backend($ENV{credentials});
        delete $ENV{credentials};
    };
Filesystem permissions do not make it possible for a program to internally partition access to those credentials unless you (a) start it as root, or (b) delete the credentials file after reading it.


A user process can start subprocesses under different users, without access permissions to the files owned by the parent. The most obvious way to do so is to use sudo.


Using a setuid helper (even if it's called sudo) is still starting a program as root.


Not necessarily; sudo on Fedora has CAP_SETUID instead of the setuid bit, so it doesn't actually run as root.

http://fedoraproject.org/wiki/Features/RemoveSETUID


But CAP_SETUID, unless I'm confusing it with something else, can be used to set the UID to 0 and thereby gain all the same privileges as if the program had been started as root, can't it? Presumably it has some advantage that I'm not getting – does it have to be combined with e.g. SELinux to be useful?


>> just start a server or an app with different env variables in the command line.
> Just reference a different config file on the command line.

Either of those can be read straight from the process's argument list (e.g. via ps), so don't forget about cleaning that up too.


Spring Boot allows all of this at once [1]. You can specify a configuration file in your sources but if you set specific keys in the environment that configuration has a higher priority. I think this is a very good solution.

Particularly this allows you to package a general configuration file, a user can provide an external configuration file and can for a specific instance even overwrite that with env (or command line arguments).

[1] http://docs.spring.io/spring-boot/docs/current/reference/htm...


I agree that ENV variables are useful for general configuration, that's exactly what they were invented for...

ENV variables are not restricted by user though, your process can spawn another process under a different user and give it the same environment. It's the nature of the environment that it is usually inherited from the parent which causes the issues when we're talking about secrets.


Delete sensitive environment variables after you read them, or don't run programs you don't trust with an unsanitized environment/argument list (e.g. execve, not system).
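In Python that's a couple of lines at startup (a sketch, with a hypothetical DB_PASSWORD variable):

    import os
    import subprocess

    # Pop the secret out of the environment as soon as it has been read,
    # so later children and debugging dumps never see it.
    db_password = os.environ.pop("DB_PASSWORD", None)

    # Any subprocess started afterwards inherits an environment without it.
    subprocess.run(["env"], check=True)  # DB_PASSWORD no longer appears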


Thanks for that idea of deleting sensitive environment variables. I like that for hosting providers such as Heroku which use ENV variables for config (including secrets) by default.


Config is ultra-sensitive information. It's like not having the right .gitignore if you're backing up /etc in git or not having enough password log filters.

The issue is that ultimately, one has to trust some infrastructure as being of an ultimately trusted network (whether it's Puppet/Chef/cfe2+ or offline authoritative CA's)


Installing secrets on disk exposes them to potential leakage through backups. This is a major issue, since much less attention is typically paid to access management for backups than to production servers. Therefore I support the approach of providing secrets through the environment.

Once an application has been written to get its secrets from the environment, there is a question of how the secrets are obtained. They can be sourced from a file in an init script, but today we are seeing a lot of momentum towards containerized architecture, and the use of service discovery and configuration systems like etcd, zookeeper and consul.

However, secrets require much more attention to security concerns than the data that these tools are designed to handle. Least privilege, separation of duties, encryption, and comprehensive audit are all necessary when dealing with secrets material. To this end, we have written a dedicated product which provides management and access for secrets and other infrastructure resources (e.g. SSH, LDAP, HTTP web services). The deployment model is similar to the HA systems provided by etcd, consul, SaltStack, etc. It's called Conjur (conjur.net).


Looks interesting. However, isn't the conjur API key stored in .netrc just another secret that can be easily leaked?

On the distribution as a virtual appliance: is it a black box? How do I back it up? How do I upgrade it? How do I check the integrity of the secrets database?


Regarding the API key, from an operational perspective it's an improvement. The reason is it creates a separation of duties between management of the API key (responsibility of opsec), and the application credentials (responsibility of the broader ops team and in some cases the developers themselves). There is no way for the application credentials to be accidentally leaked or mis-deployed, because the policies created by opsec are always enforced.

In addition, the security team can make decisions about the management of the API key, such as:

* It can be kept off physical media (e.g. in /dev/shm)

* The process(es) by which the API keys are created and placed on the machines can be carefully managed

* A dedicated reaper/deprovisioner process can be used to retire (de-authorize) API keys once they go out of service.

Regarding the virtual appliance:

* Is it a black box? Essentially yes, although there are specific maintenance scripts on the machine (e.g. backup / restore) that you may occasionally run. For normal operation, only ports 443 and 636 (LDAPS) are open.

* How do I back it up? A standby master and read-only followers can be used to keep live copies of the database. A backup script can be used to capture full GPG-encrypted backups. A restore script will restore a backup onto a new appliance.

* How do I upgrade it? You create a standby master and followers of the new upgraded version of the appliance, and connect them to your current master. Once the standby master and followers are current with the data stream, they are promoted via DNS and serve the subsequent requests.

* How do I check the integrity of the secrets database? We have written an open-source Ruby interface to OpenSSL called Slosilo (https://github.com/conjurinc/slosilo) and subjected this library to a professional crypto audit. On the advice of the auditors, Slosilo encryption employs message authentication (cipher mode AES-256-GCM) which ensures the integrity of the secrets. Nevertheless, the Conjur appliances must obviously be afforded the highest level of protection. Unlike a multi-master system, Conjur's read-only "followers" contain only read-only copies of the secrets. So compromise of a follower cannot be used to modify secrets or authorization policies, or to confuse a master election scheme.


Thank you for the reply.


This is indeed an issue, though with a modern cloud based infrastructure and management systems there is no longer a need to create backups from your production servers. They can be automatically recreated and no important data is stored on them.


Yes, thanks for this observation. Although inevitably, some "pets" remain...


Fundamentally, any secrets you store will have some mode of access - there's a downside to each and every way of distributing them.

If you're shelling out to commands you think might snarf credentials, the environment is easy for them to pick it out of, but if they're running as the same user then they could probably read the secrets from the config file. If they aren't running as the same user, you need a way of passing in the secrets - and we tend to come back to environment variables..

The good practice here is just to reset the environment when calling shell commands, as he notes. It's not hard to do.


You could use a kernel service that checksums the calling DLL or even all code loaded in the calling process and compares that against a list of trusted callers before doling out the secret.

Disadvantage, of course, is that you will have to update the list of trusted callers whenever they change. Mac OS X automates that by automatically trusting binaries signed with the same key (trading some security for convenience).

Another disadvantage is that this doesn't work well with scripts (the kernel service would have to know how to find the running scripts in order to checksum them, for every possible scripting language on the system). Also, any form of extension support in a trusted application is problematic.
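A rough user-space approximation of the idea (the comment proposes doing this in the kernel; here a hypothetical secret daemon hashes the requesting process's binary via /proc and checks it against an allowlist; the digest value is made up):

    import hashlib

    # SHA-256 digests of binaries allowed to receive secrets (made-up value).
    TRUSTED_CALLERS = {
        "0f343b0931126a20f133d67c2b018a3b1b0c5c1e1c1a2c4e5c6d7e8f90a1b2c3",
    }

    def caller_is_trusted(pid):
        # /proc/<pid>/exe links to the caller's executable; hash its contents.
        with open(f"/proc/{pid}/exe", "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        return digest in TRUSTED_CALLERS

    def hand_out_secret(pid):
        if not caller_is_trusted(pid):
            raise PermissionError("caller binary is not on the trusted list")
        return "s3cr3t"  # placeholder for the real secret lookup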


Wow, this is the only thing I've seen in this thread which doesn't just add another layer of turtles to the problem. With your suggestion, even if an attacker could gain access to the box, they wouldn't be able to get at the secrets. Is there any prior art/blog posts/software for this approach you could point me at?


For the Mac OS X technology, https://developer.apple.com/library/mac/documentation/Securi... is a good starting point.

I don't know of full open source equivalents. Parts of Apple's code are open source, though, for example http://opensource.apple.com/source/security_systemkeychain/s... (may not even compile; not all Apple's open source releases do)


You're right that a tool which runs under the same user could read your config file and thus could access your secrets.

But there is one main difference: that tool would need to do so explicitly, with the intent of reading (and possibly exposing) your secrets. For me, that's a huge difference from having the secrets implicitly available to the process through its environment.


It doesn't need to do so explicitly: it just needs to have a bug that can encourage it to do so. Most environment leakage also needs to be triggered too.


Ultimately, secrets need to live somewhere and need to be accessed as plain text. Just make sure the access window is as small as possible, and try to obliterate the secret after use, if possible.

If one absolutely needs to centralize secrets (TLS/SSL private keys, entropy sources, etc.) (at risk of SPOF or some HA setup), use some PSK style setup that delivers them directly, out-of-band (via separate NICs) or prioritized ahead of regular traffic. Keep it simple. Otherwise, prefer something like zookeeper with encrypted secrets (again PSK keying per box). Try to not deploy the same secret on every box, if possible. Also, try to avoid magic secrets if you can too (remove all local password hashes, use only auth keys).

If you're uncomfortable with plaintext secrets, encrypt them (as end-to-end as possible) and require an out-of-band decryption key at the last possible moment.

It's like having a secure document viewing system... ultimately, someone will need to browse just enough of the plaintext version, or it's not a document viewing system.


Isn't this what TPM was designed to avoid?

Neither files nor env variables.

Most chipsets have a rather unused TPM function, and it should be possible to have developers and processes hook into that.

Perhaps using tpmtool? On master process startup, ask the user for a passphrase, and use that to query the TPM-stored values?

http://manpages.courier-mta.org/htmlman1/tpmtool.1.html


I have to admit that I don't know a thing about TPM. For instance, is it also available in virtual environments like the ones AWS provides? How could this be automated? You don't want to enter a passphrase every time a server (re)boots. Would love to hear if anybody has successfully used it.


  Cloud servers
  =============
Xen supports virtual TPM.

I'm no Amazon EC2 expert, but a quick google exposed a few keen souls who tried to use vTPM and failed. This would suggest that Amazon does not yet support vTPM.

  Re-entering passphrases
  ========================
Well, unless the machine is permissioned by default you will need to give a fresh instance new authorization. Permissioning by default is the same security problem you're trying to avoid though... just shifted. Your overall goal is to have the credentials inaccessible to sniffing, right ?

I guess you could set up some form of ssh-agent handshake to make the process less manual.


XenServer (The product from Citrix) or Xen 4.3+ support vTPM. Not sure which version of Xen that Amazon uses, but if/when they upgrade to 4.3 it should have built-in support for vTPM operations.


Sniffing isn't the main issue I'm trying to avoid; it's accidental exposure, i.e. minimising the risk that the secrets somehow get exposed during normal operations.


Okay... sniffed accidentally then (putting them in the wrong directory, not using fs permissions properly, etc.).

I would say that you should consider malicious sniffing too.


Please don't use environment variables to store secrets. There are too many angles - as stated by others - where this data may leak into files or processes.

I would propose using just one folder, like /secret, and putting your config files in there. Exclude this folder from backup on all relevant hosts.

Then spend your time on security of your hosts, applications (OWASP) and monitoring / alerting. Something that you have to do anyway.


Assuming you aren't trying to go for top security, and just want a way to keep things safe from leaking due to errors and such:

Why not just make use of the OS's secrets store? For example, like how https://pypi.python.org/pypi/keyring operates.


I would love to know more about this, but AFAIK OS secret stores are mostly a desktop-concept that requires a logged-in user (whose login password is used to decrypt the secret store). On a deployment server, you don't have those.


It should be easy to set up such a key store on server setup or startup. The downside is that, as always, if anyone has access to the user, they can decrypt the keys.


We are in a similar situation, and there is another approach I'd like to research. In order to have distributed properties, I was considering using something like Consul [1] or etcd [2], which have some access control, and loading the required variables from upstart scripts.

[1] https://www.consul.io/

[2] https://github.com/coreos/etcd


You could still store your secret keys in ENV but encrypt them. Only your program has the method to decrypt them, so in the case of an ImageMagick subprocess, it would see only your encrypted secret key and have no way to decrypt it.

Same thing while debugging: only the encrypted key is printed.


You would need the decryption key somewhere as well, and not in your code.


Why not in the code? As I see it, we're not trying to fend off Mr Über attacker, just to avoid your keys becoming public by mistake.

And instead of a secret key which is easily searchable, your method could just do some substitutions, something a bit more complicated than a Caesar cipher. Yes, it's really weak, but it beats an unencrypted secret key.

I know security-minded people are not gonna like it, but until we have a real battle-tested solution it's better than nothing.

A determined attacker will almost always win against our best defenses. I think we have to do our best to make their job hard, but at some point we have to accept that offense is really easier than defense.


Ansible has a neat feature, called Ansible Vault, which lets you encrypt sensitive files. This in combination with dotenv-deployment works pretty well for our Rails apps. The only thing I’m worried about is someone gaining unauthorised access to our servers and thus being able to read all the credentials stored inside the .env file, especially the username & password to our externally hosted DB. Probably the only way to prevent this would be “to properly secure your server” and the use of an IDS? Does anyone have experience with someone hacking their servers and successfully preventing e.g. a DB dump? In this particular case, how easy would it be to stop attackers in their tracks?


If someone gets sufficient access to your server to read arbitrary files, it is 99.95% likely that they also have sufficient access to just read your DB username and password straight out of the Rails application's memory. (One method, among many, would be "Attach a debugger to it." For a graphic example of what is possible with debuggers, in a format slightly easier for Rails devs to understand, see: https://github.com/ileitch/hijack)


I typically store the env _name_ in the environment, and then use that in my apps to build a path to the file containing secrets (e.g. /etc/{mycompany}/{environment}/myapp.conf). The file is locked down by ACLs or permissions.
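Roughly like this, as a sketch with hypothetical names (checking the environment name against a known set keeps a hostile value in the variable from steering the path anywhere unexpected):

    import os

    COMPANY = "mycompany"                         # hypothetical
    KNOWN_ENVS = {"dev", "staging", "production"}

    def config_path():
        env_name = os.environ.get("APP_ENV", "dev")
        # Only accept known environment names so the variable can't be used
        # to point the app at an arbitrary path.
        if env_name not in KNOWN_ENVS:
            raise ValueError(f"unknown environment: {env_name!r}")
        return f"/etc/{COMPANY}/{env_name}/myapp.conf"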


If I set _name_ to "../../blackhat/myhackedconfig/" will that cause problems?


As long as you don't then store your config in the same repository as your code, that works fine for me.


I got bit by env vars a few years back, though due to performance issues in getenv(), and ended up writing a whole bunch of PHP magic to ship config files safely and fast.

With pecl/hidef, I can hook a bunch of text .ini files into the PHP engine and define constants as each request comes in.

Originally, it was written for a nearly static website, which was spending a ridiculous amount of time defining constants, which rolled over every 30 minutes.

Plus, those .ini files were only readable by root, which reads them and then forks off the unprivileged processes.

But with the hidef.per_request_ini, I could hook that into the Apache2 vhost settings, so that the exact same code would operate with different constants across vhosts without changing any code between them.

Used two different RPMs to push those two bits so that I could push code & critical config changes as a two-step process.

And with a bit of yum trickery (the yum-multiverse plugin), I could install two versions of code and one version of config, and other madness like that with the help of update-alternatives.

That served as an awesome A/B testing setup, to inoculate a new release with a fraction of users hitting a different vhost for the micro-service.

I'm rambling now, but the whole point is that you need per-request/per-user config overlay layers, for which env vars are horrible, slow and possibly printed every time someone throws some diagnostics up.


Please consider the environment before printing this config?


We had to migrate our software from single tenant (per machine) to multi-tenant for our cloud offering, on an 11-year-old code base.

We used Michael's trick: environment variables pointing to config files works unbelievably well if you ever need to implement a multiple tenant cloud offering.

So apart from the security aspect, there's the aspect that it is a more versatile design.


We've had good success with distributing our secrets using a GPG-encrypted file that we put in /etc, not in the source code tree. We then use an ENV setting to point the app to the file. This gives us good flexibility (because one server can have multiple GPG files if we want, such as alpha/beta/gamma) and good encryption.
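In outline, the app side of that can be quite small (a sketch; SECRETS_FILE is a hypothetical variable name, and it assumes the gpg binary and the decryption key are available to the service user):

    import os
    import subprocess

    def load_secrets():
        path = os.environ["SECRETS_FILE"]  # e.g. /etc/myapp/secrets-beta.gpg
        # Decrypt to stdout only, so the plaintext never touches the disk.
        plaintext = subprocess.check_output(
            ["gpg", "--quiet", "--batch", "--decrypt", path], text=True
        )
        return dict(
            line.split("=", 1) for line in plaintext.splitlines() if "=" in line
        )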


How do you manage the decryption key?


Clearing your environment variables after reading them, and only passing the ENVs required to perform the new task, are pretty basic security measures. This was pretty common practice in the '90s, and I was hoping that would be one of the lessons out of Shellshock.


Ok, I'm confused by 'environment variable' vs. files. How does one set an environment variable without putting it in a file on the particular server? Or by 'file' in this article (and the 12-factor one) do they mean a file that is in source control?


The article means 'file in source control' - the specific context is that the author is one of the co-founders of Heroku where there is a whole separate (really nice) system for handling 'config variables' as part of your app deployments separate from source control.


You can also run foreman if you're not on Heroku. Put your environment variables in a `.env` file. The environment variables get sourced only to the environment of that process and not to the __whole__ system


Ah I see..thanks!


> 1. It's easy to grab the whole environment and print it out (can be useful for debugging) or send it as part of an error report for instance.

If you have software in your deployment that will send "error reports" to untrusted third parties then you have bigger problems than your shell environment.

> 2. The whole environment is passed down to child processes

If you don't trust your child processes then you have bigger problems than your shell environment.

> 3. External developers are not necessarily aware that your environment contains secret keys.

And?

I'm not sure what you mean by "external developer" and what you expect them to do with your environment. E-Mail it out when an error occurs?

If you tolerate that kind of developer on your project then you.. oh well, see above.


Is it so inconceivable that one might trust error reports to third parties, but not secret keys?


Either you exercise control over what you send, or you don't.

This has nothing to do with your choice of configuration method.


Why not just split it up with an OTP: store one half in the code (or a file) and the other half in an environment variable, then combine them in code (or include the file). Seems like that would work. (You need both parts.)

I think this article is a response to people's practice of keeping API keys in an environment variable so as to keep them off of the filesystem (or at least out of what git sees and checks in) so that they don't accidentally publish them, as happened in that article where some gem he was using to respect .gitignore didn't work for some reason.

would this work as a solution?
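Mechanically the split is just XOR over two equal-length shares (a sketch; the share baked into the code and the environment variable name are made up):

    import os

    # One share baked into the code or a config file (made-up value).
    SHARE_IN_CODE = bytes.fromhex("8f3a1c5277e4090b")

    def api_key():
        # The other share comes from the environment (made-up variable name);
        # it must be the same length as the share in the code.
        share_in_env = bytes.fromhex(os.environ["API_KEY_SHARE"])
        # XOR the shares together; neither half alone reveals the key.
        return bytes(a ^ b for a, b in zip(SHARE_IN_CODE, share_in_env)).hex()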


ENV propagation to unwanted targets is a legit point.

Our most common crash report scenario is the airbrake gem sending crash reports from our Rails app to Errbit. I can confirm the airbrake gem does not post any sensitive env data.

Of course, this is only good news for that specific case; other apps may transfer the environment, and we can't just wonder for each installed app, "what will it ever send?".


I wrote a library to handle mixed configuration values by using asymmetric RSA encryption [1].

[1]: https://github.com/jacobgreenleaf/greybox


There are lots of problems with this, not the least of which being:

• Configuration files can be trivially forged (rsa without signing)

• Many configuration files can be trivially decrypted without the key (rsa use on potentially big files)

• Keys can be trivially recovered in multiple ways (global variable, ptrace)

Users might be tricked into using your library despite the fact it offers them no real security except false security.

I recommend placing a warning that it is not intended to be used by anyone at any time.


Your criticisms don't seem to make much sense...

* Preventing forgery/tampering is an orthogonal issue and can be done by intrusion-detection software.

* "trivially decrypted without the key" -- How? The size of the file is not going to make any difference here.

* Of course, the key can be recovered if you have the ability to ptrace. What's your point? That's going to be true of any solution out there.

Why don't you make some constructive criticism instead of being obnoxious.


Writing software that may run in a potentially hostile environment should be done with great care, and a thorough analysis of what you are actually protecting against.

This is a job for people who take programming seriously, not amateurs who can't be bothered to read the documentation of the modules they are using, e.g.

http://stuvel.eu/files/python-rsa-doc/usage.html

Note especially the bits about signing RSA, and how to use it to encrypt files. Doing raw RSA on strings longer than 245 bytes is unwise, and inventing new protocols for RSA is unwise. That the author does these things is strong evidence that the author is not demonstrating "great care".

If the author believes forgery/tampering is orthogonal and/or protection from in-process attacks, then the author should state that. That's part of the analysis step.

> Of course, the key can be recovered if you have the ability to ptrace. What's your point? That's going to be true of any solution out there.

Nonsense. If you delete the key from memory, it is obvious that someone cannot use ptrace to recover it afterwards. Detailing what is at risk, and what is not at risk is part of the analysis step.


> Writing software that may run in a potentially hostile environment should be done with great care, and a thorough analysis of what you are actually protecting against.

The first section of his documentation states what the purpose of the module is and what's it is trying to protect against. The use of RSA there is pretty reasonable, so I don't know why you're being so hard on him. It also talks about how this project is a variation on another project, so it's not like he is going off half-cocked and implementing something crazy.

> This is a job for people who take programming seriously, not amateurs

Everyone starts out as an amateur. You can either be an internet know-it-all that wants to show off how little you know or you can provide constructive feedback.

> who can't be bothered to read the documentation of the modules they are using, e.g. http://stuvel.eu/files/python-rsa-doc/usage.html
> Note especially the bits about signing RSA, and how to use it to encrypt files. Doing raw RSA on strings longer than 245 bytes is unwise, and inventing new protocols for RSA is unwise. That the author does these things is strong evidence that the author is not demonstrating "great care".

Did you bother to read the documentation yourself? It directly talks about encrypting large files. Yes, it's true that trying to encrypt more data than is supported for a key size will leak information, which is why implementations throw an error if you try to do that. The usage documentation you pointed to discusses that and provides options to cope with it. But, in this case, he is not even encrypting the whole file, just the values of certain properties, which are probably going to be small enough to not require any extra measures.

And, again, what does signing have to do with how RSA is being used in greybox?

> If the author believes forgery/tampering is orthogonal and/or protection from in-process attacks, then the author should state that. That's part of the analysis step.

They did state the scope of the project and there is really no need for them to go into deep detail about deployment. Are you going to criticize him for not pointing out that the whole system needs to be secured from tampering? Of course not, so stop acting like a douchebag.

>> Of course, the key can be recovered if you have the ability to ptrace. What's your point? That's going to be true of any solution out there.
> Nonsense. If you delete the key from memory, it is obvious that someone cannot use ptrace to recover it afterwards. Detailing what is at risk, and what is not at risk is part of the analysis step.

You have to ask, where did the key come from in the first place? The private key file has to exist somewhere that is accessible on the box so that the process can read it in the first place. If someone has ptrace, they can probably read a file as well.

Ultimately, it's about the level of risk people are willing to live with. For what greybox seems targeted at, it's encrypting properties in files that are committed to a repo and then using a private key that is only available on certain machines to read that property. For some use cases, that is probably a perfectly fine level of security and better than what is being done in some cases anyways.


> You have to ask, where did the key come from in the first place?

It is sensible to type it in while the system is still in single-user mode.

If you left the private key in a file that is right next to your encrypted configuration file, then it remains as I stated previously: No security except false security.

> Everyone starts out as an amateur.

This is not the forum for an education on programming. A library that states it was written as a learning exercise with a request for criticism will be treated that way. A library proposed to solve problems recognised in the linked article will not.

> Ultimately, it's about the level of risk people are willing to live with.

People are notoriously bad at recognising and evaluating risk.


> If you left the private key in a file that is right next to your encrypted configuration file, then it remains as I stated previously: No security except false security.

Shut up and read the goal of the project already. It is trying to protect secrets that are stored in a repo and it achieves that goal.

Secrets on a deployed host will always be in the clear on that host, it's just a fact of life. Many barriers can be put in place, but at the end of the day, a program will always need access to the plaintext version of the secret at some point.

> It is sensible to type it in while the system is still in single-user mode.

No it isn't. Services have to restart all the time; saying that a human has to be ready to type in a password at any moment is not practical.

> This is not the forum for an education on programming.

But it is a forum for you to post invalid criticisms of a project and act like a dick? That's bullshit. This is "hacker news", it's a perfectly fine place for a technical discussion. If you thought there were problems with the project and this wasn't the right place to discuss it, then file issues on the github project.

> A library that states it was written as a learning exercise with a request for criticism will be treated that way. A library proposed to solve problems recognised in the linked article will not.

There is no difference, you're just trying to justify your bad behavior.


How do you suppose I do this in AWS where I have several auto-balanced servers? It kind of forces you to put it in environment variables.


Why is that? Isn't that exactly the same problem whether you're using environment variables or a config file?


Because all the instances spin up with a fresh file system, but you can specify the environment variables in the AWS console.


While I agree that storing API keys in the code repository is not the best idea, I am curious about the suggestion of moving them into Chef configs.

Wouldn't that, in turn, also be stored in a code repository, likely accessible in the same way as the main code repo? Then this feels like a non-solution to me.


It's a trap to think you have to store all the config info in Chef (recipes, data bags, etc). It's easy to call out to other services for config info to render templates with.

Mreinsch's S3 recommendation is a good example. I use this method for storing extra role secrets for AWS.

It's good to keep in mind the words of Morpheus when dealing with Chef (and all this stuff, really). Free your mind.


Yes you're right. Operations teams then find that they have to lock down their Chef / Puppet master much more tightly than before. In so doing, they make these systems harder for developers to work with, and they introduce additional work and overhead for themselves (servicing secrets-related tickets).


The permissions on our chef repository are different. We can give access to the main code repository without giving access to the chef repository.

Alternatively, if you're running on AWS you could also fetch the secrets config file from an S3 bucket which is only accessible by your production servers.
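The S3 variant can be a few lines with boto3 (a sketch with hypothetical bucket and key names; it assumes the instance's IAM role grants read access to that bucket):

    import boto3

    def load_secrets():
        s3 = boto3.client("s3")
        # Credentials come from the instance's IAM role, not from the app.
        obj = s3.get_object(Bucket="mycompany-prod-secrets",
                            Key="myapp/secrets.conf")
        body = obj["Body"].read().decode("utf-8")
        return dict(line.split("=", 1) for line in body.splitlines() if "=" in line)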


While I could simply tell you to blank out ENV vars once you've internalized them, I will instead write an infinitely long essay on how they are "considered harmful" that contributes absolutely nothing back to society.


This is not always an option; the argument applies as well to using tools written by others that expect secrets to be provided in an environment variable. The AWS SDK is a popular example. While it will also happily read credentials from a configuration file, or allow them to be explicitly passed, it's a popular method and is the only one that is consistent across the various SDKs that AWS provides.


I created a quick gem (which you shouldn't install) that demonstrates having some untrusted code in your app which will post all of your environment variables to a third-party server: https://github.com/tibbon/env_danger

Now, of course no one would install and run this... but I could imagine someone accidentally typing the name of a gem wrong, someone accepting a bad PR (a sub-dependency perhaps even doing so?), etc., and somehow something untrusted getting in there. Yes, that means you have other problems, but it isn't outside the realm of possibility that accidental access like this happens.

Just because it shouldn't happen doesn't mean it will never happen.



