Ask HN: Securely store sensitive data in the DB?
77 points by provetza on July 28, 2013 | 48 comments
I am designing an information system and one of the requirements is to keep sensitive data encrypted in the database, with a possible intruder being unable to decrypt it. Encrypting everything in the application with a key and then storing it in the database is unacceptable, since all it does is add a little difficulty for an intruder: once he gets the key he gets the data too.

Passwords are kept hashed, so the password provided in the login gets hashed and if it matches the stored hashed password the user is authenticated, otherwise not. The password is not stored in cleartext and cannot be retrieved, but of course can be reset if the user forgets it. So far so good, but what happens with other sensitive data that I need to store, such as API keys, cc data etc? These cannot be encrypted with the user password, because if the user forgets the password these become useless.

What are some best practices to keep sensitive data encrypted in the database, ensuring that after a system break-in the attackers won't be able to get the data unencrypted? I want to design and implement a solution as secure as it can be and would like to hear thoughts, ideas and experience from other startups and engineers. I have not found anything really useful in this direction, apart from references to proprietary solutions that promise to do anything in some magical way (no comments!)




Any system which allows you to locally decrypt information, for the purpose of doing anything for the user, should be assumed to allow an attacker who roots the box to locally decrypt information. That's the unfortunate harsh technical reality.

If you have compliance reasons motivating this need for encryption, you'll find that e.g. HIPAA and PCI-DSS ignore technical reality, in favor of requiring that you encrypt information stored at rest and imposing substantial penalties on you if it leaks. There are a variety of ways to do this. One fairly common one for HIPAA-compliant applications is putting the e.g. MySQL data files on a partition which is block-level encrypted. You then issue decryption keys to folks who need them, such as e.g. the application.

If your host is totally compromised, the host holds both the decryption key and the ciphertext, which means "Sucks to be you." However, this does provide non-zero increase in security (e.g. if an old copy of the DB drive ends up floating off to eBay because of poor physical control on your part, and you can document that it doesn't include the encryption key, you just avoided a reportable information breach), and it does check the appropriate boxes on e.g. HIPAA.


To expand on this a bit, absolute security for encryption just doesn't exist. If you wanted your data 100% secure, put it in a database, disconnect the DB from your network, put it in a locked room, guarded by biometric locks and security guards. Even in that scenario, the data is vulnerable, but why even bother discussing that point, as the data is worthless if you can't access it.

With that reality in mind, I was responsible for PCI for a large part of the infrastructure at a Level 1 Merchant, meaning a yearly audit had to be passed. Ultimately, our solutions boiled down to restricting access to an external (read different machine/network segment), firewalled host that did the decryption. In some cases this was an appliance that was purchased (this helps with compliance, but they're expensive, and they're a nightmare if they become a performance bottleneck as they're a black box you know little about). In other cases we used a web service we built that worked similarly (auditors will pick this apart because it isn't a "standard" solution).

In all cases here is a high-level view of how they work: encrypted data is passed to the service, which pulls the encryption key out of memory, decrypts the data, and sends it back to the requesting host. The encryption key is stored in (at least) two pieces, and each piece is encrypted with a key encrypting key. The key encrypting keys are known to very few employees, and no single employee holds both key encrypting keys. The encryption key is only assembled in its entirety while in memory.
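A minimal sketch of the "key stored in two pieces, assembled only in memory" idea. The comment above does not say how the key is split, so the XOR split used here is an assumption, and the function names are hypothetical. Encrypting each piece under its own key encrypting key is left out.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares (assumed XOR scheme).

    share_a is uniformly random, so either share alone reveals
    nothing about the key.
    """
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def assemble_key(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the shares; intended to happen only in memory."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


# Demo: a random 256-bit data key, split and reassembled.
data_key = secrets.token_bytes(32)
share_a, share_b = split_key(data_key)
reassembled = assemble_key(share_a, share_b)
```

In a real deployment each share would be stored encrypted under a different key encrypting key, so that no single employee can recover the data key alone.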

Again, there are problems with this. As patio11 intimates, compliance includes much theater at times, but this is reality, and it does provide a benefit over other solutions: in this case, at least three layers of security must be compromised before you could decrypt everything.


I'm in ecommerce as well and I've seen PCI/DSS auditors require vendors/hosts to rearchitect using an encryption/key management appliance. You wrote you built your own solution - are there no known, trusted open source alternatives? As you mention, the appliances are almost astronomically priced, so it seems like an area OSS (or a disruptive startup) would help.


Love to follow up on that too


An HSM will help.

Also, no mention was made of architecture. If this is a web app then putting a network and application firewall between the web application and database will allow you to observe traffic and then block or refer requests that attempt to access volumes or types of data that don't match typical usage patterns. Even better would be a dedicated DB API secured as required, and not allow the web application direct access to the database. At an absolute minimum, grant the web application rights to access only required stored procedures, and deny direct access to tables and views.

If there's a client (desktop, phone, tablet or another server) then keys should be generated there, limiting the impact of disclosure should the DB be lifted.

[Edit] Forgot to add that it's always worth creating a threat model. Draw a box around each process. Enumerate the inputs and outputs that cross process boundaries. Consider how each might be attacked, and how each might be secured. There are 6 types of threat to worry about - spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege.


HSM = hardware security module, for those that aren't familiar.

http://en.wikipedia.org/wiki/Hardware_security_module


1. If at all possible, don't store credit card numbers in your database. A payment gateway will take care of this for you - you have an iframe the user uses to submit their credit card details straight to the payment gateway, and the payment gateway gives you back a token you can use to charge and refund at your convenience (locked to your merchant account so not useful to attackers). DataCash and Chase Paymentech are two companies that provide this service, and I'm sure there are others too.

2. If the user forgets their password and resets it, ask them to re-enter their credit card details in case their e-mail has been hacked. (Also ask them to re-enter their details for deliveries to new postal addresses, if applicable)

So if you can't access CC data after a customer resets their password, that's no problem.

3. Use database-level security; set up roles and accounts in your database so tables containing sensitive data only have select grants to apps and users that really need them. When a table has some columns that are sensitive and others that aren't, set up a view with the sensitive columns replaced with placeholder data and give them access to that instead.
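The view idea from point 3 can be sketched as below. This uses Python's built-in `sqlite3` purely for illustration; the table, view and column names are made up, and showing the last four digits instead of a fixed placeholder is an assumption. SQLite has no grants, so the "give apps access to the view only" part would be done with your real database's role system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        card_number TEXT
    );
    INSERT INTO customers (name, card_number)
    VALUES ('Alice', '4111111111111111');

    -- Non-privileged apps would be pointed at this view instead of
    -- the table; the sensitive column is replaced by masked data.
    CREATE VIEW customers_masked AS
        SELECT id,
               name,
               '************' || substr(card_number, -4) AS card_number
        FROM customers;
    """
)

masked_rows = conn.execute(
    "SELECT name, card_number FROM customers_masked"
).fetchall()
```

Here `masked_rows` holds the name plus a masked card number, while the full number stays in the underlying table.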


How do payment gateways store CC information? For example if I am a payment gateway, how do I securely store CC data?


You have a dedicated box that stores details and is remotely contacted through an XML-RPC/JSON-HTTP API of some sort.

The API should have two methods:

* Add a new card to account.
* Make payment of £xx from card NN.

The machine is locked down, runs no other services, and so cards cannot be exported/stolen from this system. You'd encrypt the filesystem and prompt for a key/passphrase at boot. Ideally you'd only login via the serial console so the only service exposed is your "add/charge" methods.
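A rough sketch of such a two-method service, assuming an in-memory store and a hypothetical `charge_at_bank` callback standing in for the real call to the bank. The class and method names are illustrative only. The point is what is missing: there is no method that returns a card number.

```python
import secrets
from typing import Callable, Dict


class CardVault:
    """Exposes only "add card" and "charge"; nothing reads a card back."""

    def __init__(self, charge_at_bank: Callable[[str, int], bool]) -> None:
        self._cards: Dict[str, str] = {}
        # Hypothetical hook for the outbound call to the bank.
        self._charge_at_bank = charge_at_bank

    def add_card(self, card_number: str) -> str:
        """Store the card and return an opaque random token for it."""
        token = secrets.token_urlsafe(16)
        self._cards[token] = card_number
        return token

    def charge(self, token: str, amount_pence: int) -> bool:
        """Charge the card behind the token; raises KeyError if unknown."""
        if amount_pence <= 0:
            raise ValueError("amount must be positive")
        return self._charge_at_bank(self._cards[token], amount_pence)


# Demo with a fake bank that records what it was asked to charge.
charged = []
vault = CardVault(lambda card, amount: charged.append((card, amount)) or True)
token = vault.add_card("4111111111111111")
ok = vault.charge(token, 1999)
```

The web tier would only ever hold `token`, which is useless outside this service.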

(Even allowing the remote-deletion of cards could be a security issue; obviously.)


Exposing only the "Add new card" and "Charge Amount XX" methods actually makes sense, Thanks for the info!


You're very clever, young man, very clever, but it's payment gateways all the way down.


Payment gateways have it somewhat easy - the system making the "make a payment" request doesn't ever need the actual sensitive data returned. That data can be stored on a heavily secured server that only can call out to the banks.


With an expensive insurance plan.


It's not really a risk you can really protect against in any meaningful way with insurance. It's the banks that ultimately eat the loss, and they just write this off as a cost of doing business. Banks really do lose a lot of money to fraud, card losses, and other miscellaneous security issues and bad actors -- a lot more than people think.

Storing CC data can be done sufficiently securely, of course, although PCI DSS guidelines are insufficient and silly in places, and you really have to go above and beyond to make it secure enough because data breaches can threaten the status of your merchant account. If you become too costly or too risky to do business with, the bank will simply cut you off by terminating your merchant account. You may also end up on the TMF which will make it very hard to get another merchant facility in the future.


There's a book series "Translucent Databases" with a lot of interesting use cases, where the assumption is that an adversary has gained access to the entire database.

I've only read the first edition, and it's some years ago, but I'd recommend giving it a quick read. :)


As many commenters have pointed out, there are theoretical problems with what you're asking for (if you can decrypt the data, then your attacker can too). But, there are some practical things you can do to make the attacker's life harder:

- Don't write your own cryptographic code or design your own crypto systems; use existing libraries as much as possible.

- Separate your reads from your writes. Using a public/private key pair, you can give one set of systems the ability to write encrypted data but not read it, and a different set of systems the ability to read the data. The systems that can decrypt / read data should be isolated as much as possible - don't expose them to public networks, limit which operators have access to them, etc. The separation also forces you to encapsulate your secure data and define an API over it; rather than arbitrary reads, hosts that don't have the decryption keys will have to ask the hosts that do to perform specific operations. If you're writing a Rails app, the Strongbox gem [1] enforces this pattern for you.

- Rotate your encryption keys.

- Don't store keys in code. Follow the Heroku pattern [2] of storing any sensitive data (i.e. private keys) in the environment, where it is bound at runtime to your code and encrypted data.

- Store as little sensitive data as possible. Make sure data you don't need any more is periodically purged.

- Human processes are just as important as code; keep track of who has access to sensitive data, make that access opt-in, and remove it when those people change jobs or are terminated. Do everything in your power to keep those operators from being phished (user education, two-factor auth, etc).

- Don't store credit cards if at all possible. Find a payment processor [3] to do it for you. It's not worth the headache, it makes you a more attractive target, and it may come with additional legal overhead (depending on your jurisdiction).

[1] https://github.com/spikex/strongbox

[2] http://12factor.net/config

[3] https://stripe.com/
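As a small illustration of the "don't store keys in code" item, here is a sketch that reads a key from the environment at startup. The variable name `DATA_ENCRYPTION_KEY`, the base64 encoding and the 32-byte length are all assumptions made for the example.

```python
import base64
import os


def load_key_from_env(name: str = "DATA_ENCRYPTION_KEY") -> bytes:
    """Read a base64-encoded 32-byte key from the environment.

    Fails loudly instead of falling back to a default, so a
    misconfigured host never runs with a hard-coded key.
    """
    encoded = os.environ.get(name)
    if not encoded:
        raise RuntimeError(f"{name} is not set; refusing to start without a key")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != 32:
        raise RuntimeError(f"{name} must decode to exactly 32 bytes")
    return key
```

Note that this only keeps the key out of the repository; anyone who roots the box can still read the process environment.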


I'd say the first thing to understand here is that absolute safety is impossible in this case. If the hosting server is compromised, password loggers can be installed and even the login page itself can be altered to remove any form of security. With access to emails, an attacker could send an official email asking everyone to reset their passwords, etc.

So your question is actually: How can I make my system divulge the least amount of data possible over time to someone who has compromised the service?

To hamper someone from changing your service to remove security you could set up daily checks from a server hosted in a different location to download your static resources and check them against a pre-validated hash.
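The hash check could look roughly like this. Downloading the resources is left out; the function takes the downloaded bodies as bytes, and the paths and names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


def find_tampered(resources: Dict[str, bytes],
                  known_good: Dict[str, str]) -> List[str]:
    """Return paths whose SHA-256 differs from the pre-validated hash.

    A resource that is missing entirely is also reported.
    """
    tampered = []
    for path, expected_hex in known_good.items():
        body = resources.get(path)
        actual_hex = hashlib.sha256(body).hexdigest() if body is not None else ""
        if not hmac.compare_digest(actual_hex, expected_hex):
            tampered.append(path)
    return tampered


# Demo: the login page as validated, then with an injected script.
original = b"<form action='/login' method='post'></form>"
known_good = {"/login.html": hashlib.sha256(original).hexdigest()}
clean = find_tampered({"/login.html": original}, known_good)
altered = find_tampered(
    {"/login.html": original + b"<script src='//evil.example/x.js'></script>"},
    known_good,
)
```

The list of known-good hashes has to live on the checking server, not on the web server being checked.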

For storing data - as others have mentioned - the key is to store that data in a way that it cannot be accessed from that one server alone. A simple solution for this is to set up an internal service that will provide the data when given the correct login details. This gives the attacker an additional server he would need to hack. If you keep this layer as simple as possible it can add a lot of security. Of course, if the hacker is able to compromise your server for a long period, he can record anything passing through here anyway.

In the end though, the web-server itself is a linchpin through which all customer data has to flow at some point, and if that key server is compromised for a long enough period, eventually all data can be extracted regardless of precautions. That means that designing your web service with security in mind from day 1 is very important. Regardless of what people try to sell you here, there are no silver bullets that will prevent data theft - only ones that mitigate the impact or delay it.


This gives the attacker an additional server he would need to hack.

If they root your web tier, and your web tier knows how to ask your internal service layer for sensitive data, then the attacker knows how to ask your internal service layer for sensitive data.

I really hate repeating "If you lose any one box in your deployment then you can assume you will lose all data, regardless of whether you encrypt things or not" because it makes me feel like Debbie Downer, but that is, in fact, the threat environment.


If you don't store your user passwords/hashes in a way that your web layer can access directly, then you can slow the attacker down by requiring them to wait for that user to actually log in and send the password. ie. If your web layer passes the authentication tokens through to the data layer, and the data layer handles storage/authentication of those tokens, then hacking the web layer only allows you to log future requests.

To achieve a layout such as this, you would prevent your web layer from talking to the database itself directly, and force all data requests through a different service layer.

Obviously, this makes your whole architecture much more complicated and you only really gain any security if you are able to detect the attacker before he can sniff all passing user data anyway.

Your assumption is still spot on though - one box down really does mean game over. All of the tactics above and in the rest of this thread only slow down an attacker or make the attack more complicated. None of them will ever prevent it entirely.


While this is true in theory, compartmentalizing services and data can add security in practice. Attackers may not know the details of your internal systems, and any unusual behaviour helps with detecting intrusions.

* For example, if credit card details are stored on a separate service, then the web tier can be given only a "charge the card" API, instead of a full read access.

* Even if the decryption key is held in memory, encrypting the data on disk helps against accidental leaking of backups and physical theft of the servers. It also helps against a hacker with limited skills who copies a database dump, is discovered from the unusual network load, and does not have time to fully investigate the system and extract the key from memory.

* If sensitive data is held on a separate service, but the web tier has read access to it - then the other service can impose rate limiting and unusual activity detection to block attempts to dump everything quickly.

* If the data is encrypted with the user's password, which is not stored in plaintext at all - then the attacker can at best only access accounts that log in before detection.

All of these still give an attacker full access, assuming they have infinite amount of time and skills. But for many practical scenarios, they can reduce the amount of compromised data.
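The rate-limiting point can be sketched as a sliding-window limiter inside the service that holds the sensitive data. The class name, the limits and the injectable clock are assumptions for the example; a real service would also raise an alert when a caller hits the limit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ReadRateLimiter:
    """Allow at most max_reads per caller within a sliding time window."""

    def __init__(self, max_reads: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_reads = max_reads
        self._window = window_seconds
        self._clock = clock
        self._reads: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, caller: str) -> bool:
        now = self._clock()
        recent = self._reads[caller]
        # Forget reads that have fallen out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_reads:
            return False  # looks like a bulk dump; refuse
        recent.append(now)
        return True


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = ReadRateLimiter(max_reads=3, window_seconds=60,
                          clock=lambda: fake_now[0])
first_four = [limiter.allow("web-tier") for _ in range(4)]
fake_now[0] = 61.0
after_window = limiter.allow("web-tier")
```

The fourth read inside the window is refused, and reads are allowed again once the window has passed.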


Wouldn't that be a simple issue of limiting what the web layer can ask for? I mean sure it would allow the attacker to charge a creditcard - but hopefully only to an approved account, and he wouldn't ever see the actual details. Your web layer never needs them, so why should it have the right to ask for them?


- Keep passwords in memory (so that when you start the service it prompts for the password)

- Asymmetrical crypto. So for example, you encrypt your CC data upon sign-up but then to run the charges you need the private key (and this is somewhere else)

- Enable SSL communication with your DB. Postgres has this, because being defeated by network sniffing is bad.


Memory is not secure. It's quite a common attack to grab keys / passwords from the memory of an executing program.


No, it is one of the safest places.

If your attacker has access to arbitrary memory of a process, you're using an insecure OS/version. Or they dumped your process memory using a vulnerability (in your system)

Yes, there are some possible attacks (page file, cache, etc)

Attacking memory after a reboot requires physical access (unless you hibernated without an encrypted file, in this case...)

It certainly beats the security of file/network


Just because it's more secure than network / file doesn't mean it's secure. That's why we have smartcards / HSMs.


Smartcards are vulnerable as well and there have been successful attacks against some smartcards (google it)

And not every system has an HSM available


Sure but smartcards and HSMs are slow and in a properly managed environment are much safer than memory.

Whether or not the OP has an HSM is moot. The OP said, "I want to design and implement a solution as secure as it can be" ... and that means (among many other things) keys on HSM.


In the first case, what if we're running a service in a cluster that starts instances automatically?


If you have to store CC data in-house I would suggest storing it on a completely separate machine which only stores and charges cards. The only communication allowed with this box would then be "Store this card", "Charge the card with this token" etc.


I wrote a similar comment too. In practice you find you might want to allow "delete card" or "update card" which are complications to the simple-model.


I'm a fan of using client-side encryption so that the database only ever stores encrypted content, and therefore can be treated as out-of-scope for PCI compliance purposes.

Take a look at https://github.com/braintree/braintree.js which is a nice library for encrypting data with a public key before being uploaded.

This is a specific exception to the generally correct concept that Javascript cryptography is bad and should be avoided. http://www.matasano.com/articles/javascript-cryptography/ Of course, it's essential that the whole transaction take place over SSL.

And even then, you still need to have a set of machines that can read from the database and access the private key, and those machines must be highly secured, as well as supporting requirements like key revocation and key rotation.


If statistics are considered sensitive, you can use cryptographic counters in lieu of their plaintext counterparts.

A cryptographic counter is a public string representing an encryption of a quantity, satisfying the following properties:

1. Subjects with access to the public-key can update the encrypted counter by an arbitrary amount, by means of increment or decrement operations and without first decrypting the value (i.e., the operation is performed over encrypted data);

2. The plaintext value is hidden from all participants except the entity holding some secret key;

3. The adversary can only learn if the cryptographic counter was updated (i.e., information about whether the counter was incremented or decremented is kept hidden to all participants except the secret-key holder and the updating entity -- honest-but-curious threat model).

An implementation is available at https://github.com/secYOUre/Encounter .
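To make the three properties concrete, here is a toy counter built on the Paillier cryptosystem, a well-known additively homomorphic scheme. It is not taken from the linked library and makes no claim to mirror it. The primes are far too small for real use and are chosen only so the example runs instantly.

```python
import math
import secrets

# Secret primes (toy size). Only N needs to be public.
P, Q = 104729, 1299709
N = P * Q
N_SQ = N * N
# Secret decryption values, held only by the key holder.
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse, Python 3.8+


def encrypt(value: int) -> int:
    """Encrypt value with the public key N (generator g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (value % N) * N) * pow(r, N, N_SQ)) % N_SQ


def add(counter: int, amount: int) -> int:
    """Increment (or decrement, if amount < 0) without decrypting."""
    return (counter * encrypt(amount)) % N_SQ


def decrypt(counter: int) -> int:
    """Recover the plaintext; requires the secret LAMBDA and MU."""
    l_value = (pow(counter, LAMBDA, N_SQ) - 1) // N
    return (l_value * MU) % N


counter = encrypt(5)
counter = add(counter, 3)
counter = add(counter, -2)
total = decrypt(counter)
```

Anyone holding `N` can call `add`, but only the holder of `LAMBDA` and `MU` can read `total`. Because every update multiplies in a freshly randomised ciphertext, an observer sees that the counter changed but not by how much or in which direction.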


This is an ideal use case for homomorphic encryption, whenever it becomes useable.

https://en.wikipedia.org/wiki/Homomorphic_encryption


Use asymmetric encryption and store the private key in an HSM. The private key never leaves the HSM device.


This, and other layers of security. There is no magic formula to keep your data safe.

If you are serious you need network security (i.e. firewalls physically separating networks), proxies, IDS etc. You also need to build your app in a security conscious way (read owasp.org).

You simply can't do this sort of thing well if you do not have infosec experience. If this data is truly sensitive you should hire someone who lives and breathes security. There are established methods and approaches. You can't get away with "use this algorithm" or "use that library".

After you've built, pen tested and deployed your app; security depends on key management and good change management practices that you simply can not skimp on.

We once built an app that uses an HSM; even though that app is in a secure and private (single occupant) data centre in the organisation's own basement, they decided it was necessary to get a "shark cage" built, just so that they could tell if the server had been physically compromised.


You can't ever really tell if a server has been physically compromised. IDS/HSM/bla are only a chance at working out if it's happened. A perfect attacker could obtain access to any system and never trigger any alarms if they understand the triggers for any alarms that are in place.

Much the same as you can never tell if someone has broken into your apartment: you could tell a novice has broken in by looking for papers that are out of place or footprints/fingerprints. An expert burglar would make sure not to leave anything obvious like that. You could tell if an expert has broken in using something like IDS: set up a special trap or webcam that will detect it.

However, a perfect burglar would replace the webcam tapes, find and disable/re-enable any traps, etc. Since most web hosting environments are so standard, it's actually a MUCH easier prospect to be a perfect hacker than a perfect burglar too.

Also, no amount of perfect security skills can keep you absolutely safe. An unknown exploit in your OS is simply out of scope for even the greatest security expert, and no amount of best practices can help if your OS/CPU/RAM/Network Card will give the intruder full access through some unknown flaw outside of your control.


We keep encryption keys for sensitive data in active directory and have a front end firewall, web servers, midplane application firewall, back end service layer cluster, internal firewall before anyone front facing can get at the info. The decrypted data is never passed to the web layer.

To gain access, someone will have to root two separate active directory domains after breaking into multiple low privilege accounts and a database cluster.

Possible always, but we make it a hard target.


After all, the primary objective isn't to create an impenetrable system, but one that's exceptionally difficult to penetrate.


The best practice for storing sensitive data: don't. You have to design your system presuming that someone has hacked your server - if the encryption key is stored there, they can find it.

- As you're aware, password hashes are a way to avoid storing sensitive data (though they should still be treated as sensitive). You're using a strong hash function and a salt, right?

- Re-use the principle of password hashes for API keys to simply avoid having to store hyper-sensitive data: generate a long (say 512-bit) secure random number (using OpenSSL) as the user's secret API key. Then hash the key as if it were a password and store only the hash. Now if someone steals your API key database they can't use it to authenticate as your users.

Note: for API keys a strong hash such as bcrypt will probably be too slow and resource-intensive. However, because API keys are (long) random data, unlike passwords, you can use a faster hash function like SHA-1.

- As for credit card data: don't. You probably can't afford the PCI audits and dedicated hardware and the same principle applies: just don't store sensitive data. Instead, many payment gateways offer 'tokens' for recurring payments in which you pass the payment information to their API without storing it (or use their hosted page in an iframe, if acceptable to you) and they return a token which can be used to charge against that card in the future. Not all payment gateways offer this, and some charge (too much) so take a look at https://spreedly.com/ which offers a middle-man gateway service which adds tokens and other API features to pretty much all of the payment gateways.

As you can see, in both cases it's possible to simply avoid storing the most sensitive data.
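The API-key idea above can be sketched as follows. Two substitutions are made here and should be read as assumptions: Python's `secrets` module stands in for OpenSSL as the random source, and SHA-256 stands in for the SHA-1 mentioned above. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_api_key() -> Tuple[str, str]:
    """Return (api_key, stored_hash).

    The key is 512 random bits, shown to the user once.
    Only the hash is written to the database.
    """
    api_key = secrets.token_hex(64)
    stored_hash = hashlib.sha256(api_key.encode("ascii")).hexdigest()
    return api_key, stored_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)


api_key, stored_hash = issue_api_key()
```

A fast, unsalted hash is acceptable here only because the key is long and random; the same shortcut would be wrong for user-chosen passwords.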


Here's an idea:

Use the user's password to decrypt a key, that then decrypts the data - which I know you can't do because of password resets...

So to deal with password resets, create another password which decrypts the same key. Store that other password in a physical safe, possibly in a bank safety deposit box. This will slow down password resets to a manual process of course.

For additional security you can split this password into two or more pieces and store them in different banks. For convenience you could allow users from the same organisation to reset each other's passwords (since they all have access to the same key).
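A toy sketch of "two passwords that unlock the same key", using only the standard library. The data key is XORed with a pad derived from each password via PBKDF2. This is an illustration of the structure, not a recommendation: a real system should use an authenticated key-wrap from a vetted crypto library, and the names and iteration count here are assumptions.

```python
import hashlib
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor for the example


def _pad(password: str, salt: bytes, length: int) -> bytes:
    """Derive a password-dependent pad with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=length
    )


def wrap_key(data_key: bytes, password: str) -> Tuple[bytes, bytes]:
    """Return (salt, wrapped_key); a fresh salt is used per wrap."""
    salt = secrets.token_bytes(16)
    pad = _pad(password, salt, len(data_key))
    return salt, bytes(k ^ p for k, p in zip(data_key, pad))


def unwrap_key(salt: bytes, wrapped: bytes, password: str) -> bytes:
    """Undo wrap_key. A wrong password yields garbage, not an error."""
    pad = _pad(password, salt, len(wrapped))
    return bytes(w ^ p for w, p in zip(wrapped, pad))


# One data key, wrapped twice: once for the user, once for the safe.
data_key = secrets.token_bytes(32)
user_slot = wrap_key(data_key, "user password")
recovery_slot = wrap_key(data_key, "recovery password from the safe")
```

On a password reset, the recovery password unwraps the data key, which is then wrapped again under the user's new password; the encrypted data itself never has to be re-encrypted.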

Also, use an IDS so you know as soon as you've been hacked - because people logging in at that time are still at risk.


Trivia note: this is, in a nutshell, how the Lotus/IBM Notes ID works. The password is used in a KDF to generate a key, which in turn decrypts the user's private key (and certain other credentials, along with symmetric secret keys for shared encrypted documents). Success/failure is determined solely by the successful decryption of known bytes in the encrypted package. Other info (the user's public key, identity and certifier, all signed) is maintained in the clear and can be easily and safely exported and may be "trusted" for authentication with remote machines. There is a "password recovery" system as well (it doesn't actually recover the password, but allows a reset), requiring cooperation of two or more admins¹ (in a Shamir-type arrangement) so that previously-encrypted user data will not be lost.

¹ There is the option to use a single admin, but there are great big warning signs and scary red boxes all over that section of the doco. It's something you'd only use in a solo shop (as a Notes ISV or a Domino web dev).


Let's say you are able to perfectly encrypt something so that only the people who have "authorized access" can get the data.

You still have a problem of determining who these people are. See http://en.wikipedia.org/wiki/Confused_deputy_problem

Your best bet is to have three-factor authentication (something they are, something they have, something they know) generate a key to encrypt the data. Then their user agent still has to be trustworthy (no viruses on their computer, etc.) In addition it has to not be tricked by various exploits (such as https://www.owasp.org/index.php/Session_fixation, https://www.owasp.org/index.php/Session_hijacking_attack, http://en.wikipedia.org/wiki/Cross-site_request_forgery, https://en.wikipedia.org/wiki/Cross-site_scripting). You also have to secure the channel, preferably with TLS using Diffie-Hellman Key Exchange or some other way that won't be compromised just by stealing keys on the server. Even with all this, the Web (HTTP) is not a good way to access the data on the server, because the client usually loads all the code it runs from the server, and thus has to trust the web server not to serve malicious code. Otherwise the server can later do a http://en.wikipedia.org/wiki/Man-in-the-middle_attack such as a http://en.wikipedia.org/wiki/Replay_attack, to get the same data. And when I say the server, I mean the server compromised by some hackers who got root access credentials. And so forth.

In short, you will never have perfect security, just approximations. (Unless possibly if you make use of http://en.wikipedia.org/wiki/Quantum_cryptography)


I was just at an Oracle 12c presentation, and I'm not pushing for Oracle (and not associated with them), but I'm just going to mention something they have that maybe your database provider also has.

Oracle (I believe starting from 12c) has masking for data. They were saying how you can mask everything except last 4 digits of credit cards. So if someone gains credentials of a regular employee, they will be able to query data, but sensitive data will be masked by database itself.


I've recently been looking into the same issue. I need a way to encrypt data before inserting it into a database in such a way that the person inserting the record can read it, their supervisor can read, but their colleague can't. It needs to survive a password reset and I don't want to store any keys on the server unencrypted.

This led me to attribute based encryption and the libbswabe library. The idea is you generate a master keypair and from these you generate private keys for each of your users. Your users' private keys can only decrypt data that was encrypted with attributes that were also applied to their key.

For example, let's say we have 2 users Alice and Bob. Alice is a supervisor for the IT department. Her key was generated with the attributes "alice" (her username) and "itdepartment". Bob is a normal employee in the IT department, the only attribute applied to his private key was his username "bob"

Now let's say we use the master public key to encrypt each of the fields in the user table (Firstname, Lastname, Email, etc). If each field for a user record is encrypted with the attributes: [current_username] and "itdepartment", then Bob can decrypt his fields because they are tagged with "bob" and "bob" is an attribute in his key, and Alice can decrypt her record through the same logic AND every record whose fields were encrypted with the attribute "itdepartment".

If users' private keys are encrypted with their password and stored in the database, then the only way you can get Bob's key is to break his password. An attacker who does so has access to the data that Bob's key can decrypt, but importantly, not everything. If Bob forgets his password (and therefore can't access his private key) then a new one can be generated and all it needs to do is have the "bob" attribute in order for him to have access to all his old data.

Now this is by no means a complete description of a solution, you have to securely store the master private key (you only need this to generate private keys for your users though, not for every put/get request), there's issues around key revocation and lots of gaps in my description, but these issues are present for any crypto system. Attribute based encryption though seems to me like it overcomes a lot of the issues that plague other solutions, the biggest single one being that other solutions require the master private key to either be on disk, or in memory at all times, this solution doesn't need that.


patio11 got this right: if your application is able to decrypt it, then nothing you can do will secure this data. Encryption is not the tool that you are looking for.

You can persist with encryption, but only if the user holds the key, ideally via 2-factor auth.

Instead of this, I'd go for whitelist of access, audit logs, monitors, rate limiting and alerts.

If you hold all the encrypted data and the keys, you only need your application server to fail. My personal view is that worse than thinking you have security is not responding (or even noticing) when the inevitable happens.

Configure your systems to be as secure as possible without going down the obscurity path, and then tripwire everything and know what unusual patterns of activity look like and who did what.


What is "CC data"?


Credit card data (numbers and the like, though probably not including CVVs -- the security codes -- because one should not be storing that in any form anywhere).


How are sites that store your CC data for later purchases able to charge your card without storing the CVV? Amazon, for example, and pretty much every subscription/SaaS product.



