
So one of the issues here is using an externally visible ID (or a transformation of such) as an internal ID. Why not create a random int64 at account creation time which is invisibly linked to the public username (e.g., email address)? Now you've got a proper join key, you can restrict access to the map, and it's easy to delete the map entry when the user unsubscribes.
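A minimal sketch of that idea in Python, assuming an in-memory dict stands in for the access-restricted map. The `IdMap` class and its method names are hypothetical, and a real system would also need persistence and a collision check:

```python
import secrets


class IdMap:
    """Hypothetical map from public username to a random internal ID."""

    def __init__(self):
        self._by_username = {}

    def create(self, username: str) -> int:
        if username in self._by_username:
            raise ValueError("account already exists")
        # 63 random bits, so the value fits in a signed int64 column.
        internal_id = secrets.randbits(63)
        self._by_username[username] = internal_id
        return internal_id

    def lookup(self, username: str):
        # Returns None when the user is unknown or was unsubscribed.
        return self._by_username.get(username)

    def delete(self, username: str) -> None:
        # Dropping the entry severs the link between username and internal ID.
        self._by_username.pop(username, None)
```

Logs and analytics tables would then carry only the internal ID, and only the service that owns the map can join it back to a username.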

(There can still be good reasons to apply one-way hashing to the random internal ID, as well: for example, to provide different levels of logs access to different internal users. People who make dashboards get hashed IDs, and people who debug logging get raw IDs.)
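One way this could look, as a sketch only: a keyed hash (HMAC-SHA256 here, which is my substitution for a plain one-way hash) so that dashboard users cannot recompute the mapping without the key. The function name `dashboard_id` and the key handling are assumptions:

```python
import hashlib
import hmac


def dashboard_id(internal_id: int, secret_key: bytes) -> str:
    """Derive the ID shown to dashboard users from the raw internal ID."""
    # Assumes a non-negative ID that fits in 8 bytes.
    message = internal_id.to_bytes(8, "big")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

The same internal ID always maps to the same dashboard ID under a given key, so joins and counts still work on the hashed column.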

The problem of entropy allowing individual user identification even with all IDs scrubbed is still very real, though, and non-trivial to address. One can start by wrapping the query engine with a service which checks that a certain minimum number of people are covered by a given query before returning the results. Or apply differential privacy-type transformations to the output...
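A rough sketch of the minimum-count check, assuming rows are dicts with a `user_id` field. The threshold of 10 and the name `guarded_counts` are made up for illustration, and this does not attempt the differential privacy part:

```python
from collections import defaultdict

MIN_USERS = 10  # hypothetical threshold; pick one that fits your data


def guarded_counts(rows, group_key, min_users=MIN_USERS):
    """Count distinct users per group, suppressing groups that are too small."""
    users_per_group = defaultdict(set)
    for row in rows:
        users_per_group[row[group_key]].add(row["user_id"])
    # Only release groups that cover at least min_users distinct people.
    return {
        group: len(users)
        for group, users in users_per_group.items()
        if len(users) >= min_users
    }
```

A threshold alone can still leak through differences between overlapping queries, which is where the noise-adding approaches come in.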




This is along the lines of where I was going with the alternative approach, I just simplified it for brevity. :)

In the case of Teleport, I think this is a bit more difficult to achieve, because we don't necessarily have our own account database. Our common commercial use case is integration with an identity provider through SAML/OIDC, which I'm not sure would consistently offer a random ID per account to use.

While there are many ways we could generate and store the username <-> random ID mappings, this adds a certain amount of complexity to get right on a distributed system.

If building a system from scratch with end to end control, I do prefer the random identifier approach.


Then a user emails you to ask what personal data of theirs you have on the server. Now you don't have a connection, so you can't find it, but you still have it. GDPR non-compliance.


It's your mapping, so you can easily gather up everything with the given marker and hand it back to them. You only throw away the key (and delete attached data) if the user deletes their account (and maybe after some additional time elapses, in case they change their mind or were hacked); it's the same process as GDPR per-user encryption key deletion.
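Roughly, as a sketch with a hypothetical `Store` class holding both the mapping and the tagged records in memory (the grace period mentioned above is left out):

```python
class Store:
    """Hypothetical store: a username -> internal ID map plus tagged records."""

    def __init__(self):
        self.id_by_username = {}
        self.records = []  # list of (internal_id, payload) tuples

    def export_for(self, username):
        # Access request: follow the mapping, then gather everything tagged.
        internal_id = self.id_by_username.get(username)
        if internal_id is None:
            return []
        return [payload for rid, payload in self.records if rid == internal_id]

    def erase(self, username):
        # Account deletion: drop the mapping entry and the attached data.
        internal_id = self.id_by_username.pop(username, None)
        if internal_id is None:
            return 0
        before = len(self.records)
        self.records = [r for r in self.records if r[0] != internal_id]
        return before - len(self.records)
```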


If you throw away the key you still have the data, just encrypted. There is no guarantee that in 5 years the user data couldn't be easily decrypted.


But there's no reason to believe that will be possible either. By that same reasoning it might be possible 'in 5 years' to recover the erased (and overwritten) data from the storage device, so you never can delete anything.

If you use something such as AES-256, which is approved by the NSA for encrypting 'top secret' information, and through some miracle it turns out that we can easily decrypt such data in 5 years, then I'm pretty sure you can argue in court that you were following best practices and had no reasonable way of predicting this encryption disaster.


'Key' here refers to the key in the mapping from external to internal userID. The whole point is that (as mentioned in a sibling comment) choosing an internal user ID uniformly at random is equivalent to a one-time pad; it's guaranteed non-decryptable, unless you invent a time machine...


Isn't there a distinction here, though? While they might result in a similar outcome, deletion is different from de-identification.


Well, the NSA slurps all Internet traffic, so by that definition, no encrypted communication is possible.


If you can't connect it to the user in any way, it's no longer personal information. Expect the data protection agency to compliment you.


> If you can't connect it to the user in any way, it's no longer personal information

Just because you can't connect it doesn't mean nobody else can.


http://www.privacy-regulation.eu/en/r26.htm

... account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. ...

It's sufficient if one can't reasonably reconnect the data back to the user. It doesn't need to be NSA-proof.


It doesn't say that information cannot be _reasonably_ reconnected, but that you shouldn't be able to reconnect it at all.

I don't know how you concluded from this text that it needn't be NSA-proof, when it literally says "in such a manner that the data subject is not or no longer identifiable."


It's in the original link; I may have limited the quote too much:

... To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used ...


Thanks for the quote. Wow. I wonder if this odd definition doesn't render "unidentifiable" to mean "almost certainly identifiable by someone, with a current technique" - since, given enough techniques, most of them will be statistically unusual. I admit it's a start, but mangling semantics that baldly gives me the willies.

The parallel history of cryptography is little more than a history of overconfidence re what counters were thought to be likely, and not. Do we really need to recapitulate that?


For all practical purposes, a secure, one-way cryptographic hash is irreversible.


I'm thinking of a number between 1 and 100.

Its bcrypt hash is: '$2b$15$qUxzZ5ZF55lMuqiH9GMjQOHkNyee86qd2Vh2kQyF5P3U6JZJx9AEC'

I bet nobody could ever reverse this secure cryptographic hash to figure out what it could be... ;)
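To spell out the point: with only 100 possible inputs, an attacker just hashes every candidate and compares. A sketch using salted SHA-256 from the standard library as a stand-in for bcrypt (so it does not check the hash above); the function names are made up:

```python
import hashlib


def salted_hash(number: int, salt: bytes) -> str:
    # Stand-in for bcrypt; the attack works the same way, only slower per guess.
    return hashlib.sha256(salt + str(number).encode()).hexdigest()


def brute_force(target: str, salt: bytes, low: int = 1, high: int = 100):
    """Try every candidate in the small input space until one matches."""
    for candidate in range(low, high + 1):
        if salted_hash(candidate, salt) == target:
            return candidate
    return None
```

A slow hash like bcrypt raises the cost per guess, but it cannot help when the whole input space is this small.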


I think you need to address the counterexamples in the article in order to assert this.


Isn't this essentially the uuid() function that many databases support natively (even the black sheep MySQL)?
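For comparison, the random variant in Python's standard library is `uuid.uuid4()`. A tiny sketch (the wrapper name is hypothetical); it is worth checking which UUID version a given database function produces, since not all of them are random:

```python
import uuid


def new_internal_id() -> uuid.UUID:
    # Version 4 UUIDs are generated from random bits rather than time or host.
    return uuid.uuid4()
```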



