So one of the issues here is using an externally visible ID (or a transformation...

kevin_nisbet · on April 30, 2018

This is along the lines of where I was going with the alternative approach, I just simplified it for brevity. :)

In the case of Teleport, I think this is a bit more difficult to achieve, because we don't necessarily have our own account database. Our common commercial use case is integration with an identity provider through SAML/OIDC, which I'm not sure would consistently offer a random id per account to use.

While there are many ways we could generate and store the username <-> random id mappings, this adds a certain amount of complexity to get right on a distributed system.

If building a system from scratch with end to end control, I do prefer the random identifier approach.

merinowool · on April 30, 2018

Then a user emails you to ask what personal data of his you have on the server. Now you don't have a connection, so you can't find it, but you still have it. GDPR non-compliance.

sdenton4 · on April 30, 2018

It's your mapping, so you can easily gather up everything with the given marker and hand it back to them. You only throw away the key (and delete attached data) if the user deletes their account (and maybe after some additional time elapses, in case they change their mind or were hacked); it's the same process as GDPR per-user encryption key deletion.

merinowool · on April 30, 2018

If you throw away the key you still have the data, just encrypted. There is no guarantee that in 5 years the user data couldn't be easily decrypted.

Thiez · on April 30, 2018

But there's no reason to believe that will be possible either. By that same reasoning it might be possible 'in 5 years' to recover the erased (and overwritten) data from the storage device, so you never can delete anything.

If you use something such as AES 256, which is approved for use to encrypt 'top secret' information by the NSA, and through some miracle it turns out that we can easily decrypt such data in 5 years, then I'm pretty sure you can argue in court that you were following best practices and had no reasonable way of predicting this encryption disaster.

sdenton4 · on April 30, 2018

'Key' here refers to the key in the mapping from external to internal userID. The whole point is that (as mentioned in a sibling comment) choosing an internal user ID uniformly at random is equivalent to a one-time pad; it's guaranteed non-decryptable, unless you invent a time machine...
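A minimal sketch of the mapping described above, assuming a single in-memory store. The `PseudonymStore` class, its method names, and the example email address are all hypothetical; a real deployment would need durable, distributed storage, which is exactly the complexity mentioned earlier in the thread. The point it illustrates is that the internal id is drawn at random rather than derived from the external id, so once the mapping entry is gone there is nothing to reverse.

```python
import secrets


class PseudonymStore:
    """Hypothetical in-memory stand-in for the external id <-> random id mapping."""

    def __init__(self):
        self._internal_by_external = {}  # external id -> random internal id (the "key")
        self._records = {}               # internal id -> list of stored records

    def internal_id(self, external_id):
        # The internal id is drawn at random, not computed from external_id.
        if external_id not in self._internal_by_external:
            self._internal_by_external[external_id] = secrets.token_hex(16)
        return self._internal_by_external[external_id]

    def add_record(self, external_id, record):
        self._records.setdefault(self.internal_id(external_id), []).append(record)

    def export(self, external_id):
        # Data access request: follow the mapping and gather everything under the marker.
        internal = self._internal_by_external.get(external_id)
        return list(self._records.get(internal, []))

    def forget(self, external_id, delete_data=True):
        # Account deletion: throw away the mapping entry and, optionally, the attached data.
        internal = self._internal_by_external.pop(external_id, None)
        if delete_data and internal is not None:
            self._records.pop(internal, None)


store = PseudonymStore()
store.add_record("alice@example.com", "ssh session opened")
exported = store.export("alice@example.com")   # still answerable while the mapping exists
store.forget("alice@example.com")
after_forget = store.export("alice@example.com")
```

While the mapping entry exists, a data access request is a simple lookup; after `forget`, the same request returns nothing, and any records kept with `delete_data=False` would be stored under an id that carries no information about the user.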

robbiemitchell · on May 1, 2018

Isn't there a distinction here, though? While they might result in a similar outcome, deletion is different from de-identification.

JetSpiegel · on April 30, 2018

Well, the NSA slurps all Internet traffic, so by that definition, no encrypted communication is possible.

unilynx · on April 30, 2018

If you can't connect it to the user in any way, it's no longer personal information. Expect the data protection agency to compliment you.

JumpCrisscross · on April 30, 2018

> If you can't connect it to the user in any way, it's no longer personal information

Just because you can't connect it doesn't mean nobody else can.

unilynx · on April 30, 2018

http://www.privacy-regulation.eu/en/r26.htm

... account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. ...

It's sufficient if one can't reasonably reconnect the data back to the user. It doesn't need to be NSA-proof.

merinowool · on April 30, 2018

It doesn't say that information cannot be _reasonably_ reconnected, but that you shouldn't be able to reconnect it at all.

I don't know how you concluded from this text that it doesn't need to be NSA-proof, when it literally says "in such a manner that the data subject is not or no longer identifiable."

unilynx · on May 1, 2018

It's in the original link; I may have limited the quote too much:

... To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used ...

Nomentatus · on May 1, 2018

Thanks for the quote. Wow. I wonder if this odd definition doesn't render "unidentifiable" to mean "almost certainly identifiable by someone, with a current technique" - since, given enough techniques, most of them will be statistically unusual. I admit it's a start, but mangling semantics that baldly gives me the willies.

The parallel history of cryptography is little more than a history of overconfidence re what counters were thought to be likely, and not. Do we really need to recapitulate that?

GordonS · on April 30, 2018

For all practical purposes, a secure, one-way cryptographic hash is irreversible.

shabble · on May 1, 2018

I'm thinking of a number between 1 and 100.

Its bcrypt hash is: '$2b$15$qUxzZ5ZF55lMuqiH9GMjQOHkNyee86qd2Vh2kQyF5P3U6JZJx9AEC'

I bet nobody could ever reverse this secure cryptographic hash to figure out what it could be... ;)
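The point of the joke can be sketched in a few lines: when the set of possible inputs is tiny, an attacker does not need to invert the hash, only to hash every candidate and compare. This sketch is an assumption-laden stand-in: it uses SHA-256 from the Python standard library instead of bcrypt, and the secret number and the `digest` and `brute_force` helpers are made up for illustration. With bcrypt each guess would go through a verify call and cost more time, but it is still only a hundred guesses.

```python
import hashlib


def digest(value: str) -> str:
    # Stand-in for "a secure one-way hash"; SHA-256 is used here instead of bcrypt.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def brute_force(target: str, candidates):
    # Hash every candidate and return the first one whose digest matches the target.
    for candidate in candidates:
        if digest(candidate) == target:
            return candidate
    return None


secret_number = 42                      # hypothetical secret, not the one in the comment
published = digest(str(secret_number))  # what the "attacker" gets to see
recovered = brute_force(published, (str(n) for n in range(1, 101)))
```

The same reasoning applies to hashed usernames or email addresses: if an attacker can enumerate plausible inputs, the one-way property of the hash alone does not make the output anonymous.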

Nomentatus · on May 1, 2018

I think you need to address the counterexamples in the article in order to assert this.

r00fus · on May 1, 2018

Isn't this essentially the uuid() function that many databases support natively (even the black sheep MySQL)?