Differential privacy and other formalized systems are a good choice, but if you never need to give the data back or present it as such to the customer/inputter, you can get heuristic Pretty Good Anonymization if you understand the structure of your problem and how you're going to use the data.
For example, taking your motor vehicle trips example, off the top of my head the things that can ID you are, roughly in order:
Driver's License
Name
Vehicle License Plate
Time, Location of trip
Trip Distance
Location of driver residence
Location of driver workplace
If you had a database of these things, you could apply some of the strategies in the article, and a few others, to ensure no collisions:
Driver's License: Ditch it, hash it with a secret key, or keep a lookup table somewhere. I'd favor ditching it.
Name: Same as DL number
Vehicle License Plate: Same as DL number
For the above 3, you may really only need a few less-constrained variables: gender, approximate age, type of vehicle. So you could just compute out to those and store only that result.
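A minimal sketch of that idea (Python; the field names are made up and the HMAC key is assumed to live somewhere outside the dataset): keyed-hash the identifier if you need a pseudonym at all, and compute out to the coarse attributes you actually need.

```python
import hmac
import hashlib

# Assumption: the key lives in a vault/HSM, not in source control or next to the data.
SECRET_KEY = b"replace-with-a-key-stored-elsewhere"

def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256): without the key, nobody can brute-force a
    license plate or DL number back out, unlike a plain unsalted hash."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

def coarsen_driver(record: dict) -> dict:
    """Drop the direct identifiers; keep only loosely constrained attributes."""
    return {
        # Keep a keyed pseudonym only if you must re-link trips later; otherwise ditch it.
        "driver_pseudonym": pseudonymize(record["drivers_license"]),
        "gender": record["gender"],
        "age_bucket": (record["age"] // 10) * 10,  # e.g. 34 -> 30
        "vehicle_type": record["vehicle_type"],    # "sedan", not the plate
    }
```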
Time, Location of trip: Fudge these by +- a random time, or +- a random distance from the start/finish. Careful not to make it a dumb random circle; Strava does this, and given enough public rides I'm sure people could figure out where I live. (Maybe scale the fudge as a function of population density?)
Trip Distance: Fudge +- random distance
Location of driver residence: Fudge to begin with, probably ditch if possible
Location of driver workplace: Ditto
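To make the "fudge within a random delta" idea concrete, here's a rough sketch for the time/location fields above (Python; population_density_at() is a hypothetical helper standing in for whatever density data you have, and the scaling constants are made up):

```python
import math
import random
from datetime import datetime, timedelta

def fudge_time(t: datetime, max_minutes: float = 30) -> datetime:
    """Shift a timestamp by a random amount in [-max_minutes, +max_minutes]."""
    return t + timedelta(minutes=random.uniform(-max_minutes, max_minutes))

def fudge_location(lat: float, lon: float, max_offset_km: float) -> tuple:
    """Move a point a random distance (up to max_offset_km) in a random direction.
    Uses the rough ~111 km per degree of latitude approximation, fine for fuzzing."""
    bearing = math.radians(random.uniform(0, 360))
    dist_km = random.uniform(0, max_offset_km)
    dlat = (dist_km / 111.0) * math.cos(bearing)
    dlon = (dist_km / (111.0 * math.cos(math.radians(lat)))) * math.sin(bearing)
    return lat + dlat, lon + dlon

def fudge_location_by_density(lat: float, lon: float) -> tuple:
    """Avoid the dumb-random-circle problem: scale the offset by how dense the
    area is, so a rural start point gets a much bigger fudge than a city block."""
    density = population_density_at(lat, lon)            # people per km^2 -- hypothetical helper
    max_offset_km = max(0.5, 50.0 / max(density, 1.0))   # denser -> smaller offset (made-up scaling)
    return fudge_location(lat, lon, max_offset_km)
```

Trip distance gets the same treatment: add a random +- delta to it.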
The point is: think about what you need from the dataset and deliberately mess it up, so that you'd need the original to piece it back together. Often you don't need the exact input data, just something within a random delta of it, so only keep the stuff within a random delta.
But what if I do need the original data back, say the driver needs to produce an expense report with the hours? What would you do in that case? I have thoughts, but I'm trying to bounce them off someone else.
If you need to provide the data back to the customer, then maybe the right answer is to follow the same standards as financial institutions and health companies do. In practice, that comes down to ensuring that no individual has access to the underlying data without extreme monitoring of how that data moves around and is used. This is a rather large burden though, so I can understand if that's too much for your use case.
Things we do:
- Rotate passwords used to access networks/servers regularly
- 2FA all the things
- Only provide permissions to what a user needs
- Limit it to just the time a user needs it
- Logging+security scanning across the backend infrastructure
- Tight monitoring of the patch level of devices used to access the network
- Keep front-end networking infrastructure redundant and patched
- Multiple levels of auth (vpn pw, vpn 2FA, then public/private key for each server, then 2FA for each server, etc.)
You can only do so much, but you can make it harder to compromise the crown jewels.
That makes sense. The data set is going to be in the health area, and I'm less concerned about processes for the individuals in the organization having access (like what you've suggested) and more about how to structure the data so that we as an organization can't access it. We're dealing with infectious disease, where there is a personal benefit to not letting anyone outside the care side know that you have a disease, but a societal benefit to tracking trends, outbreaks, or hygiene around the disease. And I'm figuring out how to structure the system so that if we were to sell, say, there wouldn't be this trove of information on who has what diseases, just who was a customer.
Store the deltas + the identifying information somewhere else as a lookup table and use a random ID to join to it. Keep the PII database secured, offline, or whatever makes you feel best, and then if anyone needs direct correlation back to the end user, it's done through a different process that ensures higher access controls, auditing, etc.
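A minimal sketch of that split, assuming a simple Python pipeline (the field names and the uuid-based link ID are just illustrative):

```python
import uuid

def split_record(record: dict, pii_fields: set) -> tuple:
    """Split a raw record into (analytics_row, pii_row), joined only by a random ID.
    The pii_row (plus any fudge deltas you applied) goes into the locked-down or
    offline store; the analytics_row goes into the working dataset."""
    link_id = str(uuid.uuid4())  # random, carries no meaning on its own
    analytics_row, pii_row = {"link_id": link_id}, {"link_id": link_id}
    for key, value in record.items():
        (pii_row if key in pii_fields else analytics_row)[key] = value
    return analytics_row, pii_row

# Example (made-up fields):
analytics, pii = split_record(
    {"name": "Jane Doe", "drivers_license": "D1234567",
     "trip_distance_km": 12.4, "vehicle_type": "sedan"},
    pii_fields={"name", "drivers_license"},
)
# analytics -> {"link_id": ..., "trip_distance_km": 12.4, "vehicle_type": "sedan"}
# pii       -> {"link_id": ..., "name": "Jane Doe", "drivers_license": "D1234567"}
```

Re-linking then means querying the PII store by link_id, which is exactly the step you can wrap in the heavier access controls and auditing.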