Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> you cannot retrieve the ip from the hash

You can as long as you have IPv4 visitors, because the search space is small enough to brute-force. There are only four billion IP addresses. The user-agent complicates things a little but there aren’t many of those, so you could retrieve the IP addresses of most visitors from the hash if you wanted to.

> residential IPs are usually dynamic

Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way.



> the search space is small enough to brute-force

I get what you're saying - in that if you know the IP address, then you can often easily discover who the individual is. I'd counter that actually, for most people this isn't the case - for many companies, only the ISP, Google, Apple, Facebook etc know who the real user of an IP is... (incidentally, the people most keen too force analytics on you, but that's another issue).

However, that is all kind of moot. The hash itself is PII, because it can be used to track an individual. PII isn't about the difficulty of determining the specific identity of a user, it's about the difficulty in identifying a specific user. The distinction is subtle, but important.

Take an example - people are using a wireless hotspot somewhere, maybe you own a coffee shop, and over the course of a few weeks, you're alerted to the fact that someone has been accessing some illegal content that could get your business in trouble. You've been careful to comply with the GDPR, and your logs only include time and hostname of the server accessed. On it's own, there is no PII there. But, combine that with say credit card transactions, or video footage and finding out who was in the coffee shop every time this happened. Then boom! Suddenly, your time has become PII. Maybe not uniquely correlated to a single person, but a group of people. With every instance of a correlation to that person and a group of random people, it doesn't take maybe to narrow it down to a specific individual.

This is why, to actually comply with GDPR, you need to only store logs for as short a time as is technically required (legally beyond a month is hard to justify, ideally a few days at most) and then you should aggregate into groups where individuals cannot be isolated. If your aggregations result in groups of people that are too small, you need to change the aggregation groups, or report an empty group. It's totally fine to store data like "on this day, n people went from this page to this page, average linger time blah seconds" if n is 10 or more. If n is 1 or close to it, that data is still identifying.


> I get what you're saying - in that if you know the IP address, then you can often easily discover who the individual is.

That’s not what I said. I said if you have the hash, you can derive the IP address from it in most cases.


That part was responding to where you said "Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way."

My point is that whether you can determine the IP address from the hash or not doesn't matter. The hash itself is PII.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: