Hacker News

A half-baked idea I had while reading the article was to use bloom filters:

User visits the site. On the backend, check if their IP+UA is in the bloom filter or not. If not, increase the unique visitor counter and add them to the filter.

Perhaps the filter would need to be preseeded with dummy data to protect the privacy of the first few visitors.
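
To make the idea concrete, here is a rough sketch of how it might look, not a vetted implementation. The class names, the filter size, the number of hash functions, and the use of SHA-256 with double hashing are all my own assumptions; the only part taken from the description above is "check the filter, count if new, then add".

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size and hash count are illustrative guesses."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_new(self, key):
        """Set the key's bits; return True if any bit was previously unset."""
        is_new = False
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


class UniqueVisitorCounter:
    """Hypothetical wrapper: counts a visitor only if IP+UA is not in the filter."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record(self, ip, user_agent):
        # The raw IP+UA is only held in memory long enough to hash it.
        if self.seen.add_if_new(f"{ip}|{user_agent}"):
            self.count += 1
        return self.count


counter = UniqueVisitorCounter()
counter.record("203.0.113.5", "ExampleBrowser/1.0")   # first visit, counted
counter.record("203.0.113.5", "ExampleBrowser/1.0")   # repeat visit, not counted
counter.record("203.0.113.9", "ExampleBrowser/1.0")   # different IP, counted
```

Bloom filters can return false positives, so a count kept this way can only undercount, and the error grows as the filter fills up. Preseeding with dummy data would amount to calling `add_if_new` on random keys without touching `count`.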



Not a bad idea, but IP addresses are personal data too.


I think this is a great idea.


This is effectively what the “GDPR compliant” providers mentioned in the article are already doing, namely, a one-way hash of the IP+UA. One of the points of the article is that this is non-compliant, since you need to transmit the IP+UA to do this calculation to begin with.


But do they store individual IP+UA hashes, or do they mush them together in a bloom filter or a HyperLogLog data structure?

In the first case, it could be argued they still store personally identifiable information (for a limited time, but still). In the second case, I think it would be harder to argue that a probabilistic data structure with lots of hashes mushed together still constitutes personally identifiable information.

> One of the points of the article is that this is non-compliant, since you need to transmit the IP+UA to do this calculation to begin with.

IP + UA gets transmitted to the first-party server already. They already have it. The question becomes – is it OK to anonymize this PII we already received for one purpose (serving the web page), to use it for another purpose also (counting unique visitors).


> IP + UA gets transmitted to the first-party server already. They already have it. The question becomes – is it OK to anonymize this PII we already received for one purpose (serving the web page), to use it for another purpose also (counting unique visitors).

Maybe I'm missing your point, but in the situation we're talking about (so-called "GDPR compliant" analytics), if I set up one of these services on my website, the user's IP+UA are transmitted to a 3rd party, for the sole purpose of analytics including counting unique visitors. My understanding is that this is quite different in the eyes of the GDPR from the question you posed, and is almost always not going to be compliant.


I was thinking about the article author's case where they were looking at options for implementing unique user tracking for themselves, on their own server.



