I'm a bit confused about how this article fits in with the previously posted article about the company (http://news.ycombinator.com/item?id=1817631). There, they are harvesting PII (e-mail addresses) and reselling access to related information. In this article, they explain a machine-learning-type algorithm that anonymizes data so it contains no PII. I don't really see how the two fit together. What's their deal?
Personally Identifying Information (PII) is defined as data which can uniquely identify an individual: name, Social Security number, Facebook ID. Rapleaf's personalization service, which is designed to let websites personalize based on who is viewing the website, does not serve PII. Instead, it serves targeting data like "Age 40-50" and "Interests Basketball".
The idea behind the Anonymouse project is that a person should not be personally identifiable from the targeting information served about them. Only sets of targeting data which cannot be traced back to a unique individual are stored in the personalization cookie.
For example, it would be okay to serve a targeting cookie about a person which contained "Male, Wealthy, looking to buy a Ferrari", because thousands of people fit that description, and the ad network or website cannot identify the person with any reasonable specificity.
It would not be okay, however, to target a person as "52-year-old Male, makes $256,000, lives in Sometown OH, and was born on April 12th", because in all likelihood only one or two people fit that description. The data would effectively serve as PII, which would be no better than dropping a Facebook ID in the cookie.
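To make the two examples concrete, here is a rough sketch of the idea in Python. It is not the actual Anonymouse implementation, just an illustration of "only serve a set of attributes if enough people share it". The function names, the toy population, and the threshold `k` are all hypothetical.

```python
from typing import Dict, List

Person = Dict[str, str]


def matching_count(population: List[Person], attrs: Dict[str, str]) -> int:
    """Count the people who match every attribute in attrs."""
    return sum(
        all(person.get(key) == value for key, value in attrs.items())
        for person in population
    )


def safe_to_serve(population: List[Person], attrs: Dict[str, str], k: int) -> bool:
    """Allow an attribute set only if at least k people share it.

    k is an assumed threshold; the post does not say what cutoff is used.
    """
    return matching_count(population, attrs) >= k


# Toy population: six wealthy male Ferrari shoppers, one of them in Sometown OH.
population = (
    [{"gender": "male", "wealth": "high", "intent": "ferrari", "town": "Bigcity"}] * 5
    + [{"gender": "male", "wealth": "high", "intent": "ferrari", "town": "Sometown OH"}]
)

broad = {"gender": "male", "wealth": "high", "intent": "ferrari"}
narrow = {**broad, "town": "Sometown OH"}

print(safe_to_serve(population, broad, k=3))   # many people match
print(safe_to_serve(population, narrow, k=3))  # only one person matches
```

With this toy data the broad set passes and the narrow set is rejected, which mirrors the Ferrari and Sometown examples. The hard part in practice is choosing which attributes to drop so that what remains is still useful for targeting.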
Let me know if anything's unclear, or if you want more details. From a CS perspective, it's a really cool/hard problem, and something we've spent a lot of time on. We're planning on writing an update blog post on where we are... as soon as we're done coding : )