I know what they trained on because it's been reported on. They got around 50 million people's FB profiles, and personality test results from a smaller subset (300k, I think).

I use ML models every day in my work, and I understand how they function. It is true that individuals' information is probabilistically encoded into the parameters of the model. However, if the model is any good, the information of the people they trained on is encoded only slightly more strongly than that of the population at large.
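
To make that concrete, here's a toy sketch (synthetic data and scikit-learn, nothing like their actual pipeline): fit a regularized model twice, once on everyone and once with a single person's row left out, and compare the learned weights. On a decently sized training set the difference is tiny, which is the sense in which no single individual is really "in" the model.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  n, d = 50_000, 20  # synthetic "profiles": 20 binary like-features each
  X = rng.integers(0, 2, size=(n, d)).astype(float)
  true_w = rng.normal(size=d)
  y = (X @ true_w + rng.normal(size=n) > 0).astype(int)  # synthetic "trait" labels

  # Fit once on everyone, once with a single person left out.
  full = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
  loo = LogisticRegression(C=1.0, max_iter=1000).fit(X[1:], y[1:])

  # How much did dropping one person move the learned weights?
  delta = np.linalg.norm(full.coef_ - loo.coef_)
  print(f"relative weight change from one person: {delta / np.linalg.norm(full.coef_):.2e}")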

There is sort of a privacy issue in the following sense: The models they've built have learned relationships between preferences and personalities that they wouldn't otherwise have been able to learn. But these relationships are abstract. They are not tethered to any particular, identifiable individual.

A reasonable argument can be made that those learned relationships are, in a sense, stolen property. And I think arguments along those lines are interesting things that we'll have to explore as this sort of thing becomes more common. But the idea that this model invades individuals' privacy just isn't really true.




Is there a reason that people are only talking about the privacy angle?

People very much don't want these models to exist. They don't want a predictive model that can guess their affiliation from nothing but seemingly unrelated activity breadcrumbs.

That's why, I assume, this whole issue has exploded recently.

Not the privacy, but the implications.


But if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign?

Edit: is it that the model is then applied only to strictly public data about the person? If so, I guess the interesting question becomes whether the model is definitely nowhere near overfitting, i.e. retaining enough information to match a person's public data directly, since it was trained on that data (amongst other data). (I'm not an ML developer.)

Edit 2: also, going with your "20 most representative pixels" comparison, it seems interesting that this much information (though I'm not sure exactly how much) can be inferred from a public profile just by also knowing enough about the whole Facebook population. OK, so perhaps a human could infer about as much, but a human doesn't scale, and that's why the model becomes valuable?


> But if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign?

I don't know exactly what they were modeling, but from the published reports, it sounds like they were trying to predict big 5 personality traits (conscientiousness, neuroticism, openness, extraversion, agreeableness) from FB profile data (e.g. likes, dislikes, bio, post content). So in that case, the model would contain weights measuring the strength of the relationship between a feature like "likes punk rock music" and a trait like "openness". That description literally applies only to a linear model, but nonlinear models are, for these purposes, the same.
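
As a rough illustration (synthetic data and made-up feature names, not their pipeline), here's what the linear version looks like. It also shows the standard check for the overfitting worry raised upthread (compare training vs. held-out performance), and what "targeting" amounts to: running the trained model on one person's public like-vector.

  import numpy as np
  from sklearn.linear_model import Ridge
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(1)
  features = ["likes_punk_rock", "likes_cooking", "likes_hiking"]
  n = 10_000
  X = rng.integers(0, 2, size=(n, len(features))).astype(float)

  # Pretend "openness" depends mostly on the first feature.
  openness = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.5, size=n)

  X_tr, X_te, y_tr, y_te = train_test_split(X, openness, random_state=0)
  model = Ridge(alpha=1.0).fit(X_tr, y_tr)

  # Each weight is the learned strength of one like -> trait relationship.
  for name, w in zip(features, model.coef_):
      print(f"{name}: {w:+.2f}")

  # A big train/test gap would signal memorization; here there is none.
  print("train R^2:", round(model.score(X_tr, y_tr), 3))
  print("test R^2:", round(model.score(X_te, y_te), 3))

  # Targeting: score one person from their public likes alone.
  someone = np.array([[1.0, 0.0, 1.0]])
  print("predicted openness:", round(float(model.predict(someone)[0]), 2))

The model itself stores only those few weights; what gets applied to an individual at targeting time is that individual's own public data, run through the weights.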


> I know what they trained on because it's been reported on.

What reason do you have to think their data set consisted of only what has been reported?

How do you know anything about the models they used?



