
> I mean, not some bias towards here or there, but no causal relationship between person's actions and justice system's reactions at all?

It depends on what you mean by causal. Does criminal behavior cause interactions with the justice system? Yes. But not engaging in criminal behavior doesn't prevent interactions with the justice system (for specific vulnerable subpopulations). So would you say that a ReLU-shaped curve, with criminality on the X axis and how the justice system treats you on the Y, demonstrates a causal relationship? I don't think I would.

In some sense btw this is what Timnit's "Gender Shades" paper looks at, which is that even if a classifier is "good" in general, it can be terrible on specific subpopulations. Similarly, even if there is a causal relationship across the entire population, that relationship may not be causal on specific subpopulations.
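To make that concrete, here's a minimal sketch with made-up numbers (the group names and counts are purely illustrative) of how a classifier can look fine in aggregate while being terrible on one subpopulation:

    # Made-up numbers, purely illustrative: a classifier that looks fine
    # overall can still be badly wrong on a specific subpopulation.
    groups = {
        # group name: (correct predictions, total examples)
        "subgroup_a": (960, 1000),
        "subgroup_b": (55, 100),
    }

    total_correct = sum(c for c, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    print("overall accuracy:", total_correct / total_n)  # ~0.92, looks "good"
    for name, (c, n) in groups.items():
        print(name, "accuracy:", c / n)                   # 0.96 vs 0.55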

And of course, that ignores broader problems around our justice system being constructed in ways that cause recidivism in certain cases. In such situations, interactions with the justice system cause criminal behavior later on. So since Y is also causal on X, you can't describe the relationship as simply X causing Y.

> But if this is the case, then the whole discussion is pointless.

No! Because people trust computers more than they trust people. Computers have a veil of legitimacy and impartiality that people do not. (No, really: there are a few studies showing that people will trust machines more than people in similar circumstances.) Adding legitimacy, through a fake impartiality, to a broken system is bad because it raises the activation energy needed to reform that system.

At its core, that's probably the biggest issue Yann is missing. Even in cases where an AI model perfectly recreates the existing biases we have in society and does no worse, we've still made things worse by further entrenching those biases.

> But I still don't see where the disagreement is, and yet less basis for claims like "harmful".

So I think an important precursor question here is whether you believe the pursuit of truth for truth's sake is worthwhile, even when you have reason to believe that pursuit will cause net harm. Imagine you have a magic 8-ball that, when given a question about the universe, will tell you whether your pursuit of the answer to that question will ultimately be good or bad (in your ethical framework; it's a very fancy 8-ball). It doesn't tell you what the answer is, or even whether you'll be able to find the answer, only what the impact of your epistemological endeavor will be on the wider world.

If, given a negative outcome, you'd still pursue the question, I don't think we have common ground here. But assuming you don't believe that knowledge is valuable for knowledge's sake, and instead that it's only valuable for the good it does for society, we do have common ground.

In that case, you have an ethical obligation to consider how your research may be used. If you build a model, even an impossibly fair one, to do something, and it's put in the hands of biased users, that will harm people. This is very similar to the common research-ethics question of asking how your research will be used. But applied ML (even research-y applied ML) is in a weird space because, at a meta level, applied ML is all about taking observations about the world, training a box on those observations, and then sticking that box back into the world, where it will now influence things. So you have effects on both ends: how the box is trained, and how the box influences the world once deployed.

In many contexts, "representative" or "fair" is itself contextual, or at least the tradeoffs between cost and representativity make it so. Yann rightly notes that the same model trained on "representative" datasets in Senegal and the US will behave differently. So how do you define "representative"? How do you, as a researcher, even know that the model architecture you come up with, which performs well on a representative US dataset, will perform equally well on a representative Senegalese dataset (remember how we agreed that model architecture itself can encode certain biases)? Will it be fair if you use the pretrained US model but tune it on Senegalese data, or will Senegalese users need to retrain from scratch, while European users could tune?
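For what it's worth, here's a hypothetical sketch of the two options in that last question - fine-tuning a pretrained model on local data vs. retraining the same architecture from scratch. The model, the data, and every name here are placeholders (random tensors, PyTorch-style), just to show what the two paths mean mechanically, not anyone's actual setup:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical sketch: "tune" a pretrained model on local data vs.
    # retrain the same architecture from scratch. All data is random noise,
    # purely to make the sketch runnable.
    def build_model():
        return nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

    def train(model, loader, epochs=3, lr=1e-3):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Stand-ins for the two datasets (random, illustrative only).
    us_loader = DataLoader(TensorDataset(torch.randn(256, 16),
                                         torch.randint(0, 2, (256,))), batch_size=32)
    local_loader = DataLoader(TensorDataset(torch.randn(64, 16),
                                            torch.randint(0, 2, (64,))), batch_size=32)

    pretrained = build_model()
    train(pretrained, us_loader)             # "US" pretraining

    # Option A: start from the pretrained weights, then tune on local data.
    tuned = build_model()
    tuned.load_state_dict(pretrained.state_dict())
    train(tuned, local_loader)

    # Option B: same architecture, random init, trained only on local data.
    from_scratch = build_model()
    train(from_scratch, local_loader)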

Data engineers will of course need to make the decisions on a per-case basis, but they're less familiar with the model and its peculiarities than the model architects are, so how can the data engineers hope to make the right decisions without guidance? This is where "Model Cards for Model Reporting" comes in. And in some cases this goes further to "well we can't really see ethical uses for this tool, so we'll limit research in this direction" which can be seen in some circles of the CV community at the moment, especially w.r.t. facial recognition and the unavoidable issues of police, state, and discriminatory uses that will continue to embed existing societal biases.
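For anyone unfamiliar with the model cards idea, here's a rough sketch of the kind of information a card surfaces, loosely following the section headings from "Model Cards for Model Reporting" (Mitchell et al.); the specific values are invented for illustration:

    # Rough, illustrative sketch of a model card as structured data.
    # Section names loosely follow the paper; all values are made up.
    model_card = {
        "model_details": {"name": "example-face-attr-v1", "version": "1.0"},
        "intended_use": "Research benchmarking; not for surveillance or law enforcement.",
        "factors": ["skin type", "gender", "age group", "lighting conditions"],
        "metrics": ["accuracy", "false positive rate per subgroup"],
        "evaluation_data": "Balanced benchmark stratified by the factors above.",
        "training_data": "Source, collection process, and known skews.",
        "quantitative_analyses": {
            # per-subgroup results, not just one aggregate number
            "subgroup_a": {"accuracy": 0.96},
            "subgroup_b": {"accuracy": 0.78},
        },
        "ethical_considerations": "Risks of misuse, e.g. discriminatory deployment.",
        "caveats": "Performance not validated outside the evaluation distribution.",
    }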

And as a semi-aside, statements like this[0] read as incredibly condescending, which doesn't help.

[0]: https://twitter.com/ylecun/status/1275162732166361088



> It depends on what you mean by causal.

I mean P(being in justice system | being an actual criminal) > P(being in justice system), and substantially so. Moreover, P(being a criminal | being in justice system) > P(being a criminal). In plain words: if you sit in jail, you're substantially more likely to be an actual criminal than a random person on the street, and if you're a criminal, you're substantially more likely to end up in jail than a random person on the street. That's what I see as a causal relationship. Of course it's not binary - not every criminal ends up in jail, and some innocent people do. But the system is very substantially biased towards punishing criminals, thus establishing a causal relationship.
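To make those two inequalities concrete, here's a tiny worked example with invented numbers (not real statistics):

    # Invented counts, purely to illustrate the two inequalities.
    population = 1_000_000
    criminals = 50_000
    in_system_and_criminal = 30_000
    in_system_and_not = 5_000
    in_system = in_system_and_criminal + in_system_and_not

    p_system = in_system / population                              # 0.035
    p_system_given_criminal = in_system_and_criminal / criminals   # 0.6
    p_criminal = criminals / population                            # 0.05
    p_criminal_given_system = in_system_and_criminal / in_system   # ~0.857

    assert p_system_given_criminal > p_system
    assert p_criminal_given_system > p_criminal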

There are some caveats to this, as our justice system defines some things that definitely should not be a crime (like consuming substances the government does not approve of for arbitrary reasons) as crimes. But I think the above conclusion still holds regardless, even if it becomes somewhat weaker when you don't count such people as criminals. It is, of course, dependent on societal norms, but no data model would change those.

> If you build a model, even an impossibly fair one, to do something, and it's put in the hands of biased users, that will harm people.

That is certainly possible. But if you build a shovel, somebody might use it to hit another person over the head. You can't prevent misuse of any technology. According to the Bible, the first murder happened in the first generation of people that were born - and while not many believe this as literal truth now, there's a valid point there: people are inherently capable of evil, and denying them technology won't change that. You can't make the world better by suppressing all research that can be abused (i.e. all research at all). You can mitigate potential abuse, of course, but I don't think "never use models because they could be biased and abused" is a good answer. "Know how models can be biased and explicitly account for that in the decisions" would be a better one.

> this[0] read as incredibly condescending, which doesn't help.

Didn't read as condescending to me. Maybe I'm missing some context, but it looks like he's saying he's not making a generic claim, only a specific claim about a very narrow situation. Mixing these two up is all too common nowadays - somebody claims "X can be Y if conditions A and B are true", people start reading it as "all X are always Y", draw far-reaching conclusions from it, and jump into a personal shaming campaign.



