It's better to move towards a more open and transparent system that's data driven. However, it isn't clear whether the system used here will actually be transparent, only that it could be. The builders of these systems can choose not to share the algorithms behind them.
A podcast I listened to a day or two ago, You Are Not So Smart [1], discussed something related to this that I feel is important to point out here as well.
It's about how we transfer our biases into the algorithms and ML solutions that we build. Given the move towards using algorithms for making decisions like this, it's something we should definitely consider.
If you have the time, definitely listen to this episode. It's an amazing podcast, but this episode in particular really hit me with how software can affect people's lives – and not always for the better. We as software engineers should be more aware of how the solutions we build will be used down the line.
tl;dr of the episode would be:
ML solutions, and algorithms designed by humans, are built by looking at historical data. Historically, a number of races have not been treated fairly by the justice system, e.g. black people being treated more harshly than non-black people for the same crime.
When our ML solutions are built on that historical data, they learn those biases as well, which means the "racism" gets built into the algorithm too. Of course the algorithm has no concept of racism; it's just another feature it uses to compute its decision. But it is something we as designers of algorithms should keep in mind.
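To make that concrete, here is a minimal sketch (Python with scikit-learn, entirely synthetic, made-up numbers) of how a model trained on biased historical labels reproduces the bias, even though "racism" appears nowhere in the code:

```python
# Minimal sketch, synthetic data only: if the historical labels came from a
# biased process, a model trained on them reproduces that bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# One legitimate feature and one protected-group indicator (0 or 1).
merit = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# Hypothetical biased history: the true behaviour depends only on merit,
# but past decision-makers recorded worse outcomes for group 1.
true_risk = 1 / (1 + np.exp(merit))
recorded_bad_outcome = rng.random(n) < np.clip(true_risk + 0.15 * group, 0, 1)

X = np.column_stack([merit, group])
model = LogisticRegression().fit(X, recorded_bad_outcome)

# Two people identical on the legitimate feature, differing only in group:
same_merit = [[0.0, 0], [0.0, 1]]
print(model.predict_proba(same_merit)[:, 1])  # group 1 gets a higher score
```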
> When our ML solutions are built on that historical data, they learn those biases as well.
It's actually worse than that, because the algorithm is giving you the correct result.
For example, suppose that black men are more likely to fail to appear in part because racist hiring practices leave them with worse, less flexible jobs.
That isn't fair, but it's still true that they're more likely to fail to appear.
Are we supposed to give bail to people we know aren't going to show up because the reason they won't show up isn't fair?
That seems like an obviously bad idea. The good idea is to do something about racist hiring practices so that this doesn't happen anymore, at which point the algorithm will see the new data and un-bias itself.
The issue comes when features correlated with race are used to make inferences. This can occur in complex ways, but a simple example, used elsewhere in this thread, is zip code. It might be that people from a mainly black area are more likely to exhibit whatever negative behaviour the model is looking for than people from a mainly white area, perhaps for the reasons you mention. What then happens is that two people who are otherwise identical in the model are treated differently based on race.
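A toy version of that (synthetic data, hypothetical "area A" zip code): race is never given to the model, but because zip code correlates with race and the historical outcomes were biased by race, two people identical in every modelled respect still get different scores:

```python
# Proxy-feature sketch, synthetic data: the model never sees race, only
# income and zip code, yet the zip code carries the biased signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

race = rng.integers(0, 2, size=n)
# Hypothetical segregated geography: zip code mostly tracks race.
zip_is_area_a = (rng.random(n) < np.where(race == 1, 0.9, 0.1)).astype(int)
income = rng.normal(size=n)

# Biased historical outcome: depends on income, plus an unfair penalty by race.
p_bad = 1 / (1 + np.exp(income - 0.8 * race))
bad_outcome = rng.random(n) < p_bad

# The model is "race-blind": it only gets income and zip code.
X = np.column_stack([income, zip_is_area_a])
model = LogisticRegression().fit(X, bad_outcome)

# Two applicants identical in every modelled respect except zip code:
print(model.predict_proba([[0.0, 0], [0.0, 1]])[:, 1])  # area A scores worse
```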
> The issue comes when features correlated with race are used to make inferences.
Practically all features are correlated with race. Income, culture, religion, education, proximity to gangs, parental involvement, etc. etc. And many factors cause other factors, which then also correlate.
The difference between zip code as a mechanism for Bayesian inference and redlining is that redlining is disproportionate. Redlining doesn't adjust probabilities in response to data, it just outright bans anyone in a neighborhood, which overcompensates. It makes inaccurate predictions to the detriment of the people in that neighborhood, because some of the people there have mitigating factors that overcome the negatives of living there, and it doesn't take that into account. An algorithm that considers all available data does.
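A small illustration of that distinction (hypothetical weights, not any real lending rule): the redlining-style rule ignores every mitigating factor, while a score that weighs all the available data lets a strong factor overcome the neighbourhood penalty.

```python
# Sketch with made-up weights: blanket ban vs. a score that considers all data.
import math

def redlining_rule(zip_in_area_a: bool, income: float) -> bool:
    """Blanket ban: nobody from area A is approved, no matter what else is true."""
    return not zip_in_area_a

def scored_approval(zip_in_area_a: bool, income: float) -> bool:
    """Hypothetical scoring model: area A lowers the score, but a strong
    income (or any other mitigating factor) can outweigh it."""
    score = 1.5 * income - 1.0 * zip_in_area_a
    p_repay = 1 / (1 + math.exp(-score))
    return p_repay > 0.5

applicant = dict(zip_in_area_a=True, income=2.0)  # strong mitigating factor
print(redlining_rule(**applicant))   # False: banned outright
print(scored_approval(**applicant))  # True: the data overcomes the penalty
```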
You still have to address the things that cause trouble for the people in that neighborhood, but those are separate problems. You have to fix them, not pretend they don't exist.
Another problem in the same vein is that when ML algorithms make use of Bayesian inference, they can bake in correlations (e.g. between race and credit score) that we would normally purposefully avoid using as a factor, because while this enhances predictive power, it again codifies our existing biases, prejudices, and injustices. For example, if you were deploying an ML model to determine whether someone deserved a loan, features such as ZIP code or race could encode discrimination into the model.
Including race as a parameter should reduce the impact of those correlations on the output of the model (by allowing the model to measure and control for the bias that exists in the input data).
Incautiously using race just because it reflects those existing biases would be a problem. This is what lots of humans do, overestimating the information provided by their own inferences of race. Like internet assholes who blather about how it is rational to be afraid of black men because of their higher rates of assault. Never mind that the absolute rate is still so low that there is ~0 predictive power from the race of a given individual.
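Just to put rough numbers on that last point (deliberately made-up rates, purely for the arithmetic): a rate that is double in relative terms can still carry almost no information about a given individual.

```python
# Back-of-the-envelope arithmetic with hypothetical round numbers.
base_rate_group_a = 0.005   # hypothetical: 0.5% of group A
base_rate_group_b = 0.010   # hypothetical: 1.0% of group B, double in relative terms

relative_risk = base_rate_group_b / base_rate_group_a
absolute_difference = base_rate_group_b - base_rate_group_a

print(f"relative risk: {relative_risk:.1f}x")             # 2.0x sounds large...
print(f"absolute difference: {absolute_difference:.3f}")  # ...but it's only 0.005
print(f"P(no incident | group B) = {1 - base_rate_group_b:.3f}")  # still 0.990
```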
The technique I'm referring to is including race as a factor, and then instructing your model to disregard that factor. Not being race-blind, but being race-aware while attempting to be neutral to race.
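A sketch of one way that could look (synthetic data; I'm not claiming any particular real system does it this way): fit with race included so the model can attribute the biased part of the signal to that feature, then pin the feature to a single reference value when scoring, so it no longer moves individual predictions.

```python
# Race-aware fit, race-neutral scoring: a sketch under the assumptions above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

race = rng.integers(0, 2, size=n)
income = rng.normal(size=n)
# Biased historical labels, as in the earlier sketches.
bad_outcome = rng.random(n) < 1 / (1 + np.exp(income - 0.8 * race))

# Include race during fitting so the bias loads onto its coefficient.
X = np.column_stack([income, race])
model = LogisticRegression().fit(X, bad_outcome)

def race_neutral_score(income_value: float) -> float:
    """Score with the race column pinned to a reference value (0), so two
    applicants with the same income get the same score."""
    return model.predict_proba([[income_value, 0]])[0, 1]

print(race_neutral_score(0.0))  # same number regardless of the applicant's race
```

Whether pinning the feature like this genuinely removes the bias, rather than just hiding it, depends on how much of the biased signal the race column actually absorbs during fitting.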
Yeah, I think this is an interesting idea, but I don't actually understand how the great-grandparent comment believes this could be done. I don't think it works in a strict Bayesian sense. You would have to go out of your way to instruct your model to operate correctively.
By definition, any model is basically bound to be discriminatory. Taking data, extracting common key features, and discarding the rest is essentially generalisation.
But the model is amoral. It's (morally) neither good nor bad for utilising certain features.
If it turned out that race was the most accurate attribute for a particular situation, it would be nonsensical to ignore it.
The current trend of trying to paper over biases, while generally born of noble sentiment, probably only perpetuates the problem, because it's usually done far downstream and doesn't necessitate change at the source. Functionally, it's like a cover-up by a large corporation.
> If it turned out that race was the most accurate attribute for a particular situation, it would be nonsensical to ignore it.
Only if you are optimizing for prediction accuracy. If you want to optimize for something like "justice" or "citizen wellbeing", then you might come to a different conclusion.
[1] https://youarenotsosmart.com/2017/11/20/yanss-115-how-we-tra...