It's better to move towards a more open and transparent system that's data driven. However, it isn't clear whether the system used here will actually be transparent, only that it could be. The builders of these systems can choose not to share the algorithms behind them.
A podcast I listened to a day or two ago, You Are Not So Smart [1], discussed something related to this that I feel is important to point out here as well.
It's about how we transfer our biases into the algorithms and ML solutions that we build. Given the move towards using algorithms for making decisions like this, it's something we should definitely consider.
If you have the time, definitely listen to this episode. It's an amazing podcast, but this episode in particular really hit me with how software can affect people's lives – and not always for the better. We as software engineers should be more aware of how the solutions we build will be used down the line.
tl;dr of the episode would be:
ML solutions, and algorithms designed by humans, are built by looking at historical data. Historically, a number of races have not been treated fairly by the justice system, e.g. black people being treated more harshly than non-black people for the same crime.
When our ML solutions are built on that historical data, they learn those biases as well, which means the "racism" gets built into the algorithm too. Of course the algorithm has no concept of racism; it's just another feature it uses to compute its decision. But it is something we as designers of algorithms should keep in mind.
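To make that concrete, here is a minimal sketch (Python with scikit-learn, entirely synthetic, made-up numbers) of how a model trained on biased historical labels reproduces the bias, even though "racism" appears nowhere in the code:

```python
# Minimal sketch, synthetic data only: if the historical labels came from a
# biased process, a model trained on them reproduces that bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# One legitimate feature and one protected-group indicator (0 or 1).
merit = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# Hypothetical biased history: the true behaviour depends only on merit,
# but past decision-makers recorded worse outcomes for group 1.
true_risk = 1 / (1 + np.exp(merit))
recorded_bad_outcome = rng.random(n) < np.clip(true_risk + 0.15 * group, 0, 1)

X = np.column_stack([merit, group])
model = LogisticRegression().fit(X, recorded_bad_outcome)

# Two people identical on the legitimate feature, differing only in group:
same_merit = [[0.0, 0], [0.0, 1]]
print(model.predict_proba(same_merit)[:, 1])  # group 1 gets a higher score
```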
> When our ML solutions are built on that historical data, they learn those biases as well.
It's actually worse than that, because the algorithm is giving you the correct result.
For example, suppose that black men are more likely to fail to appear in part because racist hiring practices leave them with worse, less flexible jobs.
That isn't fair, but it's still true that they're more likely to fail to appear.
Are we supposed to give bail to people we know aren't going to show up because the reason they won't show up isn't fair?
That seems like an obviously bad idea. The good idea is to do something about racist hiring practices so that this doesn't happen anymore, at which point the algorithm will see the new data and un-bias itself.
The issue comes when features correlated with race are used to make inferences. This can occur in complex ways, but a simple example, used elsewhere in this thread, is zip code. It might be that people from a mainly black area are more likely to exhibit whatever negative behaviour the model is looking for than people from a mainly white area, perhaps for the reasons you mention. What then happens is that two people who are otherwise identical in the model are treated differently based on race.
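A toy version of that (synthetic data, hypothetical "area A" zip code): race is never given to the model, but because zip code correlates with race and the historical outcomes were biased by race, two people identical in every modelled respect still get different scores:

```python
# Proxy-feature sketch, synthetic data: the model never sees race, only
# income and zip code, yet the zip code carries the biased signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

race = rng.integers(0, 2, size=n)
# Hypothetical segregated geography: zip code mostly tracks race.
zip_is_area_a = (rng.random(n) < np.where(race == 1, 0.9, 0.1)).astype(int)
income = rng.normal(size=n)

# Biased historical outcome: depends on income, plus an unfair penalty by race.
p_bad = 1 / (1 + np.exp(income - 0.8 * race))
bad_outcome = rng.random(n) < p_bad

# The model is "race-blind": it only gets income and zip code.
X = np.column_stack([income, zip_is_area_a])
model = LogisticRegression().fit(X, bad_outcome)

# Two applicants identical in every modelled respect except zip code:
print(model.predict_proba([[0.0, 0], [0.0, 1]])[:, 1])  # area A scores worse
```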
> The issue comes when features correlated with race are used to make inferences.
Practically all features are correlated with race. Income, culture, religion, education, proximity to gangs, parental involvement, etc. etc. And many factors cause other factors, which then also correlate.
The difference between zip code as a mechanism for Bayesian inference and redlining is that redlining is disproportionate. Redlining doesn't adjust probabilities in response to data, it just outright bans anyone in a neighborhood, which overcompensates. It makes inaccurate predictions to the detriment of the people in that neighborhood, because some of the people there have mitigating factors that overcome the negatives of living there, and it doesn't take that into account. An algorithm that considers all available data does.
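A small illustration of that distinction (hypothetical weights, not any real lending rule): the redlining-style rule ignores every mitigating factor, while a score that weighs all the available data lets a strong factor overcome the neighbourhood penalty.

```python
# Sketch with made-up weights: blanket ban vs. a score that considers all data.
import math

def redlining_rule(zip_in_area_a: bool, income: float) -> bool:
    """Blanket ban: nobody from area A is approved, no matter what else is true."""
    return not zip_in_area_a

def scored_approval(zip_in_area_a: bool, income: float) -> bool:
    """Hypothetical scoring model: area A lowers the score, but a strong
    income (or any other mitigating factor) can outweigh it."""
    score = 1.5 * income - 1.0 * zip_in_area_a
    p_repay = 1 / (1 + math.exp(-score))
    return p_repay > 0.5

applicant = dict(zip_in_area_a=True, income=2.0)  # strong mitigating factor
print(redlining_rule(**applicant))   # False: banned outright
print(scored_approval(**applicant))  # True: the data overcomes the penalty
```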
You still have to address the things that cause trouble for the people in that neighborhood, but those are separate problems. You have to fix them, not pretend they don't exist.
Another problem in the same vein is that when ML algorithms make use of Bayesian inference, they can bake in correlations (e.g. between race and credit score) that we would normally purposefully avoid using as a factor, because while this enhances predictive power, it again codifies our existing biases, prejudices, and injustices. For example, if you were deploying an ML model to determine whether someone deserved a loan, features such as ZIP code or race could encode discrimination into the model.
Including race as a parameter should reduce the impact of those correlations on the output of the model (by allowing the model to measure and control for the bias that exists in the input data).
Incautiously using race just because it reflects those existing biases would be a problem. This is what lots of humans do, overestimating the information provided by their own inferences of race. Like internet assholes who blather about how it is rational to be afraid of black men because of their higher rates of assault. Never mind that the absolute rate is still so low that there is ~0 predictive power from the race of a given individual.
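Just to put rough numbers on that last point (deliberately made-up rates, purely for the arithmetic): a rate that is double in relative terms can still carry almost no information about a given individual.

```python
# Back-of-the-envelope arithmetic with hypothetical round numbers.
base_rate_group_a = 0.005   # hypothetical: 0.5% of group A
base_rate_group_b = 0.010   # hypothetical: 1.0% of group B, double in relative terms

relative_risk = base_rate_group_b / base_rate_group_a
absolute_difference = base_rate_group_b - base_rate_group_a

print(f"relative risk: {relative_risk:.1f}x")             # 2.0x sounds large...
print(f"absolute difference: {absolute_difference:.3f}")  # ...but it's only 0.005
print(f"P(no incident | group B) = {1 - base_rate_group_b:.3f}")  # still 0.990
```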
The technique I'm referring to is including race as a factor, and then instructing your model to disregard that factor. Not being race-blind, but being race-aware while attempting to be neutral to race.
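A sketch of one way that could look (synthetic data; I'm not claiming any particular real system does it this way): fit with race included so the model can attribute the biased part of the signal to that feature, then pin the feature to a single reference value when scoring, so it no longer moves individual predictions.

```python
# Race-aware fit, race-neutral scoring: a sketch under the assumptions above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

race = rng.integers(0, 2, size=n)
income = rng.normal(size=n)
# Biased historical labels, as in the earlier sketches.
bad_outcome = rng.random(n) < 1 / (1 + np.exp(income - 0.8 * race))

# Include race during fitting so the bias loads onto its coefficient.
X = np.column_stack([income, race])
model = LogisticRegression().fit(X, bad_outcome)

def race_neutral_score(income_value: float) -> float:
    """Score with the race column pinned to a reference value (0), so two
    applicants with the same income get the same score."""
    return model.predict_proba([[income_value, 0]])[0, 1]

print(race_neutral_score(0.0))  # same number regardless of the applicant's race
```

Whether pinning the feature like this genuinely removes the bias, rather than just hiding it, depends on how much of the biased signal the race column actually absorbs during fitting.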
Yeah, I think this is an interesting idea, but I don't actually understand how the great-grandparent comment believes this could be done. I don't think it works in a strict Bayesian sense. You would have to go out of your way to instruct your model to operate correctively.
By definition, any model is basically bound to be discriminatory. Taking data, extracting common key features, and discarding the rest is essentially generalisation.
But the model is amoral. It's (morally) neither good nor bad for utilising certain features.
If it turned out that race was the most accurate attribute for a particular situation, it would be nonsensical to ignore it.
The current trend of trying to paper over biases, while generally born of noble sentiment, probably only perpetuates the problem, because it's usually done far downstream and doesn't necessitate change at the source. Functionally, it's like a cover-up by a large corporation.
> If it turned out that race was the most accurate attribute for a particular situation, it would be nonsensical to ignore it.
Only if you are optimizing for prediction accuracy. If you want to optimize for something like "justice" or "citizen wellbeing", then you might come to a different conclusion.
[1] https://youarenotsosmart.com/2017/11/20/yanss-115-how-we-tra...