In your research, how do you define the process of "generalizing"?
I understand theoretical research exists but I think it's problematic that theoretical researchers imagine that a kind of "generic" problem exists, even when a variety of test sets exist to
I mean, is SOTA on ImageNet or whatever dataset a theoretical or an applied question? What theoretical research in AI is so theoretical that the question of datasets doesn't appear?
Let me be very concrete: there is currently a lot of research being done in so-called “contrastive learning”. This is an unsupervised technique in which you train a network to build good representations of its input data, not by explicitly telling the network “this is a dog” but by telling it “this image is different from that image and similar to this other one”.
How this works, why this works, and coming up with the technique itself are all data-agnostic. All you have done so far is write down a function f(x) -> y and a loss L(x, y), with x the input and y the output, specifying your model.
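For concreteness, here is a minimal sketch of such a contrastive loss in the NT-Xent/SimCLR style, assuming PyTorch; the function name nt_xent_loss, the temperature value, and the shapes are illustrative, not taken from any particular paper's code. The point is that nothing in it refers to any specific dataset: it only says "two augmented views of the same input should get similar embeddings, different inputs should not".

```python
# Minimal sketch of an NT-Xent-style contrastive loss (assumed PyTorch;
# names and defaults are illustrative, not from any specific codebase).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Pull two views of the same image together, push all other
    images in the batch apart. z1, z2: embeddings of shape (N, d)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)        # (2N, d)
    sim = z @ z.t() / temperature         # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))     # a sample is not its own positive
    n = z1.size(0)
    # row i (view 1 of image i) has its positive at row i + n (view 2), and vice versa
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```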
Of course you use a specific dataset to train your model in the end and see if it works. But the model and the technique itself are not grounded in any specific dataset and thus nothing in this model perpetuates bias.
Now usually the next step for you as a researcher is to evaluate the performance of your model on the test set. Let's take ImageNet. Now there are three situations.
A) Your train set is biased & your test set is biased in the same way.
B) Your train set is biased & your test set is not.
C) Your train set is unbiased & your test set is biased.
By biased I mean any kind of data issue such as the ones discussed in this article, e.g. no women in the class “doctor”.
In situations B) & C) your model wont work well so you actually have an incentive to fix your data. This will happen to you in production if you train your tracker only on white people say.
Situation A) is likely what's happening with ImageNet and other benchmark datasets. In this case your model learns an incomplete representation of the class “doctor” and picks up the spurious correlation that all doctors are men. This will still look good on the test set because it's equally biased.
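To make situation A) tangible, here is a toy simulation, assuming NumPy and scikit-learn; the features, the numbers, and the doctor/gender framing are made up purely for illustration. A classifier trained on data where almost all “doctors” are men scores noticeably better on an equally biased test set than on a balanced one:

```python
# Toy simulation of situation A (assumed NumPy/scikit-learn; all values illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, p_male_given_doctor):
    """Two features: a weak 'real' signal and a spurious 'perceived gender' feature."""
    y = rng.integers(0, 2, n)                            # 1 = doctor, 0 = not
    signal = y + rng.normal(0, 2.0, n)                   # weak genuine signal
    male = np.where(y == 1,
                    rng.random(n) < p_male_given_doctor,
                    rng.random(n) < 0.5).astype(float)   # spurious feature
    return np.column_stack([signal, male]), y

# Train and test sets share the same bias (almost all doctors are men).
X_train, y_train = make_data(5000, p_male_given_doctor=0.95)
X_test_biased, y_test_biased = make_data(5000, p_male_given_doctor=0.95)
# An unbiased test set breaks the spurious correlation.
X_test_fair, y_test_fair = make_data(5000, p_male_given_doctor=0.5)

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy on equally biased test set:", clf.score(X_test_biased, y_test_biased))
print("accuracy on unbiased test set:      ", clf.score(X_test_fair, y_test_fair))
# The gap between the two numbers is the spurious 'doctor => male' correlation:
# the equally biased test set never exposes it.
```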
You go on, get good test results, and publish a paper about it, unaware of the inherent dataset biases. (You could have done all of this on MNIST or a thousand other datasets that do not have any issues with societal biases because they are from a totally different domain, but that's another point.)
In this entire process of coming up with the model, training it, and evaluating it, there is no point at which the researcher has any interest in, or benefits from, working with biased datasets. Furthermore, besides potentially overestimating the accuracy of your model, there is nothing here that would hurt society or further perpetuate biases. That is because models are generally not designed to work on a specific dataset.
Again, this is a different story when you use your model in production. In that case you are in situation B) or C), and here lies the crux: if you can make money from this bias, or maybe it perpetuates your own biases, well, you might keep it like that. This should be fixed. Here is a real argument for why there should be diverse populations working on AI systems that are used in industry.
Of course, having diverse populations in science is also our goal, but not in order to fix our datasets; rather, to do better research.