Enhancing Biases in ML Models

Models trained on historical data are inherently anachronistic: they reflect the past, not the present. That can be very problematic when we apply them to future data.

In the case of natural language data (text), this is exacerbated by the fact that the text can be decades old. (Most syntactic parsers are trained on the Penn Treebank corpus of ~2,500 news stories from the 1989 Wall Street Journal.)

This poses serious problems, because society has changed significantly since then. Consider, for example, that 62 percent of medical doctors under 40 in Switzerland are currently female.

Given the anachronistic nature of NLP data, it makes sense that the data will present views about gender, religion, sexual orientation and other topics that are at odds with our current society. Changing the data, or changing the models trained on that data, to correct for these wrong views is considered either useful or a moral imperative. (See, for example, the paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" by Bolukbasi and others in NeurIPS 2016.)

Now, some of the techniques devised to address this problem involve finding a direction in which the model is incorrect and then modifying the model in the opposite direction. This technique is called "debiasing" the model, but I find the choice of words unfortunate because the term is used in the larger, societal sense rather than in the technical sense. This ambiguity leads to confusion. The actual operation moves a model away from its data, basically applying an external bias to it that we hope will correct the errors (data bias) present in the training data.
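To make the operation concrete, here is a minimal sketch of one way it could look for word embeddings. It is not the exact procedure from the paper cited above: the direction is estimated as a plain average of difference vectors, and the three-dimensional vectors, the word pairs, and the helper names (`bias_direction`, `remove_component`) are all made up for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def bias_direction(pairs):
    """Estimate a direction as the normalized mean of (a - b) over word pairs.
    Assumption: a simple average stands in for more careful estimators."""
    dim = len(pairs[0][0])
    mean_diff = [0.0] * dim
    for a, b in pairs:
        for i in range(dim):
            mean_diff[i] += (a[i] - b[i]) / len(pairs)
    return normalize(mean_diff)

def remove_component(v, direction):
    """Subtract from v its projection onto the unit vector `direction`."""
    scale = dot(v, direction)
    return [a - scale * d for a, d in zip(v, direction)]

# Toy embeddings, invented for this example.
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
doctor = [0.4, 0.9, 0.1]

g = bias_direction([(he, she)])
doctor_neutral = remove_component(doctor, g)
```

After the call, `doctor_neutral` has no component left along `g`. Note that nothing in the training data asked for this change: the direction and the decision to remove it both come from outside the data.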

But the techniques designed so far are, sadly, dual-use. If we can identify this incorrect direction, we can certainly remove it, but we can also amplify it. Why would we want to make a model more racist, xenophobic or sexist? Well, obviously, we wouldn't. We should therefore find better technical solutions to this problem that are not dual-use. Otherwise, these solutions should be treated with care with respect to research ethics.
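The dual-use nature is easy to see once the operation is written with a single knob. The sketch below is an illustration under assumptions, not a published method: `shift_along` and its `strength` parameter are hypothetical names, the direction is assumed to be a unit vector that has already been identified, and the toy vector is invented.

```python
def shift_along(v, direction, strength):
    """Rescale the component of v along the unit vector `direction`.
    strength = 0 removes the component, 1 leaves v unchanged,
    and anything above 1 amplifies it."""
    component = sum(a * d for a, d in zip(v, direction))
    return [a + (strength - 1.0) * component * d for a, d in zip(v, direction)]

# Toy values, invented for this example.
direction = [1.0, 0.0, 0.0]   # assumed to be the identified "wrong" direction
doctor = [0.4, 0.9, 0.1]

removed = shift_along(doctor, direction, 0.0)    # the "debiasing" use
amplified = shift_along(doctor, direction, 3.0)  # the opposite use, same code
```

The only difference between the two calls is the value of one number, which is why a technique that can do the first can also do the second.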

But there is still morally positive value in making models more incorrect: convincing decision makers to let us correct the models. The argument I have heard can be paraphrased as: "Society is sexist, so our sexist model is more accurate. We are a for-profit enterprise, so we won't succumb to an unrelated moral imperative and make our system less performant." The thing is, removing the wrong direction from the data is not only a moral imperative; it is also a way of dealing with the anachronistic nature of the data. If the wrong direction is enhanced, chances are the error rate will increase, and by much more than it would if the direction were removed. That shows the value in removing it: adjusting the model to the present and future.