We tackle the problem of mitigating bias in algorithmic decisions in a setting where both the output of the algorithm and the sensitive variable are continuous. Most prior work deals with discrete sensitive variables, meaning that biases are measured for subgroups of persons defined by a label; this leaves out important cases of algorithmic bias where the sensitive variable is continuous, such as unfair decisions made with respect to age or financial status. We therefore propose a bias mitigation strategy for continuous sensitive variables, based on the notion of endogeneity from the field of econometrics. Beyond addressing this new problem, our bias mitigation strategy is a weakly supervised learning method, requiring only that a small portion of the data be measured in a fair manner. It is model agnostic, in the sense that it makes no assumption about the prediction model. It relies on a reasonably large number of input observations and their corresponding predictions, while only a small fraction of the true outputs needs to be known, which limits the need for expert intervention. Results obtained on synthetic data show the effectiveness of our approach on examples as close as possible to real-life applications in econometrics.