In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Bias mitigation methods work in different ways and have known "waterfall" effects: mitigating bias in one place may introduce bias elsewhere. In this paper, we aim to characterise the cohorts that are impacted when mitigation interventions are applied. To do so, we treat intervention effects as a classification task and learn an explainable meta-classifier to identify cohorts whose outcomes are altered. We examine a range of bias mitigation strategies that operate at various stages of the model life cycle. We empirically demonstrate that our meta-classifier is able to uncover impacted cohorts. Further, we show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of mitigation efforts, and that this happens despite improvements in fairness metrics. We use these results to argue for more careful audits of static mitigation interventions, audits that go beyond aggregate metrics.
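The idea of treating intervention effects as a classification task can be illustrated with a minimal sketch. It assumes binary predictions (1 = favourable) from a baseline model and from a mitigated model on the same cases, labels each case as "negative", "positive" or "unchanged" according to how mitigation changed its outcome, and then fits a shallow Gini-based decision tree with categorical equality splits as the explainable meta-classifier. The tree learner, the helper names (`impact_labels`, `learn_rules`) and the toy features `group` and `age_band` are hypothetical illustrations, not the paper's actual meta-classifier, data or results.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Rule = Tuple[Tuple[str, ...], str, int]


def impact_labels(baseline: List[int], mitigated: List[int]) -> List[str]:
    """Label each case by how mitigation changed its outcome (1 = favourable)."""
    labels = []
    for before, after in zip(baseline, mitigated):
        if before == after:
            labels.append("unchanged")
        elif before == 1:
            labels.append("negative")  # lost a favourable outcome after mitigation
        else:
            labels.append("positive")  # gained a favourable outcome after mitigation
    return labels


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of impact labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows: List[Row], labels: List[str]) -> Optional[Tuple[str, str]]:
    """Return the (feature, value) equality test that most reduces weighted Gini, if any."""
    best, best_score, n = None, gini(labels), len(rows)
    for feature in sorted(rows[0]):
        for value in sorted({r[feature] for r in rows}):
            inside = [l for r, l in zip(rows, labels) if r[feature] == value]
            outside = [l for r, l in zip(rows, labels) if r[feature] != value]
            if not inside or not outside:
                continue
            score = (len(inside) * gini(inside) + len(outside) * gini(outside)) / n
            if score < best_score - 1e-12:  # keep only strict improvements
                best, best_score = (feature, value), score
    return best


def learn_rules(rows: List[Row], labels: List[str], depth: int = 2,
                path: Tuple[str, ...] = ()) -> List[Rule]:
    """Grow a shallow tree; return (conditions, majority label, size) per leaf."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        label = Counter(labels).most_common(1)[0][0]
        return [(path, label, len(labels))]
    feature, value = split
    rules: List[Rule] = []
    for test, cond in ((lambda r: r[feature] == value, f"{feature} == {value}"),
                       (lambda r: r[feature] != value, f"{feature} != {value}")):
        idx = [i for i, r in enumerate(rows) if test(r)]
        rules += learn_rules([rows[i] for i in idx], [labels[i] for i in idx],
                             depth - 1, path + (cond,))
    return rules


# Hypothetical toy cases with two categorical features each.
rows = [
    {"group": "A", "age_band": "old"}, {"group": "A", "age_band": "old"},
    {"group": "A", "age_band": "young"}, {"group": "A", "age_band": "young"},
    {"group": "B", "age_band": "old"}, {"group": "B", "age_band": "old"},
    {"group": "B", "age_band": "young"}, {"group": "B", "age_band": "young"},
]
baseline = [1, 1, 1, 0, 0, 1, 0, 0]   # predictions before mitigation
mitigated = [0, 0, 1, 0, 0, 1, 1, 1]  # predictions after mitigation

labels = impact_labels(baseline, mitigated)
rules = learn_rules(rows, labels, depth=2)
negative = [(conds, size) for conds, label, size in rules if label == "negative"]
share = labels.count("negative") / len(labels)

print(negative)  # [(('age_band == old', 'group == A'), 2)]
print(f"{share:.0%} of cases negatively impacted")  # 25% of cases negatively impacted
```

Each leaf of the shallow tree reads as a conjunction of feature conditions, so a leaf whose majority label is "negative" directly names a cohort that mitigation pushed to an unfavourable outcome. This is the kind of per-cohort view that an aggregate fairness metric would not reveal. A shallow tree is only one choice of interpretable model; any rule-based classifier could play the same role.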