We investigate the prominent class of fair representation learning methods for bias mitigation. Using causal reasoning to define and formalise different sources of dataset bias, we reveal important implicit assumptions inherent to these methods. We prove fundamental limitations on fair representation learning when evaluation data is drawn from the same distribution as training data and run experiments across a range of medical modalities to examine the performance of fair representation learning under distribution shifts. Our results explain apparent contradictions in the existing literature and reveal how rarely considered causal and statistical aspects of the underlying data affect the validity of fair representation learning. We raise doubts about current evaluation practices and the applicability of fair representation learning methods in performance-sensitive settings. We argue that fine-grained analysis of dataset biases should play a key role in the field moving forward.