Machine learning risks reinforcing biases present in data and, as we argue in this work, biases in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care for marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step: attention is placed on reducing reconstruction error and improving overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and the algorithmic fairness properties of downstream predictions.