With the widespread adoption of machine learning in healthcare, automating societal biases risks further exacerbating health disparities. We explore algorithmic fairness from the perspective of feature selection. Traditional feature selection methods identify features for better decision making by removing resource-intensive, correlated, or irrelevant features, but they overlook how these factors may differ across subgroups. To address this, we evaluate a fair feature selection method that gives equal importance to all demographic groups. We jointly consider a fairness metric and an error metric within the feature selection process to balance minimizing bias against minimizing global classification error. We test our approach on three publicly available healthcare datasets. On all three, we observe improvements in fairness metrics coupled with minimal degradation of balanced accuracy. Our approach addresses both distributive and procedural fairness within the fair machine learning context.
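To make the idea of scoring features on a fairness metric and an error metric together more concrete, the following is a minimal sketch and not the method evaluated here. Everything in it is an assumption made for illustration: the fairness metric (the gap in true-positive rate between groups), the weight `lam`, the greedy forward search, the toy nearest-centroid classifier, and all function names are hypothetical, and the eight-row dataset is invented.

```python
def balanced_error(y_true, y_pred):
    """1 - balanced accuracy, i.e. the mean of the per-class error rates."""
    errors = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        errors.append(sum(y_pred[i] != cls for i in idx) / len(idx))
    return sum(errors) / len(errors)


def tpr_gap(y_true, y_pred, groups):
    """Assumed fairness metric: largest true-positive-rate gap between groups."""
    tprs = []
    for g in sorted(set(groups)):
        pos = [i for i in range(len(y_true)) if groups[i] == g and y_true[i] == 1]
        if pos:  # skip groups that have no positive examples
            tprs.append(sum(y_pred[i] == 1 for i in pos) / len(pos))
    return max(tprs) - min(tprs) if tprs else 0.0


def centroid_predict(X, y, cols):
    """Toy nearest-centroid classifier, fit and applied on the same rows."""
    def centroid(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[c] for r in rows) / len(rows) for c in cols]

    def dist(x, c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))

    c0, c1 = centroid(0), centroid(1)
    return [int(dist(x, c1) < dist(x, c0)) for x in X]


def joint_score(X, y, groups, cols, lam):
    """Error plus lam times the fairness gap; lower is better."""
    pred = centroid_predict(X, y, cols)
    return balanced_error(y, pred) + lam * tpr_gap(y, pred, groups)


def fair_forward_select(X, y, groups, lam=1.0):
    """Greedily add the feature that most lowers the joint score."""
    selected, best = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {c: joint_score(X, y, groups, selected + [c], lam)
                  for c in remaining}
        c = min(scores, key=scores.get)
        if scores[c] >= best:  # stop when no candidate improves the score
            break
        selected.append(c)
        remaining.remove(c)
        best = scores[c]
    return selected, best


# Invented data: feature 0 is more accurate overall but misses a positive
# only in group B; feature 1 is less accurate but misses equally in A and B.
X = [[1, 1], [1, 0], [0, 0], [0, 0],   # group A: two positives, two negatives
     [1, 1], [0, 0], [0, 0], [0, 0]]   # group B: two positives, two negatives
y = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

print(fair_forward_select(X, y, groups, lam=0.0))  # ([0], 0.125)
print(fair_forward_select(X, y, groups, lam=1.0))  # ([1], 0.25)
```

With `lam=0.0` the search optimizes error alone and keeps feature 0; with `lam=1.0` the fairness term outweighs the accuracy gain and the search keeps feature 1 instead. The toy data exaggerates the trade-off for clarity and says nothing about the size of the effect on real healthcare datasets.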