Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting, which assigns a weight to each data point used during model training, can mitigate such bias, though sometimes at the cost of predictive accuracy. In this paper, we investigate this trade-off by comparing three methods for generating these weights: (1) evolving them using a Genetic Algorithm (GA), (2) computing them using only dataset characteristics, and (3) assigning equal weights to all data points. Model performance under each strategy was evaluated using paired predictive and fairness metrics. We used two predictive metrics (accuracy and area under the Receiver Operating Characteristic curve) and two fairness metrics (demographic parity and subgroup false negative fairness). By conducting experiments on eleven publicly available datasets (including two medical datasets), we show that evolved sample weights can produce models that achieve better trade-offs between fairness and predictive performance than alternative weighting methods. However, the magnitude of these benefits depends strongly on the choice of fairness objective. Our experiments reveal that the evolved weights were most effective when optimizing for demographic parity, regardless of the choice of performance objective: in that setting they yielded better performance than the other weighting strategies on the largest number of datasets.
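To make the weighting strategies concrete, the following is a minimal sketch of strategies (2) and (3) together with one common form of the demographic parity metric. The abstract does not state the formulas used, so everything here is an assumption: the weight P(group) · P(label) / P(group, label) is a widely used dataset-based reweighting scheme and not necessarily the authors', demographic parity is taken as the largest gap in positive-prediction rate between groups, and the function names and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Dataset-based weights (assumed scheme): P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def demographic_parity_difference(groups, predictions):
    """Largest gap in positive-prediction rate between any two groups (assumed definition)."""
    totals = Counter(groups)
    positives = Counter(g for g, p in zip(groups, predictions) if p == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical toy data: group "a" receives the positive label more often than group "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)   # strategy (2): computed from the dataset
equal_weights = [1.0] * len(labels)            # strategy (3): every point weighted equally
label_gap = demographic_parity_difference(groups, labels)
```

Either weight list could then be passed to a learner that accepts per-sample weights. Strategy (1) would instead treat the weight vector as the individual being evolved, scoring each candidate on a paired predictive and fairness metric.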