Transductive learning methods, which make use of a holdout set during training, have recently gained popularity for their potential to improve the speed, accuracy, and fairness of machine learning models. However, the composition of the holdout set itself, in particular the balance of sensitive subgroups, has been largely overlooked. Our experiments on the CIFAR and CelebA datasets show that changing the composition of the holdout set can substantially affect fairness metrics. Imbalanced holdout sets exacerbate existing disparities, whereas balanced holdout sets can mitigate disparities introduced by imbalanced training data. These findings underline the need to construct holdout sets that are both diverse and representative.