In critical machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities. In this work, we address the challenge of reducing bias while improving accuracy in data-scarce environments, where the cost of collecting labels rules out large labeled datasets. In such settings, active learning promises to maximize the marginal accuracy gain from small amounts of labeled data. However, existing applications of active learning to fairness fall short of this promise: they typically require large labeled datasets, or they fail to ensure that the desired fairness tolerance is met on the population distribution. To address these limitations, we introduce an active learning framework that combines an exploration procedure inspired by posterior sampling with a fair classification subroutine. We show that this framework performs well in very data-scarce regimes, maximizing accuracy while satisfying fairness constraints with high probability. We evaluate the approach on well-established real-world benchmark datasets and compare it against state-of-the-art methods, demonstrating that it produces fair models and improves on existing methods.