Fair machine learning methods seek to train models that balance performance across demographic subgroups defined by sensitive attributes such as race and gender. Although sensitive attributes are typically assumed to be known during training, they may be unavailable in practice because of privacy and other logistical concerns. Recent work has sought to train fair models without sensitive attributes on the training data. These methods, however, need extensive hyper-parameter tuning to achieve good results, and hence assume that sensitive attributes are known on the validation data; this assumption, too, might not be practical. Here, we propose Antigone, a framework to train fair classifiers without access to sensitive attributes on either training or validation data. Instead, we generate pseudo sensitive attributes on the validation data by training a biased classifier and using the examples it labels incorrectly as a proxy for the minority group and the examples it labels correctly as a proxy for the majority group. Since fairness metrics like demographic parity, equal opportunity, and subgroup accuracy can be estimated to within a proportionality constant even with noisy sensitive attribute information, we show theoretically and empirically that these proxy labels can be used to maximize fairness under average accuracy constraints. Key to our results is a principled approach to selecting the hyper-parameters of the biased classifier in a completely unsupervised fashion (that is, without access to ground-truth sensitive attributes) that minimizes the gap between fairness estimated using noisy versus ground-truth sensitive labels.
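The proxy-labeling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy arrays, and the use of absolute gaps as the fairness measure are assumptions made for illustration, and the unsupervised selection of the biased classifier's hyper-parameters is not shown. The sketch takes a biased classifier's predictions on validation data as given, marks correctly labeled examples as the proxy majority group and incorrectly labeled examples as the proxy minority group, and then estimates demographic parity and equal opportunity gaps for a separate target classifier using those proxies.

```python
import numpy as np


def pseudo_sensitive_attributes(y_true, y_pred_biased):
    """Return 1 for proxy-majority (biased classifier correct), 0 for proxy-minority."""
    y_true = np.asarray(y_true)
    y_pred_biased = np.asarray(y_pred_biased)
    return (y_true == y_pred_biased).astype(int)


def _group_mean(values, mask):
    """Mean of `values` over `mask`, or NaN if the mask selects nothing."""
    return float(values[mask].mean()) if mask.any() else float("nan")


def demographic_parity_gap(y_pred, groups):
    """|P(y_hat = 1 | group 1) - P(y_hat = 1 | group 0)| under the given group labels."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rate_0 = _group_mean(y_pred, groups == 0)
    rate_1 = _group_mean(y_pred, groups == 1)
    return abs(rate_1 - rate_0)


def equal_opportunity_gap(y_true, y_pred, groups):
    """Absolute difference in true positive rate between the two groups."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    positives = y_true == 1
    tpr_0 = _group_mean(y_pred, positives & (groups == 0))
    tpr_1 = _group_mean(y_pred, positives & (groups == 1))
    return abs(tpr_1 - tpr_0)


# Toy validation set (hypothetical values, chosen only to exercise the functions).
y_val = np.array([1, 1, 1, 1, 0, 0, 0, 0])
biased_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1])   # predictions of the biased classifier
target_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0])   # predictions of the model being evaluated

pseudo = pseudo_sensitive_attributes(y_val, biased_pred)
dp_gap = demographic_parity_gap(target_pred, pseudo)
eo_gap = equal_opportunity_gap(y_val, target_pred, pseudo)
print(pseudo.tolist(), dp_gap, eo_gap)
```

Because the proxy groups are noisy, the gaps computed this way should be read as estimates that track the true fairness metrics only up to a proportionality constant, as the abstract states; they are useful for ranking candidate models rather than as absolute values.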