As machine learning (ML) algorithms are increasingly used in high-stakes applications, concerns have arisen that they may be biased against certain social groups. Although many approaches have been proposed to make ML models fair, they typically rely on the assumption that the data distributions in training and deployment are identical. Unfortunately, this assumption is commonly violated in practice, and a model that is fair during training may lead to unexpected outcomes during deployment. Although the problem of designing robust ML models under dataset shifts has been widely studied, most existing works focus only on the transfer of accuracy. In this paper, we study the transfer of both fairness and accuracy under domain generalization, where the data at test time may be sampled from never-before-seen domains. We first develop theoretical bounds on the unfairness and expected loss at deployment, and then derive sufficient conditions under which fairness and accuracy can be perfectly transferred via invariant representation learning. Guided by these results, we design a learning algorithm such that fair ML models learned from training data retain high fairness and accuracy when deployment environments change. Experiments on real-world data validate the proposed algorithm. Model implementation is available at https://github.com/pth1993/FATDM.