Federated Learning (FL) employs a distributed training approach to address scenarios where users' data cannot be shared across clients. Achieving fairness in FL is imperative since training data in FL is inherently geographically distributed among diverse user groups. Existing research on fairness predominantly assumes access to the entire training data, making direct transfer to FL challenging. Moreover, the limited existing research on fairness in FL does not effectively address two key challenges: (CH1) current methods fail to deal with the inconsistency between fair optimization results obtained with surrogate functions and fair classification results; (CH2) directly aggregating locally fair models does not always yield a globally fair model due to non-Independent and Identically Distributed (non-IID) data among clients. To address these challenges, we propose a Wasserstein Fair Federated Learning framework, namely WassFFed. To tackle CH1, we ensure that the outputs of local models, rather than the loss calculated with surrogate functions or classification results obtained with a threshold, remain independent of the various user groups. To resolve CH2, we compute a Wasserstein barycenter of all local models' output distributions for each user group, bringing local model outputs closer to the global output distribution to ensure consistency between the global model and the local models. Extensive experiments on three real-world datasets demonstrate that WassFFed outperforms existing approaches in striking a balance between accuracy and fairness.
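To make the barycenter step concrete, the following is a minimal sketch (not the paper's actual implementation) of the core computation described above: for a single user group, the one-dimensional Wasserstein-2 barycenter of the clients' empirical output-score distributions can be approximated by averaging their quantile functions. The function name, support size, and toy data are illustrative assumptions.

```python
import numpy as np

def wasserstein_barycenter_1d(samples_per_client, n_support=100):
    """Approximate the 1-D Wasserstein-2 barycenter of empirical
    distributions by pointwise averaging of quantile functions.

    samples_per_client: list of 1-D arrays of local model output
    scores, one array per client, all for the same user group.
    Returns the barycenter's quantile function on n_support points.
    """
    qs = np.linspace(0.0, 1.0, n_support)
    # Quantile function of each client's empirical output distribution.
    quantiles = np.stack([np.quantile(s, qs) for s in samples_per_client])
    # Uniform client weights: the barycenter's quantile function is the
    # pointwise mean of the clients' quantile functions.
    return quantiles.mean(axis=0)

# Toy usage: synthetic scores from three clients for one user group
# (hypothetical data, only to exercise the function).
rng = np.random.default_rng(0)
clients = [rng.normal(loc=m, scale=0.1, size=500) for m in (0.3, 0.5, 0.7)]
bary = wasserstein_barycenter_1d(clients)
```

Each local model's outputs for a group could then be pulled toward this barycenter, which is the alignment role the barycenter plays in the framework described above.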