FOCUS: Fairness via Agent-Awareness for Federated Learning on Heterogeneous Data

Federated learning (FL) allows agents to jointly train a global model without sharing their local data. However, because local data are heterogeneous, it is challenging to optimize, or even to define, the fairness of the trained global model across agents. For instance, existing work usually treats accuracy equity across agents as fairness in FL. This notion is limited, especially in the heterogeneous setting. It is intuitively "unfair" to require agents with high-quality data to achieve accuracy similar to that of agents contributing low-quality data, and doing so may discourage agents from participating in FL. In this work, we propose a formal FL fairness definition, fairness via agent-awareness (FAA), which takes the different contributions of heterogeneous agents into account. Under FAA, the performance of agents with high-quality data is not sacrificed merely because many agents with low-quality data are present. In addition, we propose a fair FL training algorithm based on agent clustering (FOCUS) to achieve fairness in FL as measured by FAA. Theoretically, we prove the convergence and optimality of FOCUS under mild conditions for linear and general convex loss functions with bounded smoothness. We also prove that FOCUS always achieves higher fairness in terms of FAA than standard FedAvg under both linear and general convex loss functions. Empirically, we show that on four FL datasets, including synthetic data, images, and texts, FOCUS achieves significantly higher fairness in terms of FAA while maintaining competitive prediction accuracy compared with FedAvg and state-of-the-art fair FL algorithms.
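The abstract argues that equalizing accuracy (or loss) across agents penalizes agents with high-quality data, while an agent-aware criterion should not. The toy Python sketch below illustrates that argument under stated assumptions; it is not the paper's definition or algorithm. It assumes an FAA-style quantity is the largest pairwise gap in excess risk (an agent's loss minus the best loss achievable on its own data), since the abstract does not give the formula. It also uses a hypothetical one-parameter model with squared loss, where each agent's labels are Gaussian with mean mu and noise level sigma, and it substitutes a simple hard-assignment clustering loop for FOCUS. The names `Agent`, `max_pairwise_gap`, `fedavg_model`, `cluster_models`, and `report` are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Agent:
    """Hypothetical toy agent whose labels follow y ~ N(mu, sigma^2)."""
    mu: float     # agent's optimal prediction (mean of its labels)
    sigma: float  # label-noise level; larger sigma means lower-quality data

    def risk(self, theta: float) -> float:
        # Expected squared loss E[(theta - y)^2] = (theta - mu)^2 + sigma^2
        return (theta - self.mu) ** 2 + self.sigma ** 2

    def excess_risk(self, theta: float) -> float:
        # Risk minus the best achievable risk on this agent's data (sigma^2 at theta = mu)
        return self.risk(theta) - self.risk(self.mu)


def max_pairwise_gap(values):
    """Largest |v_i - v_j| over all pairs of agents (assumed FAA-style measure)."""
    return max(abs(a - b) for a, b in combinations(values, 2))


def fedavg_model(agents):
    # Single model minimizing the average risk; equal local data sizes are assumed
    return sum(a.mu for a in agents) / len(agents)


def cluster_models(agents, init, n_iters=10):
    """Hard-assignment alternating minimization over agents: a k-means-style
    stand-in for agent clustering, NOT the FOCUS update rule."""
    thetas = list(init)
    for _ in range(n_iters):
        # Assign each agent to the cluster model with the lowest risk on its data
        assign = [min(range(len(thetas)), key=lambda k: a.risk(thetas[k])) for a in agents]
        # Refit each cluster model on its members; leave empty clusters unchanged
        for k in range(len(thetas)):
            members = [a for a, c in zip(agents, assign) if c == k]
            if members:
                thetas[k] = fedavg_model(members)
    # Each agent receives the model of the cluster it was assigned to
    return [thetas[c] for c in assign]


high = [Agent(mu=0.0, sigma=0.1) for _ in range(2)]  # few agents with high-quality data
low = [Agent(mu=1.0, sigma=1.0) for _ in range(8)]   # many agents with low-quality data
agents = high + low

global_theta = fedavg_model(agents)
per_agent = cluster_models(agents, init=[0.0, 1.0])


def report(models):
    """Return (loss gap, excess-risk gap) for one model per agent."""
    risks = [a.risk(t) for a, t in zip(agents, models)]
    excess = [a.excess_risk(t) for a, t in zip(agents, models)]
    return max_pairwise_gap(risks), max_pairwise_gap(excess)


fedavg_loss_gap, fedavg_faa_gap = report([global_theta] * len(agents))
cluster_loss_gap, cluster_faa_gap = report(per_agent)
print(f"FedAvg:    loss gap={fedavg_loss_gap:.2f}, excess-risk gap={fedavg_faa_gap:.2f}")
print(f"Clustered: loss gap={cluster_loss_gap:.2f}, excess-risk gap={cluster_faa_gap:.2f}")
```

In this toy setup the single averaged model lands at 0.8, so a loss-equity criterion prefers it (loss gap 0.39 versus 0.99), even though it raises each high-quality agent's excess risk from 0 to 0.64. The excess-risk gap instead prefers the clustered models (0.00 versus 0.60), which matches the intuition the abstract describes.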