Federated learning faces challenges from the heterogeneity in data volumes and distributions across clients, which can compromise the model's ability to generalize to diverse distributions. Existing approaches to this issue based on group distributionally robust optimization (GDRO) often incur high communication and sample complexity. To address this, this work introduces algorithms tailored for communication-efficient Federated Group Distributionally Robust Optimization (FGDRO). Our contributions are threefold: First, we introduce the FGDRO-CVaR algorithm, which optimizes the average of the top-K losses while reducing communication complexity to $O(1/\epsilon^4)$, where $\epsilon$ denotes the desired precision level. Second, our FGDRO-KL algorithm is crafted to optimize KL-regularized FGDRO, cutting communication complexity to $O(1/\epsilon^3)$. Third, we propose FGDRO-KL-Adam, which employs Adam-type local updates in FGDRO-KL; it not only maintains a communication cost of $O(1/\epsilon^3)$ but also shows potential to surpass SGD-type local steps in practical applications. The effectiveness of our algorithms is demonstrated on a variety of real-world tasks, including natural language processing and computer vision.
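For concreteness, the average top-K loss targeted by FGDRO-CVaR admits the standard CVaR reformulation below; this is the textbook form of the objective, and the paper's exact notation may differ:

```latex
% Standard CVaR reformulation of the average top-K group loss:
% with N groups, group losses L_i(x), and ratio \alpha = K/N,
\min_{x,\ \lambda \in \mathbb{R}} \quad
  \lambda + \frac{1}{N\alpha} \sum_{i=1}^{N} \big[ L_i(x) - \lambda \big]_+ ,
\qquad \alpha = \frac{K}{N},
% where [u]_+ = \max(u, 0); at the optimal \lambda the objective equals
% the average of the K largest group losses.
```

Minimizing over $\lambda$ recovers the average of the $K$ worst group losses, which is why optimizing this joint objective is equivalent to the top-K formulation stated above.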