A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant factor. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that chooses between the two baseline algorithms is rate optimal. Another implication is that the popular strategy of FedAvg followed by local fine-tuning is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
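To make the two baseline algorithms concrete, below is a minimal, illustrative sketch (not the paper's setup or experiments): it simulates a toy mean-estimation problem with a quadratic loss and compares pure local training against FedAvg-style local updates with periodic averaging. The heterogeneity radius R, the sample sizes, the learning-rate and round counts, and all variable names are assumptions made purely for exposition.

```python
import numpy as np

# Toy comparison of pure local training vs. FedAvg on mean estimation:
# client i observes x ~ N(theta_i, 1), and the loss is 0.5 * (w - x)^2.
# All constants below are illustrative assumptions, not the paper's.

rng = np.random.default_rng(0)
m, n = 20, 50          # number of clients, samples per client
R = 0.5                # heterogeneity radius: client parameters lie within R of 0

theta_star = rng.uniform(-R, R, size=m)                 # heterogeneous client parameters
data = [rng.normal(theta_star[i], 1.0, size=n) for i in range(m)]

# Pure local training: each client solves its own ERM (here, the sample mean).
local_est = np.array([x.mean() for x in data])

# FedAvg: a few full-batch gradient steps per client, then average; with a
# quadratic loss this converges to the average of the local ERM solutions.
global_est = 0.0
lr, local_steps, rounds = 0.1, 5, 30
for _ in range(rounds):
    updates = []
    for x in data:
        w = global_est
        for _ in range(local_steps):
            w -= lr * (w - x.mean())                     # gradient of 0.5 * (w - mean)^2
        updates.append(w)
    global_est = np.mean(updates)

# Per-client excess risk (squared distance to each client's true parameter).
risk_local = np.mean((local_est - theta_star) ** 2)
risk_fedavg = np.mean((global_est - theta_star) ** 2)
print(f"local training: {risk_local:.4f}, FedAvg: {risk_fedavg:.4f}")
```

Increasing R in this sketch makes the shared FedAvg estimate suffer from the spread of client parameters, while pure local training pays only the per-client estimation noise; shrinking R reverses the comparison, mirroring the heterogeneity-threshold dichotomy described in the abstract.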