Standard federated learning algorithms are vulnerable to adversarial nodes, also known as Byzantine failures. To address this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging with robust aggregation rules. While generic conditions on these aggregations exist that guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that, under standard generic assumptions, Byzantine-robust distributed optimization can be cast as optimization with an inexact gradient oracle (with both additive and multiplicative error terms), an active field of research. This allows us, for instance, to show directly that GD on top of standard robust aggregation procedures attains the optimal asymptotic error in the Byzantine setting. Going further, we propose two optimization schemes to speed up convergence. The first is a Nesterov-type accelerated scheme whose proof follows directly from accelerated inexact-gradient results applied to our formulation. The second hinges on Optimization under Similarity, in which the server leverages an auxiliary loss function that approximates the global loss. Both approaches drastically reduce the communication complexity compared to previous methods, as we show theoretically and empirically.
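As an illustration of the setting described above, the following is a minimal sketch (assuming NumPy; the coordinate-wise median rule, step size, and toy quadratic loss are illustrative choices, not taken from the paper) of server-side gradient descent in which plain averaging is replaced by a robust aggregation. The aggregated vector then plays the role of an inexact gradient oracle whose deviation from the true gradient can be bounded by an additive term plus a multiplicative one.

```python
# Illustrative sketch only, not the paper's implementation.
import numpy as np

def coordinate_wise_median(grads):
    """Robust aggregation: median of the workers' vectors, coordinate by coordinate."""
    return np.median(np.stack(grads, axis=0), axis=0)

def robust_gd_step(x, worker_grads, lr):
    """One GD step where averaging is replaced by a robust aggregation.

    The aggregated vector acts as an inexact gradient oracle: its error with
    respect to the true gradient of the global loss combines an additive part
    (stochastic noise) and a multiplicative part (bias induced by Byzantine nodes
    and the aggregation rule).
    """
    g = coordinate_wise_median(worker_grads)
    return x - lr * g

# Toy usage: 7 honest workers report noisy gradients of f(x) = 0.5 * ||x||^2,
# while 2 Byzantine workers report arbitrary large vectors.
rng = np.random.default_rng(0)
x = np.ones(5)
for _ in range(50):
    honest = [x + 0.01 * rng.normal(size=x.shape) for _ in range(7)]
    byzantine = [100.0 * rng.normal(size=x.shape) for _ in range(2)]
    x = robust_gd_step(x, honest + byzantine, lr=0.5)
print(np.linalg.norm(x))  # small: the iterate approaches the minimizer despite the attackers
```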