Byzantine robustness has attracted considerable attention owing to the growing interest in collaborative and federated learning. However, many promising directions, such as using variance reduction to achieve robustness and communication compression to reduce communication costs, remain underexplored in the field. This work addresses this gap and proposes Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and compression. The main message of our paper is that variance reduction is key to fighting Byzantine workers more effectively, while communication compression is a bonus that makes the process more communication-efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA that outperform the previous state of the art for general non-convex and Polyak-Łojasiewicz loss functions. Unlike concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as boundedness of the gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.
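
To make the ingredients named above concrete, the following is a minimal sketch, not the Byz-VR-MARINA algorithm itself. It assumes two standard building blocks that the abstract does not specify: an unbiased Rand-K sparsifier as the compression operator and a coordinate-wise median as the robust aggregation rule. The function names (`rand_k`, `coordinate_median`, `coordinate_mean`) and the toy worker reports are hypothetical and chosen only for illustration.

```python
import random
import statistics


def rand_k(vec, k, rng):
    """Assumed compressor: keep k random coordinates and rescale by d/k.

    The rescaling makes the compressed vector equal to the input in expectation.
    """
    d = len(vec)
    kept = set(rng.sample(range(d), k))
    return [v * d / k if i in kept else 0.0 for i, v in enumerate(vec)]


def coordinate_mean(vectors):
    """Plain averaging; a single outlier can move it arbitrarily far."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def coordinate_median(vectors):
    """Assumed robust aggregator: median taken separately in each coordinate."""
    return [statistics.median(col) for col in zip(*vectors)]


# Toy data: four honest workers report similar vectors,
# one Byzantine worker reports an arbitrary one.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
byzantine = [[1e6, -1e6]]
reports = honest + byzantine

mean_agg = coordinate_mean(reports)      # dominated by the Byzantine report
median_agg = coordinate_median(reports)  # stays close to the honest reports

# Compress one honest report to a single coordinate before sending it.
compressed = rand_k(honest[0], k=1, rng=random.Random(0))
```

In this toy setting the median stays near the honest values while the mean does not. Loosely, this is the intuition behind pairing robustness with variance reduction: the more tightly the honest reports cluster, the less room an outlier has to hide among them.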