Distributed learning has become the standard approach for training large-scale machine learning models across private data silos. While distributed learning enhances privacy preservation and training efficiency, it faces critical challenges in Byzantine robustness and communication reduction. Existing Byzantine-robust and communication-efficient methods rely on full gradient information either at every iteration or at certain iterations with some probability, and they converge only to an unnecessarily large neighborhood around the solution. Motivated by these issues, we propose a novel Byzantine-robust and communication-efficient stochastic distributed learning method that imposes no requirements on batch size and converges to a smaller neighborhood around the optimal solution than all existing methods, aligning with the theoretical lower bound. Our key innovation is leveraging Polyak momentum to mitigate the noise caused by both biased compressors and stochastic gradients, thus defending against Byzantine workers under information compression. We prove tight complexity bounds for our algorithm for non-convex smooth loss functions and show that these bounds match the lower bounds in Byzantine-free scenarios. Finally, we validate the practical significance of our algorithm through an extensive series of experiments, benchmarking its performance on both binary classification and image classification tasks.
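The abstract only describes the method at a high level, so the following minimal Python sketch illustrates the general recipe it points to: each worker maintains a Polyak (heavy-ball) momentum buffer over its stochastic gradients, applies a biased compressor before communicating, and the server combines the compressed messages with a robust aggregator. Everything here is an illustrative assumption rather than the paper's actual algorithm: the top-k compressor, the coordinate-wise median aggregator, the exponential-moving-average form of the momentum update, and all names and hyperparameters (topk_compress, coordinate_median, beta, lr, k) are stand-ins chosen to make the sketch self-contained.

```python
import numpy as np

def topk_compress(v, k):
    """Biased top-k compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def coordinate_median(msgs):
    """A simple Byzantine-robust aggregator (coordinate-wise median);
    the paper's aggregator may differ."""
    return np.median(np.stack(msgs), axis=0)

def distributed_step(x, stoch_grads, momenta, beta=0.9, lr=0.1, k=10):
    """One round: each worker updates a Polyak momentum buffer over its
    stochastic gradient, compresses it, and the server robustly aggregates."""
    msgs = []
    for i, g in enumerate(stoch_grads):
        # Momentum averages out stochastic-gradient noise over rounds
        # (EMA form assumed here; a heavy-ball form m = beta*m + g also works).
        momenta[i] = beta * momenta[i] + (1.0 - beta) * g
        # Biased compression reduces the uplink communication cost.
        msgs.append(topk_compress(momenta[i], k))
    # Robust aggregation bounds the influence of Byzantine messages.
    return x - lr * coordinate_median(msgs), momenta

# Toy usage: 5 honest workers estimate the gradient of f(x) = ||x||^2 / 2,
# while one Byzantine worker submits an arbitrary large vector.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
momenta = [np.zeros_like(x) for _ in range(6)]
for _ in range(100):
    grads = [x + 0.1 * rng.normal(size=x.shape) for _ in range(5)]  # honest
    grads.append(100.0 * rng.normal(size=x.shape))                  # Byzantine
    x, momenta = distributed_step(x, grads, momenta)
print(np.linalg.norm(x))  # shrinks: x is driven toward the minimizer 0
```

In this toy run the honest workers' momentum buffers stay close to one another, so their top-k supports overlap and the coordinate-wise median suppresses the Byzantine message; this is only meant to convey the interaction of momentum, compression, and robust aggregation, not to reproduce the paper's guarantees.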