Collaborative learning algorithms, such as distributed SGD (D-SGD), are vulnerable to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data, or malicious behavior. While many solutions have been proposed to make D-SGD robust to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computation cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. In essence, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing. While MoNNA is rather simple to implement, its analysis is more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and we present a way to control the tension between the momentum and the model drifts. We validate our theory with experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
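The two building blocks named in the abstract, Polyak's momentum for local updates and nearest-neighbor averaging for global mixing, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the `beta` parameter, the `(1 - beta)` scaling of the gradient, and the rule of keeping the `n - f` vectors closest to the machine's own vector are all assumptions made here for illustration; the abstract does not specify them.

```python
import numpy as np


def polyak_momentum(prev_momentum, gradient, beta=0.9):
    """One momentum step on a local gradient.

    Assumed form: m_t = beta * m_{t-1} + (1 - beta) * g_t.
    The exact scaling used by MoNNA is not given in the abstract.
    """
    prev_momentum = np.asarray(prev_momentum, dtype=float)
    gradient = np.asarray(gradient, dtype=float)
    return beta * prev_momentum + (1.0 - beta) * gradient


def nearest_neighbor_average(own, received, num_faulty):
    """Average the vectors closest to `own`, discarding the farthest ones.

    Assumption: out of the n received vectors (which include `own`),
    the n - num_faulty nearest in Euclidean distance are kept.
    """
    own = np.asarray(own, dtype=float)
    received = np.asarray(received, dtype=float)
    keep = len(received) - num_faulty
    if keep <= 0:
        raise ValueError("num_faulty must be smaller than the number of vectors")
    # Euclidean distance from this machine's vector to every received vector.
    distances = np.linalg.norm(received - own, axis=1)
    # Indices of the `keep` closest vectors (stable sort for reproducible ties).
    nearest = np.argsort(distances, kind="stable")[:keep]
    return received[nearest].mean(axis=0)


# Hypothetical example: four machines, one of which sends an outlier.
own_vector = np.array([0.0, 0.0])
all_vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
mixed = nearest_neighbor_average(own_vector, all_vectors, num_faulty=1)
momentum = polyak_momentum(np.zeros(2), np.array([1.0, 2.0]), beta=0.9)
```

In this toy example the outlier `[100, 100]` is the farthest vector from `own_vector`, so it is excluded before averaging. The selection depends on each machine's own vector, which is one way to see why the mixing step is non-linear.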