Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem has received significant attention, prior works often assume that the data held by the machines is homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but they provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes this theoretical gap, achieving optimal guarantees along with good empirical performance. Specifically, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism, which we call nearest neighbor mixing (NNM), that boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM into the distributed stochastic heavy ball method, a practical substitute for distributed gradient descent. Our empirical results significantly outperform state-of-the-art Byzantine ML solutions.
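To make the "pre-processing plus robust aggregator" composition concrete, here is a minimal sketch. It assumes that NNM replaces each worker's vector with the average of its n - f nearest neighbors in Euclidean distance (the vector itself included), where f is an upper bound on the number of Byzantine workers. The mixed vectors are then passed to a standard robust aggregator. Coordinate-wise trimmed mean is our illustrative choice of base aggregator, and all function names are hypothetical, not taken from the paper's code.

```python
import math
from typing import Callable, List

Vector = List[float]


def _dist(a: Vector, b: Vector) -> float:
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors: List[Vector]) -> Vector:
    # Coordinate-wise average of a non-empty list of vectors.
    k = len(vectors)
    return [sum(coords) / k for coords in zip(*vectors)]


def nearest_neighbor_mixing(grads: List[Vector], f: int) -> List[Vector]:
    """Hypothetical NNM step: replace each vector by the mean of its
    n - f nearest neighbors (itself included, at distance zero)."""
    n = len(grads)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    mixed = []
    for g in grads:
        neighbors = sorted(grads, key=lambda h: _dist(g, h))[: n - f]
        mixed.append(_mean(neighbors))
    return mixed


def coordinate_wise_trimmed_mean(vectors: List[Vector], f: int) -> Vector:
    """Drop the f smallest and f largest values per coordinate, average the rest."""
    n = len(vectors)
    if 2 * f >= n:
        raise ValueError("need n > 2f")
    result = []
    for coords in zip(*vectors):
        kept = sorted(coords)[f : n - f]
        result.append(sum(kept) / len(kept))
    return result


def robust_aggregate_with_nnm(
    grads: List[Vector],
    f: int,
    aggregator: Callable[[List[Vector], int], Vector] = coordinate_wise_trimmed_mean,
) -> Vector:
    # Apply NNM first, then any base robust aggregator to the mixed vectors.
    return aggregator(nearest_neighbor_mixing(grads, f), f)


if __name__ == "__main__":
    # Four honest workers with slightly different gradients, one Byzantine worker.
    grads = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [100.0, -100.0]]
    print(robust_aggregate_with_nnm(grads, f=1))  # approximately [1.0, 1.05]
```

After mixing, every honest vector becomes (approximately) the average of the four honest gradients, while the Byzantine vector is pulled toward the honest ones, so the trimmed mean returns the honest average. This naive version computes all pairwise distances, so it needs on the order of n^2 distance evaluations per aggregation step.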