Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur in large-scale distributed learning systems. The key to securing resilience against Byzantine machines in distributed learning is a resilient aggregation mechanism. Although many resilient aggregation rules have been proposed, they are designed in an ad hoc manner, which makes it harder to compare, analyze, and improve the rules across performance criteria. This paper studies near-optimal aggregation rules using clustering in the presence of outliers. Our outlier-robust clustering approach utilizes geometric properties of the update vectors provided by workers. Our analysis shows that constant approximations to the 1-center and 1-mean clustering problems with outliers provide near-optimal resilient aggregators for metric-based criteria, which have been proven to be crucial in the homogeneous and heterogeneous cases, respectively. In addition, we discuss two contradicting types of attacks under which no single aggregation rule is guaranteed to improve upon the naive average. Based on this discussion, we propose a two-phase resilient aggregation framework. We run experiments on image classification using a non-convex loss function. The proposed algorithms outperform previously known aggregation rules by a large margin with both homogeneous and heterogeneous data distributions among non-faulty workers. Code and appendix are available at https://github.com/jerry907/AAAI24-RASHB.
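To make the clustering view of aggregation concrete, the following is a minimal sketch of how constant-factor approximations to 1-center and 1-mean clustering with outliers can act as aggregators. It is an illustration under stated assumptions, not the algorithms from the paper or its repository: the function names, the parameter `f` (an assumed upper bound on the number of Byzantine workers), and the choice to restrict candidate centers to the submitted update vectors are all ours. Restricting centers to input points is a standard way to obtain a 2-approximation for both objectives.

```python
import numpy as np


def pairwise_sq_dists(updates):
    """Squared Euclidean distances between all pairs of update vectors.

    Builds an (n, n, d) intermediate array, so it is only meant for small examples.
    """
    diffs = updates[:, None, :] - updates[None, :, :]
    return np.einsum("ijk,ijk->ij", diffs, diffs)


def one_center_with_outliers(updates, f):
    """Hypothetical aggregator: 2-approximate 1-center with f outliers.

    Each update is tried as a center; its cost is the radius needed to cover
    its n - f nearest updates. The candidate with the smallest radius wins.
    """
    n = updates.shape[0]
    d2 = pairwise_sq_dists(updates)
    # Column n - f - 1 of each sorted row is the (n - f)-th smallest distance.
    radii = np.sort(d2, axis=1)[:, n - f - 1]
    return updates[np.argmin(radii)]


def one_mean_with_outliers(updates, f):
    """Hypothetical aggregator: 2-approximate 1-mean with f outliers.

    Each update is tried as a center; its cost is the sum of squared distances
    to its n - f nearest updates. The mean of the best candidate's inliers is
    returned, which can only lower the cost on that inlier set.
    """
    n = updates.shape[0]
    d2 = pairwise_sq_dists(updates)
    costs = np.sort(d2, axis=1)[:, : n - f].sum(axis=1)
    best = np.argmin(costs)
    inliers = np.argsort(d2[best])[: n - f]
    return updates[inliers].mean(axis=0)
```

For example, with six honest updates clustered near (1, 1) and two Byzantine updates at (100, 100), both functions called with `f = 2` return a vector close to the honest cluster, whereas the naive average is pulled far away from it. The two-phase framework mentioned in the abstract is not reproduced here.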