Byzantine-robust learning has emerged as a prominent fault-tolerant distributed machine learning framework. However, most techniques focus on the static setting, wherein the identity of Byzantine workers remains unchanged throughout the learning process. This assumption fails to capture real-world dynamic Byzantine behaviors, which may include intermittent malfunctions or targeted, time-limited attacks. Addressing this limitation, we propose DynaBRO -- a new method capable of withstanding any sub-linear number of identity changes across rounds. Specifically, when the number of such changes is $\mathcal{O}(\sqrt{T})$ (where $T$ is the total number of training rounds), DynaBRO nearly matches the state-of-the-art asymptotic convergence rate of the static setting. Our method utilizes a multi-level Monte Carlo (MLMC) gradient estimation technique applied at the server to robustly aggregated worker updates. By additionally leveraging an adaptive learning rate, we circumvent the need for prior knowledge of the fraction of Byzantine workers.
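The MLMC gradient estimation mentioned above can be sketched as follows. This is a minimal illustration of the generic MLMC trick (draw a random level $J$ with geometrically decaying probability, then apply a telescoping correction), not the paper's actual server-side procedure; the callable `sample_grad` is a hypothetical stand-in for whatever produces the level-$k$ estimate (in DynaBRO, a robust aggregate of $2^k$ batches of worker updates).

```python
import numpy as np

def mlmc_gradient(sample_grad, j_max=10, rng=None):
    """Multi-level Monte Carlo gradient estimator (illustrative sketch).

    sample_grad(k) is assumed to return the average of 2**k stochastic
    gradient samples; in the DynaBRO setting this would be a robustly
    aggregated batch of worker updates (an assumption for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw a level J >= 1 with P(J = j) ~ 2**-j, truncated at j_max.
    j = min(int(rng.geometric(0.5)), j_max)
    g0 = np.asarray(sample_grad(0))
    # Telescoping correction, reweighted by the inverse sampling
    # probability; approximates the high-accuracy estimate
    # sample_grad(j_max) at the expected cost of only O(1) samples.
    correction = np.asarray(sample_grad(j)) - np.asarray(sample_grad(j - 1))
    return g0 + (2 ** j) * correction
```

The appeal of this construction in the dynamic Byzantine setting is that the expected per-round sample cost stays constant while the estimator concentrates like a large-batch one, which is what enables the near-static convergence rate.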