Privacy and Byzantine resilience (BR) are two crucial requirements of modern distributed machine learning. The two concepts have been studied extensively in isolation, but the question of how to combine them effectively remains open. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the remaining workers, although honest, sending noisy information to the server to ensure differential privacy (DP). We first observe that integrating standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(\alpha,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. In essence, our theoretical and empirical results show that (1) a careless combination of standard approaches to DP and BR can be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
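The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or experimental setup. It assumes a toy linear-regression task, a Gaussian mechanism (gradient clipping plus Gaussian noise) on the honest workers, and a coordinate-wise median as a stand-in robust aggregation rule at the server. All function names, parameters, and default values are hypothetical, and no privacy accounting is performed.

```python
import numpy as np

def clip(g, clip_norm):
    """Scale g so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / max(norm, 1e-12))

def honest_update(w, X, y, clip_norm, sigma, rng):
    """Honest worker: clipped mini-batch gradient plus Gaussian noise (assumed DP mechanism)."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    noise = rng.normal(0.0, sigma * clip_norm, size=w.shape)
    return clip(grad, clip_norm) + noise

def byzantine_update(w, rng):
    """Byzantine worker: an arbitrary vector (here, large random values)."""
    return rng.normal(0.0, 100.0, size=w.shape)

def run(n_honest=8, n_byz=2, steps=200, lr=0.1,
        clip_norm=1.0, sigma=0.1, seed=0):
    """Distributed SGD with a parameter server, noisy honest workers and Byzantine workers."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground-truth model
    w = np.zeros_like(w_true)
    for _ in range(steps):
        msgs = []
        for _ in range(n_honest):
            X = rng.normal(size=(32, w.size))  # fresh local mini-batch
            y = X @ w_true
            msgs.append(honest_update(w, X, y, clip_norm, sigma, rng))
        msgs += [byzantine_update(w, rng) for _ in range(n_byz)]
        # Assumed robust aggregation: coordinate-wise median over all messages.
        agg = np.median(np.stack(msgs), axis=0)
        w = w - lr * agg
    return w, w_true

if __name__ == "__main__":
    w, w_true = run()
    print("distance to true model:", np.linalg.norm(w - w_true))
```

Varying `sigma`, `clip_norm`, the batch size, or `lr` in this toy setup gives a rough feel for the hyperparameter trade-off the abstract refers to: more noise strengthens privacy but makes honest gradients harder to distinguish from Byzantine ones.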