Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the nodes' data in its entirety. Homomorphic encryption (HE), a gold-standard security primitive, has been extensively studied as a privacy-preserving solution for distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand, especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine-robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
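To illustrate why robust aggregation is non-linear, the following plaintext sketch contrasts plain averaging with a coordinate-wise median. The abstract does not state which robust aggregator SABLE implements, so the coordinate-wise median is an illustrative assumption, as are the function names and the toy gradients; no encryption is involved here.

```python
from statistics import median

def mean_aggregate(gradients):
    """Coordinate-wise average: linear, so one extreme input can shift it arbitrarily."""
    n = len(gradients)
    return [sum(coord) / n for coord in zip(*gradients)]

def median_aggregate(gradients):
    """Coordinate-wise median: non-linear (needs comparisons), tolerates a minority of outliers."""
    return [median(coord) for coord in zip(*gradients)]

# Hypothetical toy data: four honest nodes and one Byzantine node.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1]]
byzantine = [[1000.0, -1000.0]]
gradients = honest + byzantine

avg = mean_aggregate(gradients)    # dominated by the Byzantine gradient
med = median_aggregate(gradients)  # stays within the range of the honest gradients

print("mean:  ", avg)
print("median:", med)
```

The median depends on ordering the inputs, which is cheap in plaintext but requires comparison or sorting circuits when the gradients are encrypted. This is one way to see why homomorphic sorting matters for aggregators of this kind.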