Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigating such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold-standard security primitive, has been extensively studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, because of HE's high computational cost, especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine-robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting, with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
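To make the notion of a non-linear robust aggregator concrete, the following plaintext sketch contrasts plain averaging with two common robust aggregators, the coordinate-wise median and the coordinate-wise trimmed mean. The abstract does not name the aggregator used, so these two are chosen here only as an illustrative assumption; the function names and toy gradient values are hypothetical, and nothing is encrypted. Both robust variants need to order the values of each coordinate, which is consistent with the abstract's emphasis on homomorphic sorting.

```python
from statistics import median


def mean_aggregate(gradients):
    """Plain averaging: linear, but a single outlier can shift it arbitrarily."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


def coordinate_wise_median(gradients):
    """Median of each coordinate across nodes (requires ordering the values)."""
    return [median(column) for column in zip(*gradients)]


def coordinate_wise_trimmed_mean(gradients, f):
    """Per coordinate, drop the f smallest and f largest values, then average."""
    n = len(gradients)
    if not 0 <= 2 * f < n:
        raise ValueError("need 0 <= 2f < n")
    result = []
    for column in zip(*gradients):
        kept = sorted(column)[f:n - f]
        result.append(sum(kept) / len(kept))
    return result


# Four honest nodes and one Byzantine node (hypothetical toy values).
honest = [[1.0, -2.0], [1.5, -2.5], [0.5, -1.5], [1.0, -2.0]]
byzantine = [[1000.0, 1000.0]]
gradients = honest + byzantine

avg = mean_aggregate(gradients)                     # dominated by the outlier
med = coordinate_wise_median(gradients)             # stays near honest values
tm = coordinate_wise_trimmed_mean(gradients, f=1)   # stays near honest values
```

In this toy run, the average is pulled far from every honest gradient by the single Byzantine vector, while the median and trimmed mean remain within the range of the honest values. Evaluating the same sorting-based logic over ciphertexts is the expensive step that the abstract's encoding method targets.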