Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. Homomorphic encryption (HE), a gold-standard security primitive, has been extensively studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand, especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
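To make the notion of a non-linear robust aggregator concrete, the following is a minimal plaintext sketch, not the homomorphic construction the abstract describes. It assumes two commonly used aggregators, coordinate-wise trimmed mean and coordinate-wise median; the abstract does not say which aggregator SABLE implements, and the function names and toy gradients below are hypothetical. It shows how a single Byzantine gradient can dominate a plain average while the robust aggregators stay close to the honest values.

```python
from statistics import mean, median


def coordinate_wise(gradients, reducer):
    """Apply `reducer` independently to each coordinate across all nodes."""
    return [reducer(list(coords)) for coords in zip(*gradients)]


def trimmed_mean(values, f):
    """Drop the f smallest and f largest values, then average the rest."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2*f values to trim f from each side")
    kept = sorted(values)[f:len(values) - f]
    return mean(kept)


# Toy data (assumption): four honest nodes and one Byzantine node,
# each sending a 2-dimensional gradient.
honest = [[1.0, -2.0], [1.2, -1.8], [0.8, -2.2], [1.0, -2.0]]
byzantine = [[1e6, -1e6]]
gradients = honest + byzantine

plain_mean = coordinate_wise(gradients, mean)
robust_tm = coordinate_wise(gradients, lambda v: trimmed_mean(v, f=1))
robust_med = coordinate_wise(gradients, median)

print("plain mean  :", plain_mean)   # pulled far away by the single outlier
print("trimmed mean:", robust_tm)    # stays near the honest gradients
print("median      :", robust_med)   # stays near the honest gradients
```

Both robust aggregators in this sketch depend on ordering the values in each coordinate, which is cheap in plaintext but requires comparisons that are costly to evaluate over encrypted data.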