Decentralized learning (DL) is increasingly vulnerable to privacy breaches because of sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To improve communication efficiency in DL, nodes apply sparsification techniques, which share only the most important parameters or gradients of a model and thereby reduce communication without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging: distinct nodes transmit disjoint parameter sets, which can prevent the masks from canceling out. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We provide a foundational analysis of the interaction between the sparsification performed by the nodes and the proportion of parameters shared under CESAR, in both colluding and non-colluding settings, offering analytical insight into how the protocol works and where it applies. Experiments on a 48-node network with a 3-regular topology show that, with random subsampling, CESAR always stays within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD) while adding only 11% data overhead. Moreover, it surpasses the accuracy of TopK by up to 0.3% on independent and identically distributed (IID) data.
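The mask-cancellation problem described above can be made concrete with a small Python sketch. It assumes the common pairwise additive masking construction (each pair of nodes shares a random vector that one node adds and the other subtracts) over integers mod 2^32. It also assumes a single receiver that sums what it gets, for instance a node aggregating its neighbors' models, and TopK as the sparsifier. The helper names (`pairwise_masks`, `masked_share`, `aggregate`) are hypothetical, and this is not CESAR's protocol. It only shows that masks cancel for dense vectors, that they fail to cancel when nodes send different index sets, and that restricting every node to a common index set restores cancellation on those indices.

```python
import random

MOD = 2**32  # assumption: inputs are quantized integers masked in Z_{2^32}


def pairwise_masks(node_ids, dim, seed=0):
    """Pairwise additive masking: for each pair i < j, draw a random vector
    r_ij; node i adds it and node j subtracts it, so the masks of all nodes
    sum to 0 (mod MOD) on every coordinate."""
    rng = random.Random(seed)  # stands in for pairwise agreed PRG seeds
    masks = {n: [0] * dim for n in node_ids}
    for i in node_ids:
        for j in node_ids:
            if i < j:
                r = [rng.randrange(MOD) for _ in range(dim)]
                for k in range(dim):
                    masks[i][k] = (masks[i][k] + r[k]) % MOD
                    masks[j][k] = (masks[j][k] - r[k]) % MOD
    return masks


def topk_indices(vec, k):
    """TopK sparsification: indices of the k largest-magnitude entries."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return set(order[:k])


def masked_share(values, mask, indices):
    """What a node transmits: masked values for the chosen indices only."""
    return {k: (values[k] + mask[k]) % MOD for k in indices}


def aggregate(shares, dim):
    """The receiver sums whatever it got, coordinate by coordinate."""
    total = [0] * dim
    for share in shares:
        for k, v in share.items():
            total[k] = (total[k] + v) % MOD
    return total


nodes = [0, 1, 2]
dim = 6
x = {0: [1, 2, 3, 4, 5, 6], 1: [10, 20, 30, 40, 50, 60], 2: [9, 1, 1, 1, 1, 8]}
masks = pairwise_masks(nodes, dim)
true_sum = [sum(x[n][k] for n in nodes) % MOD for k in range(dim)]

# Dense case: every node sends every coordinate, so all masks cancel.
dense = aggregate([masked_share(x[n], masks[n], range(dim)) for n in nodes], dim)

# Naive sparse case: each node sends only its own TopK coordinates.
topk = {n: topk_indices(x[n], 2) for n in nodes}
naive = aggregate([masked_share(x[n], masks[n], topk[n]) for n in nodes], dim)

# Restricting every node to the indices that all of them selected.
common = set.intersection(*topk.values())
restricted = aggregate([masked_share(x[n], masks[n], common) for n in nodes], dim)
```

Here `dense` equals `true_sum`. The TopK sets are `{4, 5}`, `{4, 5}` and `{0, 5}`, so `naive[5]` is the correct sum 74 (all three nodes sent index 5), while `naive[4]` and `naive[0]` are random-looking values because some nodes' mask terms are missing. Restricting to the common set `{5}` restores a correct sum but shares only one of six coordinates, which illustrates, in this toy setting only, why the fraction of parameters that can be securely aggregated depends on how the nodes sparsify.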