Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To improve communication efficiency in DL, sparsification techniques share only the most crucial parameters or gradients of a model, reducing communication without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging because distinct nodes transmit disjoint parameter sets, which can prevent masks from canceling out. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We analyze how the sparsification performed by the nodes interacts with the proportion of parameters shared under CESAR, in both colluding and non-colluding environments, offering analytical insight into the workings and applicability of the protocol. Experiments on a network with 48 nodes in a 3-regular topology show that with random subsampling, CESAR is always within 0.5% of the accuracy of decentralized parallel stochastic gradient descent (D-PSGD), while adding only 11% data overhead. Moreover, it surpasses the accuracy on TopK by up to 0.3% on independent and identically distributed (IID) data.
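The mask-cancellation problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the CESAR protocol: it assumes a generic pairwise additive-masking scheme in which two nodes derive the same mask from a shared seed, one adds it and the other subtracts it. All names (`pairwise_mask`, `masked_sparse_update`, `aggregate`, `MODULUS`) and the toy values are hypothetical. Restricting both senders to their common indices is shown only as one conceivable way to restore cancellation, at the cost of sharing fewer parameters.

```python
import random

MODULUS = 2**31 - 1  # hypothetical modulus for the masked arithmetic


def pairwise_mask(seed, size, modulus=MODULUS):
    """Derive a mask vector that two nodes can both compute from a shared seed.

    Values are drawn from [1, modulus - 1] so that every mask entry is
    non-zero in this toy example.
    """
    rng = random.Random(seed)
    return [rng.randrange(1, modulus) for _ in range(size)]


def masked_sparse_update(values, indices, mask, sign, modulus=MODULUS):
    """Return {index: masked value} for the selected indices only."""
    return {i: (values[i] + sign * mask[i]) % modulus for i in indices}


def aggregate(updates, modulus=MODULUS):
    """Sum the received sparse updates index by index."""
    total = {}
    for update in updates:
        for i, v in update.items():
            total[i] = (total.get(i, 0) + v) % modulus
    return total


# Toy models of two neighbouring nodes A and B (assumed values).
x_a = [10, 20, 30, 40]
x_b = [1, 2, 3, 4]
mask = pairwise_mask(seed=7, size=4)

# Dense case: A adds the mask, B subtracts it, so the masks cancel everywhere.
all_idx = range(4)
dense = aggregate([
    masked_sparse_update(x_a, all_idx, mask, +1),
    masked_sparse_update(x_b, all_idx, mask, -1),
])

# Sparsified case: each node sends a different index set.
idx_a, idx_b = {0, 1}, {1, 2}
naive = aggregate([
    masked_sparse_update(x_a, idx_a, mask, +1),
    masked_sparse_update(x_b, idx_b, mask, -1),
])
# Only index 1 is sent by both nodes; indices 0 and 2 keep a residual mask.

# Restricting both senders to the common indices restores cancellation.
common = idx_a & idx_b
restricted = aggregate([
    masked_sparse_update(x_a, common, mask, +1),
    masked_sparse_update(x_b, common, mask, -1),
])
```

In this sketch, `dense` recovers the true sum at every index, `naive` is correct only at the index both nodes selected, and `restricted` is correct but covers fewer parameters than either node originally chose to send.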