The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem. In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, while ensuring that switching among different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluation of Pencil demonstrates that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) the training overhead of Pencil is greatly reduced: Pencil achieves 10x~260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing attacks and adaptive (white-box) attacks.
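To make the design principle concrete, below is a minimal, non-cryptographic sketch of the control flow it implies: the n-party collaboration is organized as a schedule of independent two-party training steps, and because the model owner's state does not depend on which data provider it is currently paired with, rotating among providers requires no per-switch setup. All names here (ModelOwner, DataProvider, two_party_step) are hypothetical illustrations, not Pencil's actual protocol; in Pencil, the step marked as a placeholder runs under cryptographic protection so that neither party learns the other's secret.

```python
# Sketch (assumed names, not Pencil's API): an n-party collaboration built
# from repeated runs of a single 2-party training protocol.

from dataclasses import dataclass
from typing import List
import random

@dataclass
class DataProvider:
    """Holds private training data; never reveals it to the model owner."""
    name: str
    data: List[float]

    def sample_batch(self, k: int) -> List[float]:
        return random.sample(self.data, k)

class ModelOwner:
    """Holds private model parameters; never reveals them to providers."""
    def __init__(self):
        self.weight = 0.0  # stand-in for the private model

    def two_party_step(self, provider: DataProvider, lr: float = 0.01):
        # Placeholder for one iteration of the 2-party private training
        # protocol. Here we merely simulate a plaintext gradient step on a
        # toy mean-estimation objective to show the control flow.
        batch = provider.sample_batch(2)
        grad = sum(x - self.weight for x in batch) / len(batch)
        self.weight += lr * grad

def collaborative_training(owner: ModelOwner,
                           providers: List[DataProvider],
                           epochs: int = 100):
    # The n-party protocol is just a schedule of 2-party steps. The owner's
    # state is identical regardless of the peer, so switching providers
    # between iterations introduces no extra cost.
    for _ in range(epochs):
        for p in providers:           # switch data providers freely
            owner.two_party_step(p)   # each step is the same 2-party protocol

providers = [DataProvider("A", [1.0, 2.0, 3.0]),
             DataProvider("B", [2.0, 3.0, 4.0]),
             DataProvider("C", [0.5, 1.5, 2.5])]
owner = ModelOwner()
collaborative_training(owner, providers)
print(f"trained weight: {owner.weight:.3f}")
```

The design choice this sketch highlights is that extensibility comes for free: adding an n+1-th data provider only appends another entry to the schedule, rather than requiring a new multi-party setup or re-keying.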