SoK: Enhancing Cryptographic Collaborative Learning with Differential Privacy

In collaborative learning (CL), multiple parties jointly train a machine learning model on their private datasets. However, data cannot be shared directly due to privacy concerns. To ensure input confidentiality, cryptographic techniques, e.g., multi-party computation (MPC), enable training on encrypted data. Yet, even securely trained models are vulnerable to inference attacks that aim to extract memorized data from model outputs. To ensure output privacy and mitigate inference attacks, differential privacy (DP) injects calibrated noise during training. While cryptography and DP offer complementary guarantees, combining them efficiently for cryptographic and differentially private CL (CPCL) is challenging. Cryptography incurs performance overheads, while DP degrades accuracy, creating a privacy-accuracy-performance trade-off that requires careful design. This work systematizes the CPCL landscape. We introduce a unified framework that generalizes common phases across CPCL paradigms, and identify secure noise sampling as the foundational phase to achieve CPCL. We analyze the trade-offs of different secure noise sampling techniques, noise types, and DP mechanisms, discussing their implementation challenges and evaluating their accuracy and cryptographic overhead across CPCL paradigms. Additionally, we implement the identified secure noise sampling options in MPC and evaluate their computation and communication costs in WAN and LAN settings. Finally, we propose future research directions based on the key observations, gaps, and possible enhancements identified in the literature.
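To make the interplay between input confidentiality and calibrated noise concrete, the following is a minimal sketch of one possible secure noise sampling option, often called partial (distributed) noise: each party perturbs its clipped local value with a fraction of the Gaussian noise and additively secret-shares the result, so that only the noisy aggregate is reconstructed. It is an illustration under stated assumptions, not the protocol evaluated in this work. The names `MODULUS`, `SCALE`, `share`, and `noisy_secure_sum` are hypothetical, the parties are simulated in a single process, and the sketch ignores malicious behavior, collusion (each party knows its own noise share), and floating-point sampling issues.

```python
import math
import random

MODULUS = 2**61 - 1  # hypothetical ring size for additive secret sharing
SCALE = 2**16        # hypothetical fixed-point scaling factor


def encode(x: float) -> int:
    """Map a real value to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real value (upper half is negative)."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int, n_parties: int, rng: random.Random) -> list[int]:
    """Split a ring element into n additive shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares."""
    return sum(shares) % MODULUS


def noisy_secure_sum(local_values, clip, sigma, rng):
    """Simulate n parties computing sum(clipped values) + N(0, sigma^2).

    Each party adds Gaussian noise with standard deviation sigma / sqrt(n),
    so the n independent partial noises sum to variance sigma^2.
    """
    n = len(local_values)
    partial_std = sigma / math.sqrt(n)
    received = [[] for _ in range(n)]  # received[j]: shares held by party j
    for x in local_values:
        clipped = max(-clip, min(clip, x))  # bound each party's contribution
        noisy = clipped + rng.gauss(0.0, partial_std)
        for j, s in enumerate(share(encode(noisy), n, rng)):
            received[j].append(s)
    # Each party sums the shares it holds locally; only these sums are combined.
    partial_sums = [sum(col) % MODULUS for col in received]
    return decode(reconstruct(partial_sums))


if __name__ == "__main__":
    # Demo only: a real deployment would use a cryptographically secure
    # source such as random.SystemRandom() instead of a seeded generator.
    rng = random.Random(0)
    print(noisy_secure_sum([0.5, -2.0, 3.0], clip=1.0, sigma=1.0, rng=rng))
```

Sampling the noise locally keeps the cryptographic overhead low but relies on every party adding its share honestly; sampling the full noise inside MPC avoids that assumption at a higher computation and communication cost, which is the kind of trade-off the systematization examines.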
