Skellam Mixture Mechanism: a Novel Approach to Federated Learning with Differential Privacy

Deep neural networks have a strong capacity to memorize their training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL). They use secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning inject real-valued noise, which makes them fundamentally incompatible with MPC, since MPC exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms for this setting require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforcing DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued and thus reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the favorable composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studied in the DP literature. Extensive experiments in various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
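
The abstract does not spell out how SMM samples its output. One reading consistent with its description (real-valued inputs, integer outputs that MPC can sum, and a mixture of two Skellam distributions) is to randomly round each scaled coordinate x to floor(x) or floor(x)+1 so that the result is unbiased, and then add symmetric Skellam noise Sk(mu), the difference of two independent Poisson(mu) draws. The output is then a mixture of Sk(mu) centred at floor(x) and at floor(x)+1. The sketch below assumes this reading. The scaling factor `gamma`, the noise parameter `mu`, `MODULUS`, and all function names are hypothetical. Secure aggregation is simulated by a plain modular sum, and the sketch performs no privacy accounting, which is the hard part the paper addresses.

```python
import math
import random

# Hypothetical ring size for secure aggregation: all arithmetic is mod 2**20.
MODULUS = 2 ** 20


def poisson(mu: float, rng: random.Random) -> int:
    """Exact Poisson(mu) sample via Knuth's method, applied to chunks of rate <= 30.

    A sum of independent Poisson variables is Poisson with the summed rate,
    so chunking keeps exp(-rate) away from floating-point underflow.
    """
    total = 0
    remaining = mu
    while remaining > 0:
        rate = min(remaining, 30.0)
        remaining -= rate
        threshold = math.exp(-rate)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def skellam(mu: float, rng: random.Random) -> int:
    """Symmetric Skellam Sk(mu): difference of two independent Poisson(mu) draws (variance 2*mu)."""
    return poisson(mu, rng) - poisson(mu, rng)


def smm_coordinate(x: float, mu: float, rng: random.Random) -> int:
    """Randomly round x to floor(x) or floor(x)+1 (unbiased), then add Sk(mu).

    The output follows a mixture of Sk(mu) centred at floor(x) and at floor(x)+1.
    """
    lo = math.floor(x)
    frac = x - lo
    centre = lo + (1 if rng.random() < frac else 0)
    return centre + skellam(mu, rng)


def client_encode(grad: list[float], gamma: float, mu: float, rng: random.Random) -> list[int]:
    """Scale a real-valued gradient (assumed already norm-clipped) by gamma, perturb it, map it into Z_MODULUS."""
    return [smm_coordinate(gamma * g, mu, rng) % MODULUS for g in grad]


def secure_sum(encoded: list[list[int]]) -> list[int]:
    """Stand-in for MPC secure aggregation: only the coordinate-wise modular sum is revealed."""
    return [sum(column) % MODULUS for column in zip(*encoded)]


def decode(aggregate: list[int], gamma: float) -> list[float]:
    """Map [0, MODULUS) back to the signed range [-MODULUS/2, MODULUS/2) and undo the scaling."""
    signed = [v - MODULUS if v >= MODULUS // 2 else v for v in aggregate]
    return [v / gamma for v in signed]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[0.12, -0.40, 0.05], [0.30, 0.10, -0.22]]  # toy client gradients
    encoded = [client_encode(g, gamma=100.0, mu=50.0, rng=rng) for g in grads]
    # Noisy estimate of the true sum [0.42, -0.30, -0.17]
    print(decode(secure_sum(encoded), gamma=100.0))
```

Because every client's noise is integer-valued, the aggregate noise over n clients is again Skellam, Sk(n*mu): differences of independent Poisson draws add up to a difference of Poisson draws with summed rates. `MODULUS` must be large enough that the noisy sum stays within [-MODULUS/2, MODULUS/2); otherwise decoding wraps around.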