Deep neural networks can memorize their training data, which raises serious privacy concerns. An effective remedy is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the setting where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using secure multiparty computation (MPC) to ensure the confidentiality of each gradient update and DP to avoid data leakage through the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning inject real-valued noise and are therefore fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms for this setting require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforcing DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued and thus reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting thanks to the favorable composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the mathematical complexity of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studied in the DP literature. Extensive experiments in various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
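To make the real-valued versus finite-field tension concrete, the following is a minimal Python sketch of the ingredients named above: integer-valued Skellam noise, rounding of real-valued gradients to integers, and aggregation by modular addition. It is an illustration under stated assumptions, not the SMM algorithm or its privacy analysis. The function names, the `scale`, `mu`, and `modulus` parameters, the use of unbiased stochastic rounding, and the plain modular sum standing in for an MPC secure-aggregation protocol are all hypothetical choices made for this example.

```python
import numpy as np

def skellam_noise(rng, mu, size):
    # A symmetric Skellam(mu, mu) variable is the difference of two
    # independent Poisson(mu) draws, so it is integer-valued with
    # mean 0 and variance 2 * mu.
    return rng.poisson(mu, size) - rng.poisson(mu, size)

def stochastic_round(rng, x):
    # Round each entry down or up at random so that the result is an
    # integer and unbiased: E[stochastic_round(x)] = x.
    # (Assumption: this rounding rule is chosen for illustration.)
    low = np.floor(x)
    round_up = rng.random(x.shape) < (x - low)
    return (low + round_up).astype(np.int64)

def encode(rng, grad, scale, mu, modulus):
    # Client side: scale the real-valued gradient, round it to integers,
    # add integer noise, and reduce into the ring Z_modulus.
    noisy = stochastic_round(rng, grad * scale) + skellam_noise(rng, mu, grad.shape)
    return np.mod(noisy, modulus)

def decode(agg, scale, modulus):
    # Server side: map the aggregated residue back to a signed integer,
    # then undo the scaling.
    signed = np.where(agg >= modulus // 2, agg - modulus, agg)
    return signed / scale

# Hypothetical demo: three clients, a two-dimensional gradient each.
rng = np.random.default_rng(0)
scale, mu, modulus = 64, 4.0, 2**16
grads = rng.normal(size=(3, 2))
encoded = [encode(rng, g, scale, mu, modulus) for g in grads]
# Modular addition stands in for the secure aggregation step.
aggregate = np.mod(np.sum(encoded, axis=0), modulus)
estimate = decode(aggregate, scale, modulus)
```

In this sketch every value a client sends is an integer in `[0, modulus)`, so the sum can be computed with modular arithmetic alone, and `estimate` is a noisy version of `grads.sum(axis=0)`. The choice of `mu` that yields a given privacy guarantee is exactly what the accounting analysis must determine and is not derived here.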