Learning a privacy-preserving model from sensitive data that are distributed across multiple devices is an increasingly important problem. The problem is often formulated in the federated learning context, with the aim of learning a single global model while keeping the data distributed. Moreover, Bayesian learning is a popular approach to modelling, since it naturally supports reliable uncertainty estimates. However, Bayesian learning is generally intractable even with centralised non-private data, so approximation techniques such as variational inference are a necessity. Variational inference has recently been extended to the non-private federated learning setting via the partitioned variational inference algorithm. For privacy protection, the current gold standard is differential privacy, which guarantees privacy in a strong, mathematically well-defined sense. In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations within the general framework: one based on perturbing the local optimisation runs performed by individual parties, and two based on perturbing updates to the global model (one using a version of federated averaging, the other adding virtual parties to the protocol). We compare their properties both theoretically and empirically.