Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport, a popular regularization method often leveraged for its computational benefits, enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that, at the same time, this regularization leads to fast statistical convergence at the parametric rate. Entropic regularization of optimal transport thus uniquely mitigates both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence supporting the efficacy of our framework in practice.
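As a rough illustration of the two ingredients named in the abstract, the sketch below privatizes samples locally and then computes an entropically regularized optimal transport plan between raw and privatized samples using Sinkhorn iterations. It is not the authors' implementation. The abstract does not specify the privatization mechanism, so the Laplace mechanism is an assumption here, as are the squared Euclidean cost, the uniform sample weights, and the hypothetical names `privatize_laplace`, `sinkhorn_cost`, `sensitivity`, and `reg`.

```python
import numpy as np


def privatize_laplace(x, epsilon, sensitivity=1.0, rng=None):
    """Add i.i.d. Laplace noise to every coordinate of every sample.

    Assumption: `sensitivity` is the L1 diameter of the data domain, so that
    noise of scale sensitivity / epsilon gives epsilon-local differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)


def sinkhorn_cost(x, y, reg=1.0, n_iters=200):
    """Entropic optimal transport between uniform empirical measures on x and y.

    Returns the transport plan and its transport cost under a squared
    Euclidean ground cost (an illustrative choice, not taken from the paper).
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on the rows (samples in x)
    b = np.full(m, 1.0 / m)  # uniform weights on the columns (samples in y)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale to match the row marginal
        v = b / (kernel.T @ u)  # rescale to match the column marginal
    plan = u[:, None] * kernel * v[None, :]
    return plan, float((plan * cost).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.uniform(size=(20, 2))  # raw data in [0, 1]^2, L1 diameter 2
    private = privatize_laplace(raw, epsilon=2.0, sensitivity=2.0, rng=rng)
    plan, value = sinkhorn_cost(raw, private, reg=1.0)
    print(plan.shape, value)
```

In a GAN setting, a loss of this kind would be evaluated between generator outputs and privatized samples; for small `reg`, a log-domain variant of the iterations is the usual way to avoid numerical underflow in the kernel.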