Traditional data-masking techniques such as anonymization cannot deliver the expected privacy protection while preserving data utility for privacy-preserving machine learning. Synthetic data plays an increasingly important role because it can supply a large number of training samples while preventing leakage of information from the real data. Existing methods, however, require repeated trade-offs between privacy and utility. We propose a novel framework for differentially private data generation that employs an Error Feedback Stochastic Gradient Descent (EFSGD) method and introduces a reconstruction loss and a noise injection mechanism into the training process. Under the same privacy budget as related work, our framework generates images of higher quality and usability. Extensive experiments demonstrate the effectiveness and generalizability of the proposed framework on both grayscale and RGB images. We achieve state-of-the-art results on almost all metrics across three benchmarks: MNIST, Fashion-MNIST, and CelebA.
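As a rough illustration of the error-feedback idea behind EFSGD, the following Python sketch shows one generic update step in which the gradient is clipped, the part removed by clipping is carried forward as a residual, and Gaussian noise is injected before the parameter update. This is an assumption-laden sketch, not the framework's actual EFSGD update, which the abstract does not specify. The names `clip` and `ef_noisy_sgd_step` and the parameters `max_norm` and `noise_std` are hypothetical. The sketch performs no privacy accounting and omits the reconstruction loss.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def ef_noisy_sgd_step(params, grad, error, lr, max_norm, noise_std, rng):
    """One generic error-feedback step with clipping and Gaussian noise.

    Hypothetical helper for illustration only.
    Returns (new_params, new_error).
    """
    # Add the residual carried over from earlier steps to the fresh gradient.
    corrected = [g + e for g, e in zip(grad, error)]
    # Clipping bounds the norm of the vector that is actually applied.
    clipped = clip(corrected, max_norm)
    # Store what clipping removed so that later steps can re-apply it.
    new_error = [c - k for c, k in zip(corrected, clipped)]
    # Noise injection: Gaussian noise scaled by the clipping bound.
    noisy = [k + rng.gauss(0.0, noise_std * max_norm) for k in clipped]
    # Plain gradient-descent update using the noisy, clipped vector.
    new_params = [p - lr * n for p, n in zip(params, noisy)]
    return new_params, new_error
```

A caller would initialize `error` to a zero vector of the same length as `params` and pass the returned residual back in on the next step, for example `params, error = ef_noisy_sgd_step(params, grad, error, 0.1, 1.0, 0.5, random.Random(0))`. Setting `noise_std` to zero recovers clipped SGD with error feedback and no noise.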