Differentially private synthetic data is a promising alternative for releasing sensitive data. Many differentially private generative models have been proposed in the literature. Unfortunately, they all suffer from low utility of the synthetic data, particularly for high-resolution images. Here, we propose DPAF, an effective differentially private generative model for high-dimensional image synthesis. Unlike prior methods based on differentially private stochastic gradient descent, which add Gaussian noise in the backward phase of model training, DPAF adds a differentially private feature aggregation in the forward phase. This brings two advantages: less information loss from gradient clipping and low sensitivity of the aggregation. Moreover, because an improper batch size degrades the utility of the synthetic data, DPAF also addresses the problem of choosing a proper batch size with a novel training strategy that trains different parts of the discriminator asymmetrically. We extensively evaluate different methods on multiple image datasets (with images up to 128×128 resolution) to demonstrate the performance of DPAF.
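The abstract does not specify the exact form of DPAF's feature aggregation. Purely as an illustration of the general idea of privatizing an aggregate in the forward phase rather than privatizing gradients, the sketch below assumes the aggregation is a mean of per-example feature vectors, each clipped to L2 norm `max_norm`, released through the Gaussian mechanism. The function names `clip_l2` and `dp_feature_mean`, the replace-one adjacency notion, and the clipped-mean form are all assumptions for this example, not the paper's method.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def clip_l2(vec: Sequence[float], max_norm: float) -> List[float]:
    """Scale vec down (never up) so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def dp_feature_mean(
    features: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Hypothetical DP aggregation: clipped mean of features plus Gaussian noise.

    Assumption: neighbouring batches differ by replacing one example, so the
    clipped mean has L2 sensitivity 2 * max_norm / B for batch size B.
    Returns the noisy mean and the sensitivity used to calibrate the noise.
    """
    rng = rng or random.Random()
    batch = len(features)
    if batch == 0:
        raise ValueError("features must contain at least one example")
    dim = len(features[0])

    # Bound each example's contribution before aggregating.
    clipped = [clip_l2(f, max_norm) for f in features]
    mean = [sum(f[j] for f in clipped) / batch for j in range(dim)]

    # Noise is added once to the aggregate, in the forward pass.
    sensitivity = 2.0 * max_norm / batch
    sigma = noise_multiplier * sensitivity
    noisy = [m + rng.gauss(0.0, sigma) for m in mean]
    return noisy, sensitivity
```

In this sketch the sensitivity shrinks as 1/B, so a larger batch lowers the noise per aggregated coordinate. That dependence on batch size is one reason batch size matters for utility, though the abstract's asymmetric discriminator training strategy is not modeled here.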