Text-to-image diffusion models have been shown to suffer from sample-level memorization: they can reproduce near-perfect replicas of images they were trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, DP-RDM can generate samples with a privacy budget of $\epsilon=10$ while providing a $3.5$-point improvement in FID compared to public-only retrieval, for up to $10{,}000$ queries.
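To make the idea of a DP retrieval mechanism concrete, the following is a minimal sketch of one common way to privatize retrieved neighbors: clip each retrieved embedding, average the top-$k$, and add Gaussian noise scaled to the sensitivity of that average. It is an illustration under stated assumptions, not necessarily the mechanism used by DP-RDM. The function name `noisy_retrieval`, the parameters `max_norm` and `noise_multiplier`, and the sensitivity bound $2C/k$ (one changed record alters at most one of the $k$ clipped neighbors) are all assumptions introduced here. Converting the noise multiplier into an $\epsilon$ for a given number of queries requires a privacy accountant, which is not shown.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip_to_norm(v, max_norm):
    """Scale v down so that its L2 norm is at most max_norm."""
    norm = l2_norm(v)
    if norm <= max_norm or norm == 0.0:
        return list(v)
    scale = max_norm / norm
    return [x * scale for x in v]


def dot(a, b):
    """Inner product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def noisy_retrieval(query, private_embeddings, k, max_norm, noise_multiplier, rng):
    """Return a noisy average of the k private embeddings most similar to query.

    Hypothetical helper for illustration only.
    """
    if k < 1 or k > len(private_embeddings):
        raise ValueError("k must be between 1 and the size of the private set")

    # Rank private embeddings by similarity to the query and keep the top k.
    ranked = sorted(private_embeddings, key=lambda e: dot(query, e), reverse=True)
    neighbors = [clip_to_norm(e, max_norm) for e in ranked[:k]]

    # Average the clipped neighbors coordinate by coordinate.
    dim = len(query)
    mean = [sum(n[i] for n in neighbors) / k for i in range(dim)]

    # Assumed L2 sensitivity of the mean: swapping one of k vectors, each of
    # norm at most max_norm, moves the mean by at most 2 * max_norm / k.
    sensitivity = 2.0 * max_norm / k
    sigma = noise_multiplier * sensitivity

    # Gaussian mechanism: add independent noise to every coordinate.
    return [m + rng.gauss(0.0, sigma) for m in mean]
```

In this sketch a larger $k$ shrinks the noise added per coordinate, at the cost of averaging over less relevant neighbors. Each call spends privacy budget, so the total guarantee over many queries depends on how the per-query costs are composed.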