Personalizing large-scale diffusion models poses serious privacy risks, especially when adapting to small, sensitive datasets. A common approach is to fine-tune the model using differentially private stochastic gradient descent (DP-SGD), but this suffers from severe utility degradation due to the large amount of noise required for privacy, particularly in the small-data regime. We propose an alternative that leverages Textual Inversion (TI), which learns an embedding vector for an image or set of images, to enable adaptation under differential privacy (DP) constraints. Our approach, Differentially Private Aggregation via Textual Inversion (DPAgg-TI), adds calibrated noise to the aggregation of per-image embeddings, ensuring formal DP guarantees while preserving high output fidelity. We show that, under the same privacy budget, DPAgg-TI outperforms DP-SGD fine-tuning in both utility and robustness, achieving results that closely match the non-private baseline on style-adaptation tasks using private artwork from a single artist and the Paris 2024 Olympic pictograms. In contrast, DP-SGD fails to generate meaningful outputs in this setting.
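To make "adding calibrated noise to the aggregation of per-image embeddings" concrete, here is a minimal sketch. The abstract does not state the clipping rule, the aggregation function, or the noise distribution, so everything below is an assumption: each per-image embedding is clipped to an L2 norm of at most `max_norm`, the clipped embeddings are averaged, and Gaussian noise is added using the classic Gaussian-mechanism calibration under "replace one image" neighboring datasets. The function names (`clip_to_norm`, `gaussian_sigma`, `dp_aggregate_embeddings`) are hypothetical, and embeddings are plain Python lists rather than actual text-encoder token embeddings.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm (unchanged if already within)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classic Gaussian-mechanism noise scale (its standard analysis assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def dp_aggregate_embeddings(embeddings, max_norm, epsilon, delta, rng=None):
    """Hypothetical DP aggregation: mean of L2-clipped embeddings plus Gaussian noise."""
    rng = rng or random.Random()
    n = len(embeddings)
    if n == 0:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    clipped = [clip_to_norm(e, max_norm) for e in embeddings]
    mean = [sum(e[i] for e in clipped) / n for i in range(dim)]
    # Replacing one image moves the clipped mean by at most 2 * max_norm / n in L2 norm.
    sigma = gaussian_sigma(2.0 * max_norm / n, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean]
```

For example, `dp_aggregate_embeddings(per_image, max_norm=1.0, epsilon=0.5, delta=1e-5, rng=random.Random(0))` returns a single noisy embedding of the same dimension as the inputs. Because the noise scale shrinks as `1 / n`, adding more private images lowers the distortion of the aggregate for a fixed privacy budget, while the one-shot aggregation avoids the repeated noise injection of iterative DP-SGD training.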