Diffusion models have achieved remarkable progress in image generation, but their growing deployment raises serious privacy concerns. Fine-tuned models are particularly vulnerable, as they are often adapted on small, private datasets. Membership inference attacks (MIAs) assess privacy risks by determining whether a specific sample was part of a model's training data. Existing MIAs against diffusion models either assume access to intermediate results or require auxiliary datasets to train shadow models. In this work, we exploit a critical yet overlooked vulnerability: widely used noise schedules fail to fully eliminate semantic information from images, leaving residual semantic signals even at the maximum noise step. We empirically demonstrate that a fine-tuned diffusion model captures hidden correlations between the residual semantics in the initial noise and the original images. Building on this insight, we propose a simple yet effective membership inference attack that injects semantic information into the initial noise and infers membership by analyzing the model's generation results. Extensive experiments demonstrate that semantic initial noise strongly reveals membership information, highlighting the vulnerability of diffusion models to MIAs.
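The claim that common noise schedules do not fully destroy the clean signal can be checked numerically. In the DDPM forward process, x_T = sqrt(ᾱ_T)·x_0 + sqrt(1 − ᾱ_T)·ε, so any nonzero sqrt(ᾱ_T) leaves a residual trace of x_0 at the maximum noise step. A minimal sketch, assuming the common linear schedule hyperparameters (β from 1e-4 to 0.02 over T = 1000 steps); these values are illustrative defaults, not taken from this work:

```python
import numpy as np

# Assumed standard DDPM linear beta schedule: beta from 1e-4 to 0.02, T = 1000.
T = 1000
betas = np.linspace(1e-4, 0.02, T)

# Cumulative signal-retention product: alpha_bar_t = prod_{s<=t} (1 - beta_s).
alphas_cumprod = np.cumprod(1.0 - betas)

# Forward process at the final step: x_T = sqrt(abar_T)*x_0 + sqrt(1 - abar_T)*eps.
# A strictly positive sqrt(abar_T) means x_T still carries a trace of x_0.
signal_coeff = float(np.sqrt(alphas_cumprod[-1]))
print(f"sqrt(alpha_bar_T) = {signal_coeff:.6f}")  # small but strictly nonzero
```

Under this schedule the coefficient comes out on the order of 1e-2: small enough that x_T looks like pure noise, yet nonzero, which is the residual-signal gap the attack builds on.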