Generative diffusion models, such as Stable Diffusion and Midjourney, can generate visually appealing, diverse, and high-resolution images for a wide range of applications. These models are trained on billions of images collected from the internet, which raises significant concerns about the potential unauthorized use of copyright-protected images. In this paper, we examine whether it is possible to determine if a specific image was used in a model's training set. The cybersecurity community calls this problem a membership inference attack. We focus on Stable Diffusion and address the challenge of designing a fair evaluation framework to answer this membership question. We propose a methodology for establishing a fair evaluation setup and apply it to Stable Diffusion. The methodology could also be extended to other generative models. Using this evaluation setup, we execute membership inference attacks, including both previously known attacks and newly introduced ones. Our results show that previously proposed evaluation setups do not give a full picture of how effective membership inference attacks are. We conclude that membership inference remains a significant challenge for large diffusion models, which are often deployed as black-box systems. This suggests that the related privacy and copyright issues will persist for the foreseeable future.