Generative diffusion models, such as Stable Diffusion and Midjourney, can produce visually appealing, diverse, high-resolution images for a wide range of applications. These models are trained on billions of images collected from the internet, which raises serious concerns about the unauthorized use of copyright-protected images. In this paper, we examine whether it is possible to determine if a specific image was part of a model's training set, a problem known in the cybersecurity community as a membership inference attack. We focus on Stable Diffusion and address the challenge of designing a fair evaluation framework for answering this membership question. We propose a methodology for building such a fair evaluation setup and apply it to Stable Diffusion; the methodology can potentially be extended to other generative models. Using this setup, we run membership inference attacks, both previously known ones and newly introduced ones. Our results show that previously proposed evaluation setups do not give a complete picture of how effective membership inference attacks are. We conclude that membership inference remains a significant challenge for large diffusion models, which are often deployed as black-box systems, and that the related privacy and copyright issues are therefore likely to persist in the foreseeable future.