This paper introduces a novel approach to membership inference attacks (MIAs) targeting stable diffusion computer vision models, focusing specifically on Stable Diffusion V2 by StabilityAI. MIAs aim to extract sensitive information about a model's training data, which poses significant privacy concerns. Despite the model's advances in image synthesis, our research reveals privacy vulnerabilities in its outputs. Exploiting these vulnerabilities, we devise a black-box MIA that only needs to query the victim model repeatedly. Our methodology observes the output of a stable diffusion model at different stages of the generation process and trains a classification model to distinguish whether a series of intermediate outputs originated from a training sample. We propose several ways to measure membership features and discuss which work best. The attack's efficacy is assessed using ROC AUC, and we report a 60\% success rate in inferring membership information. This paper contributes to the growing body of research on privacy and security in machine learning and highlights the need for robust defenses against MIAs. Our findings prompt a reevaluation of the privacy implications of stable diffusion models and urge practitioners and developers to implement stronger security measures against such attacks.
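To make the pipeline described above concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical membership feature (the per-step mean squared error between each intermediate output and a reference image), substitutes a simple nearest-centroid scorer for the classification model, and replaces real queries to the victim model with a synthetic toy generator. All function names, the convergence rates, and the toy data are illustrative assumptions; only the overall shape (intermediate outputs, a trained scorer, ROC AUC evaluation) follows the text.

```python
import random


def trajectory_features(intermediates, reference):
    """Per-step mean squared error between each intermediate and a reference.

    This feature is an assumption made for illustration; the paper proposes
    several membership features that are not reproduced here.
    """
    return [
        sum((a - b) ** 2 for a, b in zip(step, reference)) / len(reference)
        for step in intermediates
    ]


def roc_auc(scores, labels):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win (Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(features, labels):
    """Mean feature vector per class (0 = non-member, 1 = member)."""
    centroids = {}
    for cls in (0, 1):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def membership_score(feature, centroids):
    """Higher when the feature lies closer to the member centroid."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(feature, center))
    return sq_dist(centroids[0]) - sq_dist(centroids[1])


def synthetic_trajectory(rng, member, steps=5, dim=8):
    """Toy stand-in for querying a model; NOT real diffusion output.

    Assumes, purely for illustration, that members denoise slightly faster.
    """
    reference = [rng.random() for _ in range(dim)]
    rate = 0.6 if member else 0.5  # assumed convergence rates
    noise_scale = 1.0
    intermediates = []
    for _ in range(steps):
        noise_scale *= 1.0 - rate
        intermediates.append([x + rng.gauss(0.0, noise_scale) for x in reference])
    return intermediates, reference


def run_demo(seed=0, n_train=200, n_test=200):
    """Fit the scorer on one synthetic split and return ROC AUC on another."""
    rng = random.Random(seed)

    def make_split(n):
        labels = [i % 2 for i in range(n)]
        feats = [
            trajectory_features(*synthetic_trajectory(rng, bool(y)))
            for y in labels
        ]
        return feats, labels

    train_x, train_y = make_split(n_train)
    test_x, test_y = make_split(n_test)
    centroids = fit_centroids(train_x, train_y)
    scores = [membership_score(f, centroids) for f in test_x]
    return roc_auc(scores, test_y)


if __name__ == "__main__":
    print(f"toy ROC AUC: {run_demo():.3f}")
```

The value printed by `run_demo` reflects only the synthetic generator's assumed gap between members and non-members and says nothing about the attack's performance on Stable Diffusion V2. In a real setting, `synthetic_trajectory` would be replaced by intermediate outputs collected from repeated queries to the victim model.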