Recently, Carlini et al. demonstrated that the widely used model Stable Diffusion can regurgitate real training samples, which is troublesome from a copyright perspective. In this work, we provide an efficient extraction attack on par with the recent attack, using several orders of magnitude fewer network evaluations. In the process, we expose a new phenomenon, which we dub template verbatims, wherein a diffusion model regurgitates a training sample largely intact. Template verbatims are harder to detect, as they require retrieval and masking to label correctly. Furthermore, they are still generated by newer systems, even those that de-duplicate their training set, and we give insight into why they still appear during generation. We extract training images from several state-of-the-art systems, including Stable Diffusion 2.0, Deep Image Floyd, and Midjourney v4. We release code to verify and perform our extraction attack, as well as all extracted prompts, at \url{https://github.com/ryanwebster90/onestep-extraction}.