Recently, text-to-image generative models have been misused to create unauthorized malicious images of individuals, posing a growing social problem. Previous solutions, such as Anti-DreamBooth, add adversarial noise to images to protect them from being used as training data for malicious generation. However, we found that this adversarial noise can be removed by adversarial purification methods such as DiffPure. We therefore propose a new adversarial attack method that adds strong perturbations to the high-frequency regions of images, making the protection more robust to adversarial purification. Our experiments show that the adversarial images retain their noise even after adversarial purification, hindering malicious image generation.
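To illustrate the core idea of concentrating perturbations in high-frequency bands, the sketch below adds frequency-masked noise to an image via a 2D FFT. This is a minimal, hypothetical illustration with random noise as a stand-in for an optimized adversarial perturbation; the function names, the circular frequency mask, and the budget `eps` are assumptions, not the paper's actual method.

```python
import numpy as np

def high_frequency_mask(h, w, radius_frac=0.25):
    """Boolean mask that is True outside a centered low-frequency disk
    in the fft-shifted frequency plane (hypothetical band split)."""
    cy, cx = h / 2.0, w / 2.0
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    return dist > radius_frac * min(h, w)

def add_high_frequency_noise(img, eps=8.0 / 255.0, radius_frac=0.25, seed=0):
    """Add noise restricted to high-frequency bands of a grayscale image
    in [0, 1]; random noise stands in for an adversarial perturbation."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=img.shape)
    # Zero out low-frequency coefficients so only high frequencies remain.
    spec = np.fft.fftshift(np.fft.fft2(noise))
    spec[~high_frequency_mask(*img.shape, radius_frac)] = 0.0
    hf_noise = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    # Rescale so the pixel-space perturbation respects the budget eps.
    hf_noise = eps * hf_noise / (np.abs(hf_noise).max() + 1e-12)
    return np.clip(img + hf_noise, 0.0, 1.0)

img = np.full((64, 64), 0.5)  # toy grayscale image
out = add_high_frequency_noise(img)
```

The intuition is that diffusion-based purifiers such as DiffPure tend to smooth images toward their learned natural-image manifold, so a perturbation carried in high-frequency components that overlap natural texture detail is harder to strip without also destroying image content.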