Recently, text-to-image generative models have been misused to create unauthorized, malicious images of individuals, a growing social problem. Previous solutions, such as Anti-DreamBooth, add adversarial noise to images to prevent them from being used as training data for malicious generation. However, we found that this adversarial noise can be removed by adversarial purification methods such as DiffPure. We therefore propose a new adversarial attack method that concentrates strong perturbations in the high-frequency regions of images, making the protection more robust to adversarial purification. Our experiments show that the adversarial images retain their protective noise even after adversarial purification, hindering malicious image generation.
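To illustrate the core idea of concentrating a perturbation in an image's high-frequency regions, the sketch below injects noise through an FFT high-pass mask. This is a toy illustration under our own assumptions, not the paper's method: the actual attack optimizes an adversarial perturbation against the generative model, whereas here random noise simply stands in to show where such a perturbation would live in frequency space. The function name, the `cutoff` radius, and the budget `eps` are all hypothetical choices.

```python
import numpy as np

def high_freq_perturb(img, eps=0.05, cutoff=0.25, seed=0):
    """Add noise concentrated in the high-frequency bands of a grayscale image.

    Toy sketch only: random noise is filtered through an FFT high-pass mask,
    standing in for the optimized adversarial perturbation described above.
    `img` is a 2-D array with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape
    # Frequency-domain high-pass mask: 1 outside a low-frequency disc.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.sqrt(fx**2 + fy**2) > cutoff).astype(float)
    # Filter random noise so only high-frequency components survive.
    noise = rng.standard_normal((h, w))
    hf_noise = np.real(np.fft.ifft2(np.fft.fft2(noise) * mask))
    # Scale to the perturbation budget eps and keep pixel values valid.
    hf_noise *= eps / (np.abs(hf_noise).max() + 1e-12)
    return np.clip(img + hf_noise, 0.0, 1.0)
```

Because purification methods such as DiffPure tend to smooth images via a diffusion process, perturbations placed in high-frequency bands are, intuitively, harder to separate from genuine image detail, which is the robustness property the attack targets.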