Adversarial attacks on the Latent Diffusion Model (LDM), the state-of-the-art image generative model, have been adopted as an effective protection against malicious finetuning of LDM on unauthorized images. We show that these attacks add an extra error to the score function that LDM predicts for adversarial examples. An LDM finetuned on these adversarial examples learns to reduce this error by adopting a bias; this bias is how the attack takes effect, since the finetuned model then predicts the score function with a bias. Building on these dynamics, we propose to improve adversarial attacks on LDM by Attacking with Consistent score-function Errors (ACE). ACE unifies the pattern of the extra error added to the predicted score function, which induces the finetuned LDM to learn that same pattern as a bias in its score-function predictions. We then introduce a well-crafted pattern to further improve the attack. Our method outperforms state-of-the-art methods for adversarial attacks on LDM.
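The idea of driving the prediction error toward one shared pattern can be illustrated with a toy sketch. This is not the authors' implementation: the linear map `W` is a hypothetical stand-in for the LDM noise predictor, there is no latent encoder or timestep, and `target`, `budget`, and `step` are illustrative values chosen here. Under those assumptions, a signed-gradient (PGD-style) search finds a bounded perturbation that moves each example's prediction error closer to the same fixed target.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def error_gap(W, x, eps, target):
    """Squared distance between the prediction error (W x - eps) and the target pattern."""
    pred = matvec(W, x)
    return sum((p - e - c) ** 2 for p, e, c in zip(pred, eps, target))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def consistent_error_attack(W, x, eps, target, budget=0.1, step=0.01, iters=50):
    """Search for a perturbation delta (L-infinity norm <= budget) that moves the
    toy predictor's error on x + delta toward the fixed target pattern."""
    delta = [0.0] * len(x)
    best_delta, best_gap = list(delta), error_gap(W, x, eps, target)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        resid = [p - e - c for p, e, c in zip(matvec(W, x_adv), eps, target)]
        # Gradient of the squared gap with respect to delta: 2 * W^T * resid.
        grad = [2.0 * sum(W[i][j] * resid[i] for i in range(len(W)))
                for j in range(len(x))]
        # Signed descent step, then clip back into the perturbation budget.
        delta = [max(-budget, min(budget, d - step * sign(g)))
                 for d, g in zip(delta, grad)]
        gap = error_gap(W, [xi + di for xi, di in zip(x, delta)], eps, target)
        if gap < best_gap:  # keep the best perturbation seen so far
            best_delta, best_gap = list(delta), gap
    return best_delta


# Hypothetical toy data: two "images" attacked toward the SAME error pattern.
W = [[1.0, 0.5], [0.0, 1.0]]
target = [0.5, 0.5]
examples = [([0.2, -0.1], [0.0, 0.0]), ([-0.3, 0.4], [0.1, -0.2])]
for x, eps in examples:
    delta = consistent_error_attack(W, x, eps, target)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    print(error_gap(W, x, eps, target), error_gap(W, x_adv, eps, target))
```

Because every example is pushed toward the same `target`, the residual errors share one direction, which is the property the abstract attributes to a consistent error pattern. In a real setting the gradient would come from automatic differentiation through the actual model rather than the closed form used here.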