Deep neural networks (DNNs) can be manipulated to exhibit specific behaviors when presented with specific trigger patterns, while their performance on normal samples remains unaffected. This type of attack is known as a backdoor attack. Recent research has focused on designing invisible triggers to make backdoor attacks visually stealthy. These triggers have shown strong attack performance even under backdoor defenses, which aim to eliminate or suppress the backdoor effect in the model. However, our experiments show that these carefully designed invisible triggers are often susceptible to visual distortion at inference time, such as Gaussian blurring or environmental variations in real-world scenarios. This susceptibility significantly undermines the effectiveness of such attacks in practical applications, yet it has received little attention and has not been thoroughly investigated. To address this limitation, we propose the Visible, Semantic, Sample-Specific, and Compatible trigger (VSSC-trigger), which leverages a recent and powerful image-generation method, the stable diffusion model. A text trigger is used as a prompt and fed, together with a benign image, into a pre-trained stable diffusion model, which generates a corresponding semantic object. This object is seamlessly integrated into the original image, yielding a new, realistic image that we refer to as the poisoned image. Extensive experimental results and analysis validate the effectiveness and robustness of the proposed attack, even in the presence of visual distortion. We believe that the new trigger, together with the idea behind it for addressing the above issue, will have significant implications for further advances in this direction.
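The sensitivity of fine-grained triggers to Gaussian blurring can be illustrated with a small toy experiment. The sketch below is not the VSSC-trigger pipeline and uses no diffusion model. It assumes a hypothetical invisible trigger modeled as a low-amplitude pixel-level checkerboard, and a hypothetical visible trigger modeled as a solid block standing in for a semantic object. Real invisible triggers vary in form, so the numbers only indicate the trend. All function and variable names are illustrative.

```python
import numpy as np


def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image with reflect padding."""
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img, radius, mode="reflect")
    # Blur along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def retention(clean: np.ndarray, trigger: np.ndarray, sigma: float = 1.0) -> float:
    """Fraction of the trigger's L2 norm that survives blurring the poisoned image."""
    residual = gaussian_blur(clean + trigger, sigma) - gaussian_blur(clean, sigma)
    return float(np.linalg.norm(residual) / np.linalg.norm(trigger))


rng = np.random.default_rng(0)
clean = rng.random((32, 32))  # stand-in for a benign grayscale image in [0, 1]

# Assumed invisible trigger: +/- 2/255 checkerboard (imperceptible, high frequency).
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 2 - 1
invisible_trigger = checker * (2.0 / 255.0)

# Assumed visible trigger: a 12 x 12 block, a crude proxy for an inserted object.
visible_trigger = np.zeros((32, 32))
visible_trigger[10:22, 10:22] = 0.8

invisible_retention = retention(clean, invisible_trigger)
visible_retention = retention(clean, visible_trigger)
print(f"invisible trigger retained: {invisible_retention:.4f}")
print(f"visible trigger retained:   {visible_retention:.4f}")
```

Under these assumptions, the blur removes almost all of the checkerboard signal while the block-shaped trigger keeps most of its energy. This is consistent with the motivation for a visible, object-level trigger.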