As deep neural networks are increasingly deployed in critical domains, concerns over their security have grown. Because deep learning models lack transparency, they are vulnerable to backdoor attacks. A backdoored model behaves normally on clean inputs but exhibits malicious behavior when the input contains a trigger. Current research on backdoor attacks focuses on improving the stealthiness of triggers, and most approaches assume strong attacker capabilities, such as knowledge of the model structure or control over the training process. Such attacks are impractical because in most cases the attacker's capabilities are limited. In addition, the robustness of the attack has not received adequate attention. For instance, model distillation is commonly used to reduce model size as the number of parameters grows exponentially, yet most previous backdoor attacks fail after distillation; likewise, image augmentation operations can destroy the trigger and thus disable the backdoor. This study explores how to implement black-box backdoor attacks under limited attacker capabilities. The attacker can carry out such an attack by acting as either an image annotator or an image provider, without taking part in the training process or knowing the target model's structure. Through the design of the backdoor trigger, our attack remains effective after model distillation and image augmentation, making it more threatening and practical. Experimental results show that our method achieves a high attack success rate in black-box scenarios and evades state-of-the-art backdoor defenses.