Launching a Robust Backdoor Attack under Capability Constrained Scenarios

As deep neural networks continue to be deployed in critical domains, concerns over their security have grown. Because they lack transparency, deep learning models are vulnerable to backdoor attacks. A poisoned, backdoored model may behave normally on routine inputs but exhibit malicious behavior when the input contains a trigger. Current research on backdoor attacks focuses on improving the stealthiness of triggers, and most approaches require strong attacker capabilities, such as knowledge of the model structure or control over the training process. Such attacks are impractical, since in most cases the attacker's capabilities are limited. In addition, the robustness of backdoored models has not received adequate attention. For instance, model distillation is commonly used to reduce model size as parameter counts grow exponentially, and most previous backdoor attacks fail after distillation; likewise, image augmentation operations can destroy the trigger and thus disable the backdoor. This study explores how black-box backdoor attacks can be implemented under capability constraints. An attacker can carry out such attacks by acting as either an image annotator or an image provider, without taking part in the training process or knowing the target model's structure. Through the design of the backdoor trigger, our attack remains effective after model distillation and image augmentation, making it more threatening and more practical. Our experimental results demonstrate that our method achieves a high attack success rate in black-box scenarios and evades state-of-the-art backdoor defenses.
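To make the threat model and the evaluation metric concrete, the following is a minimal sketch of how an attacker limited to providing or annotating images might poison a dataset, and how an attack success rate could be measured. The abstract does not describe the trigger design, so the corner-patch trigger here is a generic stand-in and is not the authors' method; it makes no claim of surviving distillation or augmentation. All names (`stamp_trigger`, `poison_dataset`, `attack_success_rate`) and parameters (patch size, poisoning rate) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel values in [0, 1]
Sample = Tuple[Image, int]  # (image, class label)


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size patch in the bottom-right corner.

    Assumption: a plain corner patch stands in for the (unspecified) trigger.
    """
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison_dataset(data: List[Sample], target_label: int,
                   rate: float, seed: int = 0) -> List[Sample]:
    """Stamp the trigger on a random fraction of samples and relabel them.

    This needs only control over images and labels, not over training.
    """
    rng = random.Random(seed)
    n_poison = int(len(data) * rate)
    chosen = set(rng.sample(range(len(data)), n_poison))
    return [
        (stamp_trigger(img), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(data)
    ]


def attack_success_rate(predict: Callable[[Image], int],
                        test_data: List[Sample], target_label: int) -> float:
    """Fraction of triggered non-target test images classified as the target."""
    victims = [img for img, label in test_data if label != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(stamp_trigger(img)) == target_label for img in victims)
    return hits / len(victims)
```

Here `predict` is treated purely as a black box that maps an image to a label, which mirrors the assumption that the attacker knows nothing about the target model's structure.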

