The widespread adoption of deep learning across industries has introduced substantial challenges, particularly in model explainability and security. The complexity that makes deep learning models effective also renders them susceptible to adversarial attacks. Among these, backdoor attacks are especially concerning: an adversary surreptitiously embeds specific triggers in the training data, causing the model to behave aberrantly whenever an input contains the trigger. Such attacks often exploit vulnerabilities in outsourced training or data-collection processes, compromising model integrity without degrading performance on clean (trigger-free) inputs. In this paper, we present a comprehensive review of existing mitigation strategies against backdoor attacks in image recognition, analyzing the theoretical foundations, practical efficacy, and limitations of each approach. We also conduct an extensive benchmark of sixteen state-of-the-art defenses against eight distinct backdoor attacks, using three datasets, four model architectures, and three poisoning ratios. Our results, derived from 122,236 individual experiments, indicate that while many defenses provide some level of protection, their performance varies considerably. Moreover, compared to two seminal approaches, most newer defenses do not deliver substantial improvements in overall performance or consistency across diverse settings. Drawing on these findings, we propose directions for developing more effective and generalizable defensive mechanisms.
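To make the threat model concrete, the sketch below illustrates the classic data-poisoning step behind a BadNets-style backdoor attack: a small pixel patch is stamped onto a fraction of the training images, and those images are relabeled with an attacker-chosen target class. This is a minimal illustrative sketch, not the implementation of any specific attack benchmarked in the paper; the function name, patch size, and default poisoning ratio are assumptions for exposition.

```python
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_ratio=0.05,
                   patch_size=3, seed=42):
    """Illustrative BadNets-style poisoning sketch.

    Stamps a solid white square into the bottom-right corner of a random
    subset of training images and relabels them with the attacker's
    target class.

    images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,).
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    n_poison = int(len(images) * poison_ratio)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # The trigger: a small fixed patch in one corner. A model trained on
    # this data learns to map any input carrying the patch to
    # `target_class`, while clean (trigger-free) inputs are classified
    # normally -- which is what makes the attack hard to detect.
    images[idx, -patch_size:, -patch_size:, :] = 255
    labels[idx] = target_class
    return images, labels, idx
```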