Backdoor Defense via Deconfounded Representation Learning

Deep neural networks (DNNs) have recently been shown to be vulnerable to backdoor attacks, in which attackers embed hidden backdoors in a DNN model by injecting a few poisoned examples into the training dataset. While extensive efforts have been made to detect and remove backdoors from backdoored DNNs, it is still unclear whether a backdoor-free clean model can be obtained directly from a poisoned dataset. In this paper, we first construct a causal graph to model the generation process of poisoned data and find that the backdoor attack acts as a confounder, which introduces spurious associations between the input images and the target labels and makes the model's predictions less reliable. Inspired by this causal understanding, we propose Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification. Specifically, a backdoored model is intentionally trained to capture the confounding effects. A second, clean model is dedicated to capturing the desired causal effects by minimizing the mutual information with the confounding representations from the backdoored model and by employing a sample-wise re-weighting scheme. Extensive experiments on multiple benchmark datasets against 6 state-of-the-art attacks verify that the proposed defense is effective in reducing backdoor threats while maintaining high accuracy on benign samples. Further analysis shows that CBD can also resist potential adaptive attacks. The code is available at \url{https://github.com/zaixizhang/CBD}.
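
The two-model idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the weight formula (the backdoored model's loss divided by the sum of both models' losses), and the squared-correlation penalty, which is only a crude stand-in for a mutual-information term. The abstract does not specify any of these details.

```python
import math

EPS = 1e-12  # numerical guard, chosen arbitrarily for this sketch


def cross_entropy(probs, label):
    """Cross-entropy of one sample given a list of class probabilities."""
    return -math.log(max(probs[label], EPS))


def sample_weight(ce_backdoored, ce_clean):
    """Hypothetical weight: small when the backdoored model fits the sample
    easily (a hint that the sample may be poisoned)."""
    return ce_backdoored / (ce_backdoored + ce_clean + EPS)


def correlation_penalty(z_clean, z_backdoored):
    """Stand-in for a mutual-information term (an assumption): squared
    Pearson correlation between two scalar features over a batch."""
    n = len(z_clean)
    mean_c = sum(z_clean) / n
    mean_b = sum(z_backdoored) / n
    cov = sum((c - mean_c) * (b - mean_b) for c, b in zip(z_clean, z_backdoored))
    var_c = sum((c - mean_c) ** 2 for c in z_clean)
    var_b = sum((b - mean_b) ** 2 for b in z_backdoored)
    return cov * cov / (var_c * var_b + EPS)


def clean_model_loss(probs_clean, probs_backdoored, labels,
                     z_clean, z_backdoored, beta=1.0):
    """Re-weighted cross-entropy of the clean model plus the dependence
    penalty scaled by beta (a hypothetical trade-off parameter)."""
    total = 0.0
    for p_c, p_b, y in zip(probs_clean, probs_backdoored, labels):
        ce_c = cross_entropy(p_c, y)
        ce_b = cross_entropy(p_b, y)
        total += sample_weight(ce_b, ce_c) * ce_c
    mean_weighted_ce = total / len(labels)
    return mean_weighted_ce + beta * correlation_penalty(z_clean, z_backdoored)
```

In this sketch, a sample that the backdoored model classifies with very low loss contributes little to the clean model's objective, while the penalty discourages the clean model's feature from tracking the backdoored model's feature. A practical version would operate on high-dimensional representations and use a proper mutual-information estimator.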

