BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).

翻译：我们提出一种针对深度神经网络后门攻击的新型防御方法，其中攻击者会隐秘地将恶意行为（后门）植入深度神经网络中。该防御属于后开发防御范畴，其运作方式独立于模型的生成过程。所提出的防御建立于一种新颖的逆向工程技术之上，该技术可直接将给定带后门模型的后门功能提取至一个后门专家模型。该方法简单直接——通过少量故意错误标注的干净样本对带后门模型进行微调，使其遗忘正常功能而保留后门功能，从而生成一个仅能识别后门输入的模型（称为后门专家模型）。基于提取的后门专家模型，我们展示了构建高精度后门输入检测器的可行性，该检测器可在模型推理过程中过滤掉后门输入。进一步通过集成策略与微调的辅助模型相结合，我们的防御方法BaDExpert（基于后门专家的后门输入检测）能有效缓解16种最先进的后门攻击，同时对干净样本的效用影响极小。BaDExpert的有效性已在多个数据集（CIFAR10、GTSRB和ImageNet）以及多种模型架构（ResNet、VGG、MobileNetV2和Vision Transformer）上得到验证。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

14+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日