Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder

Deep neural networks are vulnerable to backdoor attacks, where an adversary maliciously manipulates the model's behavior by overlaying images with special triggers. Existing backdoor defense methods often require access to a small amount of validation data and to the model parameters, which is impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for black-box models. The true label of every test image needs to be recovered on the fly from a suspicious model, regardless of image benignity. We focus on test-time image purification methods that incapacitate possible triggers while keeping semantic contents intact. Because trigger patterns and sizes are diverse, a heuristic trigger search in image space can be unscalable. We circumvent this barrier by leveraging the strong reconstruction power of generative models, and propose a framework of Blind Defense with Masked AutoEncoder (BDMAE). It detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations. The detection results are then refined by considering trigger topology. Finally, we adaptively fuse the MAE restorations into a purified image on which the prediction is made. Our approach is blind to the model architecture, the trigger pattern, and image benignity. Extensive experiments under different backdoor settings validate its effectiveness and generalizability. Code is available at https://github.com/tsun/BDMAE.
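The mask-and-restore idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' BDMAE implementation: the classifier, the "restoration" function (a mean fill standing in for a real MAE), the patch size, the mean-absolute-difference score (a crude stand-in for structural similarity), the threshold, and all function names are assumptions made purely for illustration. The topology-based refinement step is omitted.

```python
import numpy as np

PATCH = 4  # hypothetical patch (token) size in pixels


def toy_classifier(image):
    """Hypothetical black-box model: predicts 1 when the top-left patch is bright."""
    return int(image[:PATCH, :PATCH].mean() > 0.9)


def toy_restore(image, pixel_mask):
    """Stand-in for an MAE: fill masked pixels with the mean of the visible ones."""
    restored = image.copy()
    visible = image[~pixel_mask]
    fill = visible.mean() if visible.size else image.mean()
    restored[pixel_mask] = fill
    return restored


def to_pixels(patch_mask):
    """Upsample a patch-level boolean mask to pixel resolution."""
    return np.repeat(np.repeat(patch_mask, PATCH, axis=0), PATCH, axis=1)


def trigger_scores(image, classify, restore, n_masks=64, mask_ratio=0.5, seed=0):
    """Score each patch by combining restoration dissimilarity with label flips."""
    rng = np.random.default_rng(seed)
    grid = (image.shape[0] // PATCH, image.shape[1] // PATCH)
    base_label = classify(image)
    dissim = np.zeros(grid)
    flips = np.zeros(grid)
    counts = np.zeros(grid)
    for _ in range(n_masks):
        patch_mask = rng.random(grid) < mask_ratio  # True = patch is masked
        restored = restore(image, to_pixels(patch_mask))
        # Per-patch mean absolute difference (crude stand-in for 1 - SSIM).
        diff = np.abs(image - restored)
        per_patch = diff.reshape(grid[0], PATCH, grid[1], PATCH).mean(axis=(1, 3))
        flipped = classify(restored) != base_label
        counts += patch_mask
        dissim += per_patch * patch_mask
        flips += patch_mask * flipped
    counts = np.maximum(counts, 1)
    return (dissim / counts) * (flips / counts)


def purify(image, scores, restore, threshold=0.5):
    """Restore only the patches whose trigger score exceeds the threshold."""
    return restore(image, to_pixels(scores > threshold))


# Toy example: a dark image with a bright "trigger" in the top-left patch.
image = np.zeros((16, 16))
image[:PATCH, :PATCH] = 1.0
scores = trigger_scores(image, toy_classifier, toy_restore)
purified = purify(image, scores, toy_restore)
print(toy_classifier(image), toy_classifier(purified))
```

In this sketch a patch scores high only when restoring it both changes its appearance and changes the model's label, which mirrors the abstract's pairing of structural similarity with label consistency. The sketch only queries the model for labels, so it needs no access to parameters or validation data.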

