Black-box Backdoor Defense via Zero-shot Image Purification

Backdoor attacks inject poisoned data into the training set, resulting in misclassification of the poisoned samples during model inference. Defending against such attacks is challenging, especially in real-world black-box settings where only model predictions are available. In this paper, we propose a novel backdoor defense framework that can effectively defend against various attacks through zero-shot image purification (ZIP). Our proposed framework can be applied to black-box models without requiring any internal information about the poisoned model or any prior knowledge of the clean/poisoned samples. Our defense framework involves a two-step process. First, we apply a linear transformation on the poisoned image to destroy the trigger pattern. Then, we use a pre-trained diffusion model to recover the missing semantic information removed by the transformation. In particular, we design a new reverse process using the transformed image to guide the generation of high-fidelity purified images, which can be applied in zero-shot settings. We evaluate our ZIP backdoor defense framework on multiple datasets with different kinds of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models.
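The two-step process described in the abstract can be illustrated with a minimal sketch. It assumes the linear transformation is block average pooling followed by nearest-neighbour upsampling, which is only one possible choice; the abstract does not say which operator the authors use. The names `avg_pool_blur`, `purify`, and `restore` are hypothetical, and the diffusion-based recovery step is left as a pluggable callable rather than implemented.

```python
import numpy as np


def avg_pool_blur(img: np.ndarray, k: int = 4) -> np.ndarray:
    """Average-pool an HxWxC image over k x k blocks, then upsample by repetition.

    This is a linear map on pixel values (an assumed stand-in for the
    paper's transformation, not the authors' operator).
    """
    h, w, c = img.shape
    if h % k or w % k:
        raise ValueError("image height and width must be divisible by k")
    pooled = img.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)


def purify(img: np.ndarray, restore=None, k: int = 4) -> np.ndarray:
    """Step 1: destroy fine-grained trigger patterns with a linear transform.
    Step 2: hand the degraded image to `restore`, a hypothetical callable
    wrapping a pre-trained diffusion model (not implemented here).
    """
    degraded = avg_pool_blur(img, k)
    return degraded if restore is None else restore(degraded)


# Toy data: a synthetic "clean" image and a copy carrying a 2x2 patch trigger.
rng = np.random.default_rng(0)
clean = rng.uniform(0.4, 0.6, size=(32, 32, 3))
poisoned = clean.copy()
poisoned[-2:, -2:, :] = 1.0

# Largest pixel difference caused by the trigger, before and after step 1.
trigger_before = np.abs(poisoned - clean).max()
trigger_after = np.abs(avg_pool_blur(poisoned) - avg_pool_blur(clean)).max()
print(f"trigger magnitude before: {trigger_before:.3f}, after: {trigger_after:.3f}")
```

In this toy setting the pooling step spreads the patch over a 4x4 block, so its peak magnitude drops, but image detail is lost as well. That loss is what the second step, guided generation with a pre-trained diffusion model, is meant to recover.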

