SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency

Deep neural networks (DNNs) are vulnerable to backdoor attacks, in which adversaries embed a hidden backdoor trigger during training so that they can later manipulate predictions. These attacks pose a serious threat to DNN applications in the real-world machine learning as a service (MLaaS) setting, where the deployed model is fully black-box and users can only query it and obtain its predictions. Many defenses have been proposed to reduce backdoor threats. However, almost none of them can be adopted in MLaaS scenarios, since they require access to, or even modification of, the suspicious model. In this paper, we propose a simple yet effective black-box input-level backdoor detection method, called SCALE-UP, which requires only the predicted labels. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency during pixel-wise amplification. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency): when all pixel values are amplified, the predictions of poisoned samples are significantly more consistent than those of benign ones. We also provide theoretical foundations to explain this phenomenon. Extensive experiments on benchmark datasets verify the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our code is available at https://github.com/JunfengGo/SCALE-UP.
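The detection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); it only assumes what the abstract states: the defender can query labels, amplifies all pixel values, and checks how often the label stays the same. The names `predict_label`, `spc_score`, and `is_suspicious`, the amplification factors, the threshold, and the assumption that pixel values lie in [0, 1] are all hypothetical choices made for illustration.

```python
import numpy as np

def spc_score(predict_label, image, factors=(2, 3, 4, 5)):
    """Fraction of amplified copies whose label matches the original label.

    predict_label: black-box callable returning only a label (assumption).
    image: array with pixel values assumed to lie in [0, 1].
    factors: hypothetical amplification factors, not taken from the paper.
    """
    image = np.asarray(image, dtype=np.float64)
    base_label = predict_label(image)
    matches = 0
    for n in factors:
        # Amplify every pixel, then clip back into the valid range.
        amplified = np.clip(image * n, 0.0, 1.0)
        matches += int(predict_label(amplified) == base_label)
    return matches / len(factors)

def is_suspicious(predict_label, image, threshold=0.8, factors=(2, 3, 4, 5)):
    """Flag an input whose label is unusually stable under amplification.

    The threshold is an illustrative value, not one reported in the source.
    """
    return spc_score(predict_label, image, factors) >= threshold
```

With this sketch, an input whose predicted label survives every amplification gets a score of 1.0 and is flagged, while an input whose label changes as soon as it is brightened gets a score near 0. Only label queries are used, which matches the black-box setting the abstract describes.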

