Backdoor attacks are among the most effective, practical, and stealthy attacks in deep learning. In this paper, we consider a practical scenario where a developer obtains a deep model from a third party and uses it as part of a safety-critical system. The developer wants to inspect the model for potential backdoors prior to system deployment. We find that most existing detection techniques make assumptions that do not apply to this scenario. In this paper, we present a novel framework for detecting backdoors under realistic restrictions. We generate candidate triggers by deductively searching over the space of possible triggers. We construct and optimize a smoothed version of Attack Success Rate as our search objective. Starting from a broad class of template attacks and using only the forward pass of the deep model, we reverse engineer the backdoor attack. We conduct an extensive evaluation across a wide range of attacks, models, and datasets, and our technique performs almost perfectly across these settings.
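To make the search objective concrete, the sketch below illustrates one plausible form of a smoothed Attack Success Rate: the hard 0/1 indicator "is the triggered input classified as the target class?" is replaced with a temperature-scaled softmax probability, so the objective varies smoothly as the candidate trigger changes and can be evaluated using only forward passes. All names (`smoothed_asr`, `model_forward`, the mask/blend trigger form, the temperature value) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def smoothed_asr(model_forward, images, trigger, mask,
                 target_class, temperature=10.0):
    """Hypothetical sketch of a smoothed Attack Success Rate (ASR).

    Hard ASR counts how often triggered inputs land in the target class;
    here the 0/1 count is replaced by the softmax probability of the
    target class, giving a smooth surrogate in [0, 1] that needs only
    the model's forward pass.
    """
    # Stamp the candidate trigger onto each image via a blending mask.
    patched = images * (1.0 - mask) + trigger * mask
    logits = model_forward(patched)                 # shape (N, num_classes)
    # Temperature-scaled softmax, computed stably.
    z = logits * temperature
    z = z - z.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Mean target-class probability: the smoothed ASR.
    return probs[:, target_class].mean()
```

A search procedure could score many candidate `(trigger, mask, target_class)` hypotheses with this objective and flag the model as backdoored when some candidate drives the smoothed ASR close to 1.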