Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses, called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning, respectively, to purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence property of triggers. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC), which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art attacks, while remaining robust to small amounts of training data and large-norm triggers.
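As a rough illustration of how a "Filtering-then-Contrasting" decision could avoid the clean-accuracy drop, the sketch below assumes that FtC compares the classifier's prediction on the raw input with its prediction on the filtered input: when they agree, the raw prediction is kept (so clean inputs are not degraded by filtering); when they disagree, the input is flagged as possibly Trojaned. The abstract does not spell out this rule, so treat it as an assumption. The function `filtering_then_contrasting`, the toy model `toy_classifier`, and the clipping filter `toy_filter` (a crude stand-in for VIF/AIF, not either method) are all hypothetical.

```python
from typing import Callable, List, Optional

Label = int
Input = List[float]


def filtering_then_contrasting(
    classifier: Callable[[Input], Label],
    input_filter: Callable[[Input], Input],
    x: Input,
) -> Optional[Label]:
    """Hypothetical FtC rule: keep the raw label if it agrees with the
    label of the filtered input; otherwise return None (flag as Trojaned)."""
    y_raw = classifier(x)
    y_filtered = classifier(input_filter(x))
    if y_raw == y_filtered:
        return y_raw  # agreement: use the unfiltered prediction
    return None  # disagreement: the filter likely removed a trigger


def toy_classifier(x: Input) -> Label:
    # Toy "Trojaned" model: any value above 0.9 acts as a trigger for class 7.
    if max(x) > 0.9:
        return 7
    return 0 if sum(x) < 1.0 else 1


def toy_filter(x: Input) -> Input:
    # Toy stand-in for an input filter: clip large values that may be triggers.
    return [min(v, 0.5) for v in x]


if __name__ == "__main__":
    print(filtering_then_contrasting(toy_classifier, toy_filter, [0.2, 0.3, 0.4]))
    print(filtering_then_contrasting(toy_classifier, toy_filter, [0.2, 0.3, 1.0]))
```

In this toy setting the clean input keeps its original label, while the triggered input is flagged because filtering changes the prediction away from the Trojan target class.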