Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

We propose and analyze an adaptive adversary that can retrain a Trojaned deep neural network (DNN) and is also aware of state-of-the-art (SOTA) output-based Trojaned-model detectors. We show that such an adversary can (1) maintain high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach rests on the observation that the high dimensionality of the DNN parameter space provides enough degrees of freedom to achieve both objectives simultaneously. We also make the SOTA detectors adaptive by allowing them to be retrained to recalibrate their parameters. This models a co-evolution of the parameters of the Trojaned model and those of the detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the resulting (optimal) solution of this game leads to the adversary successfully achieving the above objectives. In addition, we provide a greedy algorithm that lets the adversary select a minimum number of input samples in which to embed triggers. We show that for the cross-entropy or log-likelihood loss functions used by DNNs, the greedy algorithm provides provable guarantees on the number of trigger-embedded input samples needed. Extensive experiments on four diverse datasets (MNIST, CIFAR-10, CIFAR-100, and SpeechCommand) reveal that the adversary effectively evades four SOTA output-based Trojaned-model detectors: MNTD, NeuralCleanse, STRIP, and TABOR.
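The greedy sample-selection step can be illustrated with a minimal sketch. As an illustration only, it assumes that the adversary's objective is a monotone submodular set function f(S) of the chosen sample set S. This is the classical setting in which greedy selection comes with logarithmic-factor guarantees on the size of the selected set. The helper names `facility_location` and `greedy_select`, the facility-location utility, the toy similarity matrix, and the 75% target are all hypothetical stand-ins. They are not the paper's loss-based objective or its actual algorithm.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def facility_location(sim: Sequence[Sequence[float]]) -> SetFunction:
    """Hypothetical stand-in utility: f(S) = sum_i max_{j in S} sim[i][j].

    With nonnegative similarities this is monotone submodular. It is NOT the
    paper's cross-entropy/log-likelihood objective; it only drives the demo.
    """
    n = len(sim)

    def f(selected: FrozenSet[int]) -> float:
        if not selected:
            return 0.0
        return sum(max(sim[i][j] for j in selected) for i in range(n))

    return f


def greedy_select(candidates: Sequence[int], f: SetFunction, target: float) -> List[int]:
    """Greedy submodular set cover (hypothetical helper).

    Repeatedly adds the candidate with the largest marginal gain in f until
    f(S) >= target, or until no remaining candidate increases f.
    """
    selected: List[int] = []
    current: FrozenSet[int] = frozenset()
    value = f(current)
    remaining = set(candidates)
    while value < target and remaining:
        best: Optional[int] = None
        best_gain = 0.0
        for c in sorted(remaining):  # strict '>' keeps the lowest index on ties
            gain = f(current | {c}) - value
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate helps: the target is unreachable
            break
        selected.append(best)
        current = current | {best}
        value = f(current)  # recompute rather than accumulate float error
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy similarity matrix over 4 candidate samples (made-up integers so sums are exact).
    sim = [
        [10, 9, 1, 0],
        [9, 10, 7, 1],
        [1, 7, 10, 5],
        [0, 1, 5, 10],
    ]
    f = facility_location(sim)
    target = 0.75 * f(frozenset(range(4)))  # cover 75% of the full-set utility
    print(greedy_select(range(4), f, target))  # [1, 3]
```

Running the file prints `[1, 3]`. Sample 1 has the largest standalone gain (27 of 40), and sample 3 then adds the most (9 more), so two of the four samples reach the target of 30. Re-evaluating f after every pick keeps the sketch simple. Practical implementations often use lazy evaluation of marginal gains, which is valid because the gains of a submodular function can only shrink as S grows.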

翻译：我们提出并分析了一种自适应攻击者，该攻击者能够重新训练植入木马的深度神经网络（DNN），同时知晓当前最先进的基于输出的木马模型检测器。我们证明，此类攻击者可以实现以下目标：（1）在触发嵌入样本和干净样本上均保持高精度；（2）规避检测。该方法基于一个关键观察：DNN参数的高维度特性提供了足够的自由度来同时实现上述目标。我们还通过允许检测器在重新训练过程中校准其参数，使最先进的检测器具备自适应能力，从而模拟木马模型与检测器之间的参数协同演化过程。我们进一步证明，这种协同演化可建模为迭代博弈，且该交互博弈的（最优）解会使攻击者成功达成上述目标。此外，我们为攻击者设计了一种贪心算法，用于选择最少数量的输入样本嵌入触发。研究表明，对于DNN使用的交叉熵或对数似然损失函数，该贪心算法可为所需触发嵌入样本数量提供可证明的保证。在四个不同数据集（MNIST、CIFAR-10、CIFAR-100和SpeechCommand）上的大量实验表明，该攻击者能有效规避四种最先进的基于输出的木马模型检测器：MNTD、NeuralCleanse、STRIP和TABOR。