Trojan Model Detection Using Activation Optimization

Training machine learning models can be very expensive or even unaffordable, for example because of data limitations (the data is unavailable or too large) or limited computational power. It is therefore common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model so that the attacker can control the model's behavior whenever the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS, for TRojan Identification from Gradient-based Signatures. TRIGS achieves state-of-the-art performance on two public datasets of convolutional models. Additionally, we introduce a new, challenging dataset of ImageNet models based on the vision transformer architecture. TRIGS delivers the best performance on the new dataset, surpassing the baseline methods by a large margin. Our experiments also show that TRIGS requires only a small number of clean samples to achieve good performance, and works reasonably well even if the defender has no prior knowledge of the attacker's model architecture. Our dataset will be released soon.
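To make the idea of a signature built from activation optimization more concrete, here is a minimal sketch in plain Python. It is not the TRIGS implementation: the abstract does not specify what is optimized or how the signature is assembled. The sketch assumes that, for each output class, an input is optimized by gradient ascent to raise that class's logit, and that the optimized inputs are concatenated into one vector. The tiny tanh network and all names (random_model, forward, logit_grad, maximize_activation, signature) are hypothetical stand-ins chosen for illustration.

```python
import math
import random


def random_model(dim, hidden, num_classes, seed):
    """Hypothetical stand-in for a pre-trained model: a tiny tanh MLP."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(num_classes)]
    return w1, w2


def forward(model, x):
    """Return (hidden activations, class logits) for input vector x."""
    w1, w2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * hj for w, hj in zip(row, h)) for row in w2]
    return h, logits


def logit_grad(model, x, c):
    """Gradient of logit c with respect to x, by manual backpropagation."""
    w1, w2 = model
    h, _ = forward(model, x)
    # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    delta = [w2[c][j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    return [sum(delta[j] * w1[j][i] for j in range(len(h))) for i in range(len(x))]


def maximize_activation(model, c, dim, steps=100, lr=0.1):
    """Gradient ascent on the input to raise logit c; keeps the best input seen."""
    x = [0.0] * dim
    best_x, best_val = list(x), forward(model, x)[1][c]
    for _ in range(steps):
        grad = logit_grad(model, x, c)
        # Clip to [-1, 1] so the input stays in a fixed (assumed) valid range.
        x = [min(1.0, max(-1.0, xi + lr * gi)) for xi, gi in zip(x, grad)]
        val = forward(model, x)[1][c]
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x


def signature(model, dim, num_classes):
    """Concatenate the optimized input for every class into one flat vector."""
    sig = []
    for c in range(num_classes):
        sig.extend(maximize_activation(model, c, dim))
    return sig


if __name__ == "__main__":
    model = random_model(dim=4, hidden=6, num_classes=3, seed=0)
    print(len(signature(model, dim=4, num_classes=3)))  # 3 classes x 4 inputs
```

In a detection pipeline of the kind the abstract describes, signatures computed this way for many models with known clean or Trojan labels would serve as training data for a binary classifier. For real image models, the manual gradient would be replaced by automatic differentiation.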

