The potential for exploitation of AI models has grown with the rapid advancement of Artificial Intelligence (AI) and the widespread use of model-sharing platforms such as Model Zoo. Attackers can embed malware within AI models through steganographic techniques, exploiting the substantial size of these models to conceal malicious data for nefarious purposes such as remote code execution. Securing AI models is a burgeoning area of research, essential for safeguarding the many organizations and users that rely on AI technologies. This study leverages well-studied few-shot learning techniques from the image domain by transferring AI models into that domain using a novel image representation. Applying few-shot learning in this setting lets us build practical detectors, a capability that prior work lacks. Our method addresses critical limitations of state-of-the-art detection techniques that hinder their practicality, reducing the required training dataset from 40,000 models to just six. Furthermore, our models consistently detect subtle attacks with embedding rates as low as 25%, and even 6% in some cases, whereas previous works were shown to be effective only at embedding rates of 50%-100%. We employ a strict evaluation strategy to ensure the trained models generalize across various factors. In addition, our trained models successfully detect novel spread-spectrum steganography attacks despite being trained on only one attack type, demonstrating impressive robustness. We open-source our code to support reproducibility and foster research in this new field.
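To make the two core ideas concrete, the sketch below illustrates (a) a toy LSB-style steganographic embedding into float32 model weights, where the number of low bits used determines the embedding rate, and (b) one plausible way to render weight bytes as a grayscale image for image-domain classifiers. This is a minimal, hypothetical sketch, not the paper's actual implementation; all function names, the byte layout, and the square padding are illustrative assumptions.

```python
# Illustrative sketch only: a toy LSB embedding and a byte-level grayscale
# rendering of model weights. Not the paper's actual method.
import numpy as np


def lsb_embed(weights: np.ndarray, payload: bytes, n_bits: int = 1) -> np.ndarray:
    """Hide payload bits in the n_bits lowest bits of each float32 weight.

    n_bits controls the embedding rate (n_bits / 32 of each weight's bits).
    """
    ints = weights.astype(np.float32).view(np.uint32).copy()
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    pad = (-len(bits)) % n_bits                      # pad to a multiple of n_bits
    bits = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    assert len(bits) // n_bits <= ints.size, "payload too large for carrier"
    mask = np.uint32((1 << n_bits) - 1)
    for i in range(0, len(bits), n_bits):
        value = int("".join(map(str, bits[i : i + n_bits])), 2)
        idx = i // n_bits
        ints[idx] = (ints[idx] & ~mask) | np.uint32(value)  # overwrite low bits
    return ints.view(np.float32)


def weights_to_image(weights: np.ndarray) -> np.ndarray:
    """Reinterpret raw float32 weight bytes as a square 8-bit grayscale image."""
    pixels = np.frombuffer(weights.astype(np.float32).tobytes(), dtype=np.uint8)
    side = int(np.ceil(np.sqrt(pixels.size)))        # smallest square that fits
    padded = np.zeros(side * side, dtype=np.uint8)   # zero-pad the tail
    padded[: pixels.size] = pixels
    return padded.reshape(side, side)


if __name__ == "__main__":
    w = np.random.randn(4096).astype(np.float32)     # stand-in for a weight tensor
    stego = lsb_embed(w, b"malicious payload", n_bits=4)
    img = weights_to_image(stego)
    print(img.shape)                                 # (128, 128) for this example
```

Under this kind of setup, low-bit embeddings perturb the image only faintly, which is what makes low embedding rates hard to detect and motivates a learned image-domain detector.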