According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. To reduce cost and foster innovation, practitioners commonly fetch pre-trained models from model hubs such as Hugging Face or TensorFlow Hub. This practice, however, introduces a security risk: attackers can inject malicious code into the models they upload to these hubs, enabling attacks such as remote code execution (RCE), sensitive data exfiltration, and system file modification when the models are loaded or executed (e.g., via the predict function). Given the critical role AI models play in digital transformation, this attack surface could drastically increase the number of software supply chain attacks. While several efforts target malware injected through the deserialization of pickle-based saved models (i.e., malware hidden in model parameters), the risk of abusing deep learning (DL) APIs, such as TensorFlow APIs, remains understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs, such as file read/write and network send/receive, together with their persistence APIs, to launch attacks. Concerningly, existing scanners on model hubs like Hugging Face and TensorFlow Hub fail to detect some of the stealthier abuses of these APIs. The reason is that scanning tools match against a syntactically identified set of suspicious functions and often lack a semantic-level understanding of what the invoked functionality actually does. After demonstrating the possible attacks, we show how one may use LLMs to identify potentially abusable hidden API functionalities and build scanners to detect such abuses.
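To make the API-abuse vector concrete, below is a minimal, illustrative sketch (not the paper's actual payload) of how graph-level TensorFlow I/O ops can piggyback on an ordinary inference call under TF 2.x (Keras 2) SavedModel semantics; the file paths, layer arrangement, and output directory are all hypothetical:

```python
# Minimal sketch: a benign-looking Keras model whose Lambda layer smuggles
# tf.io file ops into the serialized graph. Paths here are hypothetical.
import tensorflow as tf

def payload(x):
    # tf.io.read_file / tf.io.write_file are ordinary graph ops, so they are
    # traced into the SavedModel and re-execute on every predict() call.
    contents = tf.io.read_file("/etc/hostname")    # hidden file read
    tf.io.write_file("/tmp/.staging", contents)    # hidden file write
    return x                                       # model output is unchanged

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(4),
    tf.keras.layers.Lambda(payload),   # side effect hidden inside a layer
])
# The SavedModel export keeps the traced ops; no Python source is needed
# for them to run when the model is later loaded and invoked.
tf.saved_model.save(model, "benign_looking_model")
```

Because the write op is stateful, TensorFlow keeps it in the traced function even though its output is unused, so the side effect fires on each prediction while the model's numerical behavior stays unchanged.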
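On the detection side, a minimal static scan can enumerate the raw op types recorded in a SavedModel without loading (and hence without executing) the model. The sketch below parses `saved_model.pb` directly; the denylist is illustrative, and a purely syntactic match like this is exactly what the stealthier abuses described above can evade:

```python
# Sketch of a syntactic SavedModel scan: parse saved_model.pb directly and
# flag op types associated with file I/O. Illustrative denylist only.
from tensorflow.core.protobuf import saved_model_pb2

SUSPICIOUS_OPS = {"ReadFile", "WriteFile", "MatchingFiles"}  # assumed denylist

def scan(saved_model_dir: str) -> set:
    sm = saved_model_pb2.SavedModel()
    with open(f"{saved_model_dir}/saved_model.pb", "rb") as f:
        sm.ParseFromString(f.read())
    found = set()
    for mg in sm.meta_graphs:
        # Top-level graph nodes plus nodes inside serialized functions.
        ops = {n.op for n in mg.graph_def.node}
        for fdef in mg.graph_def.library.function:
            ops |= {n.op for n in fdef.node_def}
        found |= ops & SUSPICIOUS_OPS
    return found

print(scan("benign_looking_model"))  # e.g. {'ReadFile', 'WriteFile'}
```

A denylist flags only ops it already knows about; semantically equivalent functionality reachable through other APIs slips past such scans, which is what motivates first using LLMs to enumerate potentially abusable API behaviors and only then building the scanner around them.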