SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis, which requires complex system setups and verified benign models and therefore limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution, OOD), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. On our dataset, our method achieves an F1-score of 90.01%, compared with 7.23% to 62.75% for the state-of-the-art (SOTA) scanners (Modelscan, Fickling, ClamAV, VirusTotal). On the PickleBall data (OOD), it achieves an F1-score of 81.22%, compared with 76.09% for the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one that correctly parses and classifies all 9 evasive Hide-and-Seek malicious models, which were specially crafted to evade scanners. These results demonstrate that data-driven detection can effectively and generically mitigate pickle-based model file attacks.
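To make the idea of static feature extraction from pickle bytecode concrete, the following is a minimal sketch and not the authors' implementation. It walks the opcode stream with the standard-library `pickletools.genops`, which decodes opcodes without executing them, and collects a few counts. The feature names, the `SUSPICIOUS_MODULES` list, and the `extract_features` helper are assumptions made for illustration; the abstract does not specify the paper's actual feature set.

```python
import os
import pickle
import pickletools
from collections import Counter

# Hypothetical list of modules treated as suspicious; illustrative only.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket"}


def extract_features(data: bytes) -> dict:
    """Decode the opcode stream (never executing it) and collect simple counts."""
    opcode_counts = Counter()
    imports = []          # (module, name) pairs referenced by the pickle
    recent_strings = []   # string arguments seen so far, used for STACK_GLOBAL

    for opcode, arg, _pos in pickletools.genops(data):
        opcode_counts[opcode.name] += 1
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: assume module and name are the two most recent strings.
            # Memoized strings fetched via BINGET would not be caught here.
            if len(recent_strings) >= 2:
                imports.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)

    suspicious = [m for m, _ in imports if m.split(".")[0] in SUSPICIOUS_MODULES]
    return {
        "n_opcodes": sum(opcode_counts.values()),
        "n_reduce": opcode_counts["REDUCE"],
        "n_imports": len(imports),
        "n_suspicious_imports": len(suspicious),
    }


class _Demo:
    """Toy payload: its pickle would call os.system on load (never loaded here)."""

    def __reduce__(self):
        return (os.system, ("echo hi",))


# Serializing is safe; only pickle.loads would trigger the call.
feats_bad = extract_features(pickle.dumps(_Demo()))
feats_ok = extract_features(pickle.dumps({"weights": [1.0, 2.0]}))
```

Feature dictionaries like these could then be fed to any supervised or unsupervised classifier. Note that `pickletools.genops` raises an exception on malformed streams, so a scanner aiming to handle deliberately evasive files would need more tolerant parsing than this sketch provides.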
