Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution, OOD), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. On our dataset, our method achieves an F1-score of 90.01%, compared with 7.23% to 62.75% for state-of-the-art scanners (Modelscan, Fickling, ClamAV, VirusTotal). On the OOD PickleBall data, it achieves an F1-score of 81.22%, compared with 76.09% for the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify all 9 evasive Hide-and-Seek malicious models, which were specially crafted to evade scanners. These results demonstrate that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
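To make the idea of static feature extraction from Pickle bytecode concrete, the following is a minimal sketch that assumes only Python's standard `pickletools` module. It is not the feature set used in this work: the function name `extract_features`, the `SUSPICIOUS_MODULES` watch-list, and the chosen features are hypothetical and serve only to illustrate walking the opcode stream without ever unpickling the file.

```python
import pickle
import pickletools
from collections import Counter

# Hypothetical watch-list, for illustration only (not taken from the paper).
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket"}


def extract_features(data: bytes) -> dict:
    """Walk the pickle opcode stream statically; nothing is ever unpickled."""
    opcode_counts = Counter()
    imports = []         # (module, name) pairs referenced by the stream
    recent_strings = []  # string operands seen so far, used for STACK_GLOBAL

    for opcode, arg, _pos in pickletools.genops(data):
        opcode_counts[opcode.name] += 1
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the operand as "module name" in one string.
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: assume the two most recent string operands are the
            # module and attribute name. Crafted pickles can defeat this.
            if len(recent_strings) >= 2:
                imports.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)

    suspicious = [imp for imp in imports
                  if imp[0].split(".")[0] in SUSPICIOUS_MODULES]
    return {
        "n_opcodes": sum(opcode_counts.values()),
        "n_distinct_opcodes": len(opcode_counts),
        "n_reduce": opcode_counts["REDUCE"],
        "n_imports": len(imports),
        "n_suspicious_imports": len(suspicious),
        "imports": imports,
    }


# A benign payload, and a hand-written protocol-0 stream that would call
# os.system if loaded. The second one is only parsed here, never loaded.
benign = pickle.dumps({"weights": [1.0, 2.0]})
malicious = b"cos\nsystem\n(S'echo hi'\ntR."

benign_features = extract_features(benign)
malicious_features = extract_features(malicious)
```

The numeric entries of such a dictionary could serve as a feature vector for a supervised or unsupervised classifier. Note that `pickletools.genops` raises an exception on malformed streams, so a scanner meant to handle deliberately evasive files would likely need more tolerant parsing than this sketch provides.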