AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization

PHP's dominance in web development is undermined by security challenges: static analysis lacks semantic depth, causing high false-positive rates; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. In addition, the absence of large-scale PHP vulnerability datasets and the fragmentation of existing toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework that couples two-stage vulnerability detection with fine-grained automated localization. SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses using AST structures enhanced with data flow. SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies these candidates through pretrained code encoder embeddings, eliminating false positives. ISAL (Incremental Sequence Analysis for Localization) pinpoints root causes via syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks to ensure precision. We contribute PHPVD, the first large-scale PHP vulnerability dataset, with 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, a 99.5% F1 score, and an 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 of which were assigned CVE identifiers, validating its practical effectiveness.
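The hypothesize-then-verify structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the regex rule sets, the `Candidate` record, and the functions `stage1_candidates` and `stage2_verify` are hypothetical stand-ins. A line-based taint heuristic takes the place of the AST and data-flow analysis, and a sanitizer check takes the place of the encoder-based verifier. The sketch only shows how a high-recall first stage and a precision-oriented second stage fit together.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny rule sets for illustration only.
SOURCE = re.compile(r"\$_(GET|POST|REQUEST|COOKIE)\b")
SINK = re.compile(r"\b(mysqli_query|eval|system|exec)\s*\(")
SANITIZER = re.compile(r"\b(intval|escapeshellarg|mysqli_real_escape_string)\s*\(")
ASSIGN = re.compile(r"^\s*(\$\w+)\s*=(?!=)\s*(.+);\s*$")


@dataclass(frozen=True)
class Candidate:
    sink_line: int           # 1-based line number of the sink call
    variable: str            # tainted variable that reaches the sink
    trace: tuple[int, ...]   # assignment lines the taint passed through


def uses(var: str, text: str) -> bool:
    """True if `text` mentions the PHP variable `var` as a whole token."""
    return re.search(re.escape(var) + r"\b", text) is not None


def stage1_candidates(php_source: str) -> list[Candidate]:
    """High-recall pass: flag any sink call that mentions a tainted variable."""
    tainted: dict[str, tuple[int, ...]] = {}
    candidates = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        # Check sinks first so a variable assigned on this line is not counted.
        if SINK.search(line):
            for var, trace in tainted.items():
                if uses(var, line):
                    candidates.append(Candidate(lineno, var, trace))
        match = ASSIGN.match(line)
        if match:
            var, rhs = match.groups()
            if SOURCE.search(rhs):
                tainted[var] = (lineno,)
            else:
                parents = [t for v, t in tainted.items() if uses(v, rhs)]
                if parents:
                    tainted[var] = parents[0] + (lineno,)
                else:
                    tainted.pop(var, None)  # overwritten with clean data
    return candidates


def stage2_verify(php_source: str, candidates: list[Candidate]) -> list[Candidate]:
    """Precision pass: drop candidates whose taint trace goes through a sanitizer."""
    lines = php_source.splitlines()
    return [
        c for c in candidates
        if not any(SANITIZER.search(lines[n - 1]) for n in c.trace)
    ]


SAMPLE = """<?php
$id = $_GET['id'];
$n = intval($_GET['n']);
$q = "SELECT * FROM users WHERE id = " . $id;
mysqli_query($db, $q);
mysqli_query($db, "SELECT * FROM logs LIMIT " . $n);
"""

hypotheses = stage1_candidates(SAMPLE)
confirmed = stage2_verify(SAMPLE, hypotheses)
print(len(hypotheses), "candidates,", len(confirmed), "confirmed")
```

On this sample the first stage flags both `mysqli_query` calls, and the second stage discards the one whose input passed through `intval`. The first entry of the surviving candidate's `trace` points at the line where the user input entered, a crude analogue of localization.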


