PHP's dominance in web development is undermined by persistent security challenges: static analysis lacks semantic depth and so produces many false positives; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. In addition, the absence of large-scale PHP vulnerability datasets and the fragmentation of existing toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework that couples two-stage vulnerability detection with fine-grained automated localization. In the first stage, SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses from AST structures enhanced with data flow. In the second stage, SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies these candidates using embeddings from a pretrained code encoder, filtering out false positives. ISAL (Incremental Sequence Analysis for Localization) then pinpoints root causes through syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks. We also contribute PHPVD, the first large-scale PHP vulnerability dataset, comprising 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, a 99.5% F1 score, and an 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 of which have been assigned CVE identifiers, validating its practical effectiveness.