Recent progress in machine learning (ML) and large language models (LLMs) has improved vulnerability detection, and recent datasets have reduced label noise and unrelated code changes. However, most existing approaches still operate at the function level, where models are asked to predict whether a single function is vulnerable without inter-procedural context. In practice, vulnerability presence and root cause often depend on contextual information. Naively appending such context is not a reliable solution: real-world context is long, redundant, and noisy, and we find that unstructured context frequently degrades the performance of strong fine-tuned code models. We present CPRVul, a context-aware vulnerability detection framework that couples Context Profiling and Selection with Structured Reasoning. In the first phase, CPRVul constructs a code property graph and extracts candidate context from it. It then uses an LLM to generate security-focused profiles and assign relevance scores, selecting only high-impact contextual elements that fit within the model's context window. In the second phase, CPRVul integrates the target function, the selected context, and auxiliary vulnerability metadata to generate reasoning traces, which are used to fine-tune LLMs for reasoning-based vulnerability detection. We evaluate CPRVul on three high-quality vulnerability datasets: PrimeVul, TitanVul, and CleanVul. Across all datasets, CPRVul consistently outperforms function-only baselines, achieving accuracies ranging from 64.94% to 73.76%, compared to 56.65% to 63.68% for UniXcoder. On the challenging PrimeVul benchmark in particular, CPRVul raises accuracy over the prior state of the art from 55.17% to 67.78%, a 22.9% relative improvement. Our ablations further show that neither raw context nor processed context alone benefits strong code models; gains emerge only when processed context is paired with structured reasoning.
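The context-selection step of the first phase can be pictured as a budgeted filter over scored candidates. The sketch below is a hypothetical illustration, not the authors' implementation: the item names, the token-budget representation, and the greedy highest-relevance-first strategy are all assumptions layered on the paper's description of keeping only high-impact elements that fit the context window.

```python
# Hypothetical sketch of relevance-scored context selection under a token
# budget. CPRVul's actual scoring and selection logic is not specified here;
# this only illustrates the general idea of filtering candidate context.
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    name: str          # e.g. a callee reachable in the code property graph
    tokens: int        # token cost of including this item in the prompt
    relevance: float   # LLM-assigned security relevance score (assumed 0..1)


def select_context(items: List[ContextItem], budget: int) -> List[ContextItem]:
    """Greedily keep the most relevant items that still fit the budget."""
    chosen: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen


# Illustrative candidates (names and scores are invented for the example).
candidates = [
    ContextItem("parse_header", tokens=300, relevance=0.9),
    ContextItem("log_request", tokens=500, relevance=0.2),
    ContextItem("copy_buffer", tokens=400, relevance=0.8),
]
selected = select_context(candidates, budget=800)
print([i.name for i in selected])  # → ['parse_header', 'copy_buffer']
```

With an 800-token budget, the low-relevance `log_request` is dropped, mirroring the paper's observation that indiscriminately appended context is noisy and should be filtered before reasoning.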