Large language models show promise for vulnerability discovery, yet prevailing methods inspect code in isolation, struggle with long contexts, and focus on coarse function- or file-level detections that offer limited guidance to engineers who need precise line-level localization for targeted patches. We introduce T2L, an executable framework for project-level, line-level vulnerability localization that progressively narrows scope from repository modules to exact vulnerable lines via AST-based chunking and evidence-guided refinement. We provide a baseline agent with an Agentic Trace Analyzer (ATA) that fuses runtime evidence such as crash points and stack traces to translate failure symptoms into actionable diagnoses. To enable rigorous evaluation, we introduce T2L-ARVO, an expert-verified 50-case benchmark spanning five crash families in real-world projects. On T2L-ARVO, our baseline achieves a detection rate of up to 58.0% and a line-level localization rate of up to 54.8%. Together, the T2L framework and the T2L-ARVO benchmark advance LLM-based vulnerability detection toward deployable, precision diagnostics in open-source software workflows.
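To make the scope-narrowing idea concrete, the following is a minimal sketch of AST-based chunking combined with a crash-trace filter. It is not the T2L implementation: the abstract does not specify the parser, the target language, or the refinement procedure, so this example uses Python's standard `ast` module as a stand-in, and the names `Chunk`, `ast_chunks`, `narrow_by_trace`, and `SAMPLE` are hypothetical.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str   # function name (illustrative chunk label)
    start: int  # 1-based first line of the chunk
    end: int    # 1-based last line of the chunk, inclusive


def ast_chunks(source: str) -> list[Chunk]:
    """Split source into one chunk per function definition, ordered by line."""
    tree = ast.parse(source)
    chunks = [
        Chunk(node.name, node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(chunks, key=lambda c: c.start)


def narrow_by_trace(chunks: list[Chunk], trace_lines: set[int]) -> list[Chunk]:
    """Keep only chunks whose line range contains a line reported in a trace."""
    return [
        c for c in chunks
        if any(c.start <= line <= c.end for line in trace_lines)
    ]


# Hypothetical two-function file used only to exercise the sketch.
SAMPLE = '''def parse(buf):
    n = buf[0]
    return buf[1:1 + n]

def render(items):
    out = []
    for it in items:
        out.append(str(it))
    return out
'''

if __name__ == "__main__":
    chunks = ast_chunks(SAMPLE)
    # Assume a stack trace pointed at line 2 of the file.
    for chunk in narrow_by_trace(chunks, {2}):
        print(chunk)
```

Here the trace line plays the role of runtime evidence: instead of handing a model the whole file, only the chunk that contains the crash location is kept for closer inspection. A real pipeline would likely also follow callers and callees from the stack frames, which this sketch omits.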