Software vulnerabilities continue to grow in volume and remain difficult to detect in practice. Although learning-based vulnerability detection has progressed, existing benchmarks are largely function-centric and fail to capture realistic, executable, interprocedural settings. Recent repo-level security benchmarks demonstrate the importance of realistic environments, but their manual curation limits scale. This doctoral research proposes an automated benchmark generator that injects realistic vulnerabilities into real-world repositories and synthesizes reproducible proof-of-vulnerability (PoV) exploits, enabling precisely labeled datasets for training and evaluating repo-level vulnerability detection agents. We further investigate an adversarial co-evolution loop between injection and detection agents to improve robustness under realistic constraints.