EvoRepair: Enhancing Vulnerability Repair Agents Through Experience-Based Self-Evolution

Large Language Models (LLMs) have shown promise for automated vulnerability repair (AVR), but they still face several limitations, notably the lack of experience accumulation within the repair of a single vulnerability (intra-vulnerability) and the lack of experience reuse across vulnerabilities (cross-vulnerability). As a result, LLMs may repeat similar mistakes during iterative repair and underutilize valuable repair knowledge from historical vulnerabilities. To address these challenges, we propose EvoRepair, the first experience-based self-evolving AVR agent framework, which enables LLMs to accumulate, refine, and leverage domain-specific knowledge across long-horizon vulnerability repair. EvoRepair follows a cyclic learn-and-repair process: it retrieves relevant past experiences to guide repair, extracts new experiences from repair trajectories, and updates an experience bank using quality-aware scoring. We evaluate EvoRepair against 12 representative vulnerability repair baselines on PATCHEVAL and SEC-bench using GPT-5-mini. Results show that EvoRepair achieves the best overall performance, reaching 93.47% on PATCHEVAL, 87.00% on SEC-bench, and 90.46% overall. In particular, EvoRepair outperforms the latest LLM-based baseline, LoopRepair, by 39.56% and 33.50% on PATCHEVAL and SEC-bench, respectively, and surpasses IntentFix by 70.86% and 50.50%. Across both benchmarks, EvoRepair also exceeds the recent self-evolving agent Live-SWE-Agent by 6.98% overall. Additional transfer experiments on VUL4J further demonstrate the robustness of EvoRepair across models, programming languages, and datasets. These findings show that experience-based self-evolution substantially strengthens agentic AVR and goes beyond existing self-evolving techniques.
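To make the cyclic learn-and-repair process concrete, the following is a minimal Python sketch of one iteration: retrieve experiences, attempt a repair, extract new experiences, and rescore the bank. The abstract does not specify how EvoRepair represents, retrieves, or scores experiences, so everything here is an illustrative assumption and not the authors' implementation. The names `Experience`, `ExperienceBank`, and `learn_and_repair` are hypothetical, retrieval by tag overlap weighted by quality is an assumed stand-in for the real retriever, and the exponential-moving-average update is an assumed form of "quality-aware scoring".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experience:
    """One reusable repair lesson (hypothetical schema, not EvoRepair's)."""
    text: str
    tags: set[str]        # e.g. CWE IDs or language labels (assumed)
    score: float = 0.5    # assumed quality estimate in [0, 1]


@dataclass
class ExperienceBank:
    capacity: int = 100
    items: list[Experience] = field(default_factory=list)

    def _relevance(self, query: set[str], e: Experience) -> float:
        # Assumed retrieval signal: Jaccard tag overlap weighted by quality.
        union = query | e.tags
        overlap = len(query & e.tags) / len(union) if union else 0.0
        return overlap * e.score

    def retrieve(self, query: set[str], k: int = 3) -> list[Experience]:
        # Return up to k experiences with nonzero relevance, best first.
        ranked = sorted(self.items, key=lambda e: self._relevance(query, e), reverse=True)
        return [e for e in ranked[:k] if self._relevance(query, e) > 0]

    def update(self, used: list[Experience], succeeded: bool,
               new: list[Experience], lr: float = 0.2) -> None:
        # Assumed quality-aware scoring: nudge the scores of the experiences
        # that guided this attempt toward 1 on success and toward 0 on failure.
        target = 1.0 if succeeded else 0.0
        for e in used:
            e.score += lr * (target - e.score)
        self.items.extend(new)
        # Keep only the highest-scoring experiences (stable sort on ties).
        self.items.sort(key=lambda e: e.score, reverse=True)
        del self.items[self.capacity:]


def learn_and_repair(
    bank: ExperienceBank,
    vuln_tags: set[str],
    repair_fn: Callable[[list[Experience]], tuple[bool, list[str]]],
    extract_fn: Callable[[list[str], set[str]], list[Experience]],
) -> bool:
    """One hypothetical learn-and-repair cycle for a single vulnerability."""
    guidance = bank.retrieve(vuln_tags)          # 1. retrieve past experience
    succeeded, trajectory = repair_fn(guidance)  # 2. repair (LLM agent stub)
    new = extract_fn(trajectory, vuln_tags)      # 3. extract new experience
    bank.update(guidance, succeeded, new)        # 4. rescore and update bank
    return succeeded
```

In practice `repair_fn` would wrap the LLM agent and a patch validator, and `extract_fn` would summarize the trajectory into lessons. Both are left as caller-supplied callables because the abstract gives no details about them.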
