Specification Vibing for Automated Program Repair

Large language model (LLM)-driven automated program repair (APR) has advanced rapidly, but most methods remain code-centric: they rewrite source code directly and therefore risk hallucinated or behaviorally inconsistent fixes. This limitation calls for an alternative repair paradigm built on a representation that LLMs can understand, analyze, and align more accurately than raw code. To address this gap, we propose VibeRepair, a specification-centric APR technique that treats repair as repairing a behavior specification rather than as ad-hoc code editing. VibeRepair first translates buggy code into a structured behavior specification that captures the program's intended runtime behavior, then infers and repairs misalignments in that specification, and finally synthesizes code strictly guided by the corrected specification. An on-demand reasoning component enriches hard cases with program analysis and historical bug-fix evidence while keeping cost under control. Across Defects4J, real-world benchmarks, and multiple LLMs, VibeRepair consistently achieves strong repair effectiveness with a significantly smaller patch space. On Defects4J v1.2, VibeRepair correctly repairs 174 bugs, 28 more than the strongest state-of-the-art baseline (a 19% improvement). On Defects4J v2.0, it repairs 178 bugs, outperforming prior approaches by 33 bugs (a 23% improvement). Evaluations on real-world benchmarks collected after the training cutoffs of the selected LLMs further confirm its effectiveness and generalizability. By centering repair on explicit behavioral intent, VibeRepair reframes APR for the era of "vibe" coding: make the behavior sing, and the code will follow.
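The three-stage flow described above (code to specification, specification repair, specification-guided code synthesis, plus on-demand reasoning for hard cases) can be illustrated with a minimal Python sketch. It is not VibeRepair's implementation: the `llm` callable, the prompt wording, the function names, and the `passes_tests` and `gather_evidence` hooks are all hypothetical placeholders chosen for illustration. The point it shows is structural: the model edits a behavior specification, code is regenerated only from that specification, and the costlier evidence-gathering step runs only when the cheap path fails validation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical LLM interface: a prompt string in, a completion string out.
LLM = Callable[[str], str]


@dataclass
class RepairResult:
    spec: str                 # behavior specification inferred from the buggy code
    repaired_spec: str        # specification after misalignments are repaired
    patch: str                # code synthesized from the repaired specification
    used_extra_evidence: bool # True if the on-demand reasoning step was needed


def code_to_spec(llm: LLM, buggy_code: str) -> str:
    # Stage 1 (illustrative prompt): lift code into a behavior specification.
    return llm("Describe the intended runtime behavior of this code "
               f"as a structured specification:\n{buggy_code}")


def repair_spec(llm: LLM, spec: str, failing_test: str,
                evidence: Optional[str] = None) -> str:
    # Stage 2 (illustrative prompt): fix misalignments in the specification.
    prompt = f"Specification:\n{spec}\nFailing test:\n{failing_test}\n"
    if evidence:
        prompt += f"Additional evidence:\n{evidence}\n"
    return llm(prompt + "Identify and fix misalignments with the intended behavior.")


def spec_to_code(llm: LLM, spec: str, buggy_code: str) -> str:
    # Stage 3 (illustrative prompt): regenerate code strictly from the spec.
    return llm("Rewrite the code so it strictly follows this specification:\n"
               f"{spec}\nOriginal code:\n{buggy_code}")


def vibe_repair_sketch(llm: LLM, buggy_code: str, failing_test: str,
                       passes_tests: Callable[[str], bool],
                       gather_evidence: Callable[[], str]) -> Optional[RepairResult]:
    spec = code_to_spec(llm, buggy_code)

    # Cheap path first: repair the spec without extra analysis.
    fixed = repair_spec(llm, spec, failing_test)
    patch = spec_to_code(llm, fixed, buggy_code)
    if passes_tests(patch):
        return RepairResult(spec, fixed, patch, False)

    # On-demand reasoning: gather extra evidence (e.g. program analysis or
    # historical fixes, via a hypothetical hook) only for hard cases.
    evidence = gather_evidence()
    fixed = repair_spec(llm, spec, failing_test, evidence)
    patch = spec_to_code(llm, fixed, buggy_code)
    if passes_tests(patch):
        return RepairResult(spec, fixed, patch, True)
    return None
```

In a real setting, `passes_tests` would run the project's test suite on the candidate patch and `gather_evidence` would invoke whatever analysis the system provides; both are left as plain callables here so that the control flow stays self-contained and testable with a stub model.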

翻译：基于大语言模型（LLM）的自动程序修复（APR）技术发展迅速，但现有方法大多仍以代码为中心：它们直接重写源代码，因而存在产生幻觉式修复或行为不一致修复的风险。这一局限性表明，需要一种替代性的修复范式，该范式应依赖一种比原始代码更易于LLM理解的表示形式，从而在修复过程中实现更准确的理解、分析和对齐。为弥补这一不足，我们提出了VibeRepair，一种以规约为中心的APR技术，它将修复视为行为规约的修复，而非临时的代码编辑。VibeRepair首先将缺陷代码转换为结构化的行为规约，以捕捉程序的预期运行时行为；随后推断并修复规约中的不一致之处；最后，在修正后的行为规约的严格指导下合成代码。一个按需启用的推理组件通过程序分析和历史缺陷修复证据来增强对复杂案例的处理能力，同时控制成本。在Defects4J和现实世界基准测试以及多种LLM上的实验表明，VibeRepair始终展现出强大的修复有效性，且生成的补丁空间显著更小。在Defects4J v1.2上，VibeRepair正确修复了174个缺陷，比当前最强的基线方法多修复28个，相当于提升了19%。在Defects4J v2.0上，它修复了178个缺陷，优于先前方法33个，相当于提升了23%。在选定LLM训练周期后收集的现实世界基准测试上的评估进一步证实了其有效性和泛化能力。通过将修复的核心置于明确的行为意图上，VibeRepair为"氛围"编码时代的APR提供了新的框架：让行为先行，代码自会随之而来。