Software issue resolution in large repositories is a long-range decision process: choices made during localization shape the space of viable edits, and missteps can compound into incorrect patches. Despite this, many LLM-based repair pipelines still operate in a reset-and-solve manner, producing fresh reasoning for every new issue instead of carrying forward what worked in past fixes. This is wasteful because repositories routinely contain earlier issues with overlapping structure, failure modes, or constraints, where prior repair experience could provide useful guidance. Existing approaches typically harvest this signal through forward-time trial procedures, such as repeated refinement or search, incurring high inference cost while still risking divergence from the eventual correct patch. We present an Outcome-Conditioned Reasoning Distillation (O-CRD) framework that uses resolved in-repository issues with verified patches as supervision. Starting from a historical fix, the method reconstructs a stage-wise repair trace backward from the verified outcome, then reuses the distilled guidance at inference time to steer file/function localization and patch synthesis, without fine-tuning or online search. On SWE-Bench Lite, this approach increases Pass@1 by 10.4% with GPT-4o, 8.6% with DeepSeek-V3, and 10.3% with GPT-5, indicating that outcome-conditioned reuse of verified repairs can replace costly forward exploration for software issue resolution.
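The backward reconstruction and reuse described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `ResolvedIssue` schema, the template-based `distill_trace` (where the framework would presumably use an LLM to write the rationales), and the token-overlap retrieval in `build_guidance` are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedIssue:
    """A past in-repository issue with its verified patch (hypothetical schema)."""
    title: str
    patched_files: tuple[str, ...]
    patched_functions: tuple[str, ...]
    patch_summary: str


def distill_trace(issue: ResolvedIssue) -> list[str]:
    """Derive stage-wise guidance starting from the verified outcome.

    Stages are written backward (patch -> functions -> files) and then
    reversed so they can be replayed in forward order at inference time.
    Plain templates stand in for LLM-written rationales (assumption).
    """
    backward = [
        f"Patch synthesis: the verified fix was '{issue.patch_summary}'.",
        f"Function localization: the fix edited {', '.join(issue.patched_functions)}.",
        f"File localization: the fix touched {', '.join(issue.patched_files)}.",
    ]
    return list(reversed(backward))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_guidance(new_issue: str, history: list[ResolvedIssue]) -> str:
    """Pick the most similar resolved issue and format its distilled trace.

    Raises ValueError if `history` is empty.
    """
    best = max(history, key=lambda h: jaccard(new_issue, h.title))
    steps = distill_trace(best)
    lines = [f"Similar resolved issue: {best.title}"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines)
```

In this sketch the returned string would be prepended to the localization and patch-synthesis prompts for the new issue, so no fine-tuning or online search is involved; how the actual framework injects guidance is not specified in the abstract.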