Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real maintainers reject. The root cause is not functional incorrectness but a lack of organicity: generated code ignores project-specific conventions, duplicates functionality already provided by internal APIs, and violates implicit architectural constraints accumulated over years of development. Simply exposing an agent to the latest repository snapshot is not enough: the snapshot reveals the final state of the codebase, but not the repository-specific change patterns by which that state was reached. We introduce Learning to Commit, a framework that closes this gap through Online Repository Memory. Given a repository with a strict chronological split, the agent performs supervised contrastive reflection on earlier commits: it blindly attempts to resolve each historical issue, compares its prediction against the oracle diff, and distils the gap into a continuously growing set of skills (reusable patterns capturing coding style, internal API usage, and architectural invariants). When a new PR description arrives, the agent conditions its generation on these accumulated skills, producing changes grounded in the project's own evolution rather than generic pretraining priors. Evaluation is conducted on genuinely future, merged pull requests that could not have been seen during the skill-building phase, and spans multiple dimensions including functional correctness, code-style consistency, internal API reuse rate, and modified-region plausibility. Experiments on an expert-maintained repository with rich commit history show that Online Repository Memory improves organicity scores on held-out future tasks.

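The skill-building loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Commit` and `RepositoryMemory` types, the `chronological_split` and `build_memory` helpers, and the toy `toy_attempt` and `toy_reflect` functions, which stand in for what would presumably be LLM calls. The sketch also assumes that skills learned so far are already available to the agent during later historical attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of one historical change."""
    timestamp: int
    issue: str
    oracle_diff: str


@dataclass
class RepositoryMemory:
    """Hypothetical skill store; skills are plain strings in this sketch."""
    skills: List[str] = field(default_factory=list)

    def add(self, skill: Optional[str]) -> None:
        # Ignore empty reflections and skills that are already stored.
        if skill and skill not in self.skills:
            self.skills.append(skill)


def chronological_split(
    commits: List[Commit], cutoff: int
) -> Tuple[List[Commit], List[Commit]]:
    """Commits before the cutoff build skills; the rest are held out."""
    ordered = sorted(commits, key=lambda c: c.timestamp)
    past = [c for c in ordered if c.timestamp < cutoff]
    future = [c for c in ordered if c.timestamp >= cutoff]
    return past, future


def build_memory(
    past: List[Commit],
    attempt: Callable[[str, List[str]], str],
    reflect: Callable[[str, str], Optional[str]],
) -> RepositoryMemory:
    """Blind attempt, comparison with the oracle diff, then skill distillation."""
    memory = RepositoryMemory()
    for commit in past:
        # The attempt sees only the issue and current skills, never the oracle.
        predicted = attempt(commit.issue, memory.skills)
        memory.add(reflect(predicted, commit.oracle_diff))
    return memory


# Toy stand-ins (assumptions): a real system would query a model here.
def toy_attempt(issue: str, skills: List[str]) -> str:
    call = "log.info" if "prefer log.info over print" in skills else "print"
    return f'+ {call}("{issue}")'


def toy_reflect(predicted: str, oracle: str) -> Optional[str]:
    if "print(" in predicted and "log.info(" in oracle:
        return "prefer log.info over print"
    return None


history = [
    Commit(1, "report cache miss", '+ log.info("report cache miss")'),
    Commit(2, "report retry", '+ log.info("report retry")'),
    Commit(3, "report timeout", '+ log.info("report timeout")'),
]
past, future = chronological_split(history, cutoff=3)
memory = build_memory(past, toy_attempt, toy_reflect)
patch = toy_attempt(future[0].issue, memory.skills)
```

In this toy run the first blind attempt uses `print`, the reflection step records the project's logging convention as a skill, and the held-out task is then generated with that skill in context. The string-matching reflection is only a placeholder for whatever comparison the framework actually performs.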