Autonomous coding agents (e.g., OpenAI Codex, Devin, GitHub Copilot) are increasingly used to generate fix-related pull requests (PRs) in real-world software repositories. However, their practical effectiveness depends on whether these contributions are accepted and merged by project maintainers. In this paper, we present an empirical study of fix-related PRs involving AI agents, examining their integration outcomes, their latency, and the factors that hinder successful merging. We first analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDEV POP dataset to quantify the proportions of PRs that are merged, closed without merging, or still open. We then conduct a manual qualitative analysis of a statistically representative sample of 326 closed-but-unmerged PRs, spending approximately 100 person-hours to construct a structured catalog of 12 failure reasons. Our results indicate that test-case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration, whereas build or deployment failures are comparatively rare. Overall, our findings expose key limitations of current AI coding agents in real-world settings and highlight directions for their further improvement and for more effective human-AI collaboration in software maintenance.