AI coding agents are now submitting pull requests (PRs) to software projects, acting not just as assistants but as autonomous contributors. Although these agentic contributions are rapidly increasing across real repositories, little is known about how they behave in practice and why many of them fail to be merged. In this paper, we conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. (RQ1) We first quantitatively characterize merged and not-merged PRs along four broad dimensions: 1) merge outcomes across task types, 2) code changes, 3) CI build results, and 4) review dynamics. We observe that tasks related to documentation, CI, and build updates achieve the highest merge success, whereas performance and bug-fix tasks fare the worst. Not-merged PRs tend to involve larger code changes, touch more files, and often fail the project's CI/CD pipeline validation. (RQ2) To further investigate why some agentic PRs are not merged, we qualitatively analyze 600 PRs to derive a hierarchical taxonomy of rejection patterns. This analysis complements the quantitative findings of RQ1 by uncovering rejection reasons not captured by quantitative metrics, including lack of meaningful reviewer engagement, duplicate PRs, unwanted feature implementations, and agent misalignment. Together, our findings highlight key socio-technical and human-AI collaboration factors that are critical to improving the success of future agentic workflows.