The rise of large language models for code has reshaped software development. Autonomous coding agents, able to create branches, open pull requests, and perform code reviews, now actively contribute to real-world projects. Their growing role offers a unique and timely opportunity to investigate AI-driven contributions and their effects on code quality, team dynamics, and software maintainability. In this work, we construct a novel dataset of approximately $110,000$ open-source pull requests, including associated commits, comments, reviews, issues, and file changes, collectively representing millions of lines of source code. We compare five popular coding agents, namely OpenAI Codex, Claude Code, GitHub Copilot, Google Jules, and Devin, examining how their usage differs across development aspects such as merge frequency, edited file types, and developer interaction signals, including comments and reviews. Furthermore, we emphasize that code authoring and review are only a small part of the larger software engineering process, as the resulting code must also be maintained and updated over time. Hence, we offer several longitudinal estimates of survival and churn rates for agent-generated versus human-authored code. Ultimately, our findings indicate increasing agent activity in open-source projects, although agent contributions are associated with more churn over time than human-authored code.