AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.

Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering