AI coding agents can autonomously generate pull requests (PRs), yet little is known about how their contributions compare to those of humans. We analyze 33,596 agent-generated PRs (APRs) and 6,618 human PRs (HPRs) to compare code-change characteristics and message quality. We observe that symbols (functions and classes) introduced in APRs are removed much sooner than those introduced in HPRs (median time to removal 3 vs. 34 days) and are also removed more often (symbol churn 7.33% vs. 4.10%), reflecting agents' focus on other tasks such as documentation and test updates. Agents generate stronger commit-level messages (semantic similarity 0.72 vs. 0.68) but lag behind humans at PR-level summarization (PR-commit similarity 0.86 vs. 0.88). Commit message length is the best predictor of description quality, indicating that agents rely on individual commits rather than full-PR reasoning. These findings highlight a gap between agents' micro-level precision and macro-level communication, suggesting opportunities to improve agent-driven development workflows.
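To make the two code-change metrics concrete, the following is a minimal sketch of how symbol churn and median time to removal could be computed from per-symbol records. The abstract does not give exact definitions, so this sketch assumes that churn is the share of introduced symbols that are later removed, and that time to removal is the number of days between a symbol's introduction and its removal. The `SymbolRecord` type, the function names, and the sample data are all hypothetical and are not taken from the study.

```python
from datetime import date
from statistics import median
from typing import NamedTuple, Optional, Sequence


class SymbolRecord(NamedTuple):
    """Hypothetical record for one function or class introduced by a PR."""
    name: str
    introduced: date
    removed: Optional[date]  # None if the symbol is still present


def symbol_churn(records: Sequence[SymbolRecord]) -> float:
    """Assumed definition: fraction of introduced symbols later removed."""
    if not records:
        return 0.0
    removed = sum(1 for r in records if r.removed is not None)
    return removed / len(records)


def median_days_to_removal(records: Sequence[SymbolRecord]) -> Optional[float]:
    """Median lifetime in days, taken over removed symbols only."""
    lifetimes = [
        (r.removed - r.introduced).days
        for r in records
        if r.removed is not None
    ]
    return median(lifetimes) if lifetimes else None


# Illustrative data only; these are not values from the study.
records = [
    SymbolRecord("parse_config", date(2025, 1, 1), date(2025, 1, 4)),
    SymbolRecord("Cache", date(2025, 1, 2), None),
    SymbolRecord("retry", date(2025, 1, 3), date(2025, 1, 8)),
    SymbolRecord("Logger", date(2025, 1, 5), None),
]

print(f"symbol churn: {symbol_churn(records):.2%}")
print(f"median days to removal: {median_days_to_removal(records)}")
```

Under these assumptions, the two metrics are computed separately for the APR and HPR groups and then compared. Restricting the median to removed symbols is a design choice of this sketch; a survival-style analysis that also accounts for symbols not yet removed would be another option.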