Early-Stage Prediction of Review Effort in AI-Generated Pull Requests

As AI coding agents evolve from autocomplete tools into autonomous "AI workforce" teammates, they introduce a critical new bottleneck: human maintainers must now manage complex interaction loops rather than just review code. Analyzing 33,707 agent-authored PRs, we uncover a stark two-regime reality: agents excel at narrow automation (28.3% of PRs merge instantly) but frequently fail at iterative refinement, leading to "ghosting" (abandonment) when faced with subjective feedback. This creates a hidden "attention tax" on maintainers. We introduce a creation-time Circuit Breaker model that predicts high-maintenance PRs before human review begins. Using simple static complexity cues (e.g., file types, patch size), the model identifies the "expensive tail" of contributions with an AUC of 0.96, enabling a gated triage process. At a 20% review budget, this approach captures 69% of the high-effort PRs, allowing maintainers to fast-fail costly, low-quality agent contributions while fast-tracking simple fixes.
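
The "capture at a review budget" figure can be read as recall among the top-scored share of PRs. The sketch below is one plausible way to compute it, not the authors' implementation: the function name `recall_at_budget`, the score and label lists, and the tie-breaking behavior are all assumptions made for illustration, and the toy data is invented.

```python
from math import ceil
from typing import Sequence


def recall_at_budget(
    scores: Sequence[float], labels: Sequence[int], budget: float
) -> float:
    """Share of high-effort PRs (label 1) found in the top `budget` fraction by score.

    Hypothetical helper: `scores` are predicted risk scores at PR creation time,
    `labels` mark PRs that turned out to be high-effort.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < budget <= 1:
        raise ValueError("budget must be in (0, 1]")

    total_high_effort = sum(labels)
    if total_high_effort == 0:
        return 0.0

    # Number of PRs the maintainers can afford to inspect closely.
    k = ceil(budget * len(scores))
    # Rank PR indices from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(labels[i] for i in ranked[:k])
    return captured / total_high_effort


# Invented toy data: 10 PRs, 3 of which are high-effort.
toy_scores = [0.95, 0.10, 0.80, 0.05, 0.30, 0.20, 0.60, 0.15, 0.02, 0.40]
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_recall = recall_at_budget(toy_scores, toy_labels, budget=0.2)
print(f"recall at 20% budget: {toy_recall:.2f}")
```

With a 20% budget on 10 PRs, only the two highest-scored PRs are flagged, so the toy recall is the fraction of the three high-effort PRs that land in that pair. In a gated triage setup, the flagged PRs would be the ones routed to closer scrutiny or fast-failed, while the rest are fast-tracked.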
