As autonomous AI agents transition from code-completion tools to full-fledged teammates capable of opening pull requests (PRs) at scale, software maintainers face a new challenge: not just reviewing code, but managing complex interaction loops with non-human contributors. This shift raises a critical question: can we predict which agent-generated PRs will consume excessive review effort before any human interaction begins? Analyzing 33,707 agent-authored PRs from the AIDev dataset across 2,807 repositories, we uncover a striking two-regime behavioral pattern that fundamentally distinguishes autonomous agents from human developers. The first regime, representing 28.3 percent of all PRs, consists of instant merges (in under 1 minute), reflecting success on narrow automation tasks. The second regime involves iterative review cycles in which agents frequently stall or abandon refinement ("ghosting"). We propose a Circuit Breaker triage model that predicts high-review-effort PRs (the top 20 percent by review effort) at creation time using only static structural features. A LightGBM model achieves an AUC of 0.957 on a temporal split, while semantic text features (TF-IDF, CodeBERT) provide negligible predictive value. At a 20 percent review budget, the model intercepts 69 percent of total review effort, enabling zero-latency governance. Our findings challenge prevailing assumptions in AI-assisted code review: review burden is dictated by what agents touch, not what they say, highlighting the need for structural governance mechanisms in human-AI collaboration.
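To make the budget-based figure in the abstract concrete, the following is a minimal sketch of how "share of review effort intercepted at a given review budget" could be computed from model scores. It is not the evaluation code behind the reported numbers: the function name `effort_captured_at_budget`, the rounding rule for the number of flagged PRs, and the toy scores and effort values are all assumptions made for illustration.

```python
from typing import Sequence


def effort_captured_at_budget(
    scores: Sequence[float],
    efforts: Sequence[float],
    budget: float = 0.2,
) -> float:
    """Fraction of total review effort carried by the top-`budget` share of
    PRs when PRs are ranked by predicted score (highest first).

    Hypothetical helper for illustration only; how effort is measured and how
    ties or rounding are handled are assumptions, not details from the paper.
    """
    if len(scores) != len(efforts):
        raise ValueError("scores and efforts must have the same length")
    if not 0 < budget <= 1:
        raise ValueError("budget must be in (0, 1]")

    n = len(scores)
    total = sum(efforts)
    if n == 0 or total == 0:
        return 0.0

    # Number of PRs the reviewers can afford to inspect (at least one).
    k = max(1, int(n * budget))

    # Rank PR indices by model score, highest predicted effort first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:k]

    return sum(efforts[i] for i in flagged) / total


# Toy data (made up): ten PRs with predicted scores and observed effort.
toy_scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]
toy_efforts = [50, 1, 30, 2, 3, 0, 5, 1, 4, 4]

print(effort_captured_at_budget(toy_scores, toy_efforts, budget=0.2))  # 0.8
```

In this toy example the two highest-scored PRs account for 80 of the 100 effort units, so flagging 20 percent of PRs captures 80 percent of the effort. The 69 percent reported in the abstract is the same kind of quantity, measured on the held-out temporal split.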