Bug Reproduction Tests (BRTs) have been used in many agentic Automated Program Repair (APR) systems, primarily for validating promising fixes and aiding fix generation. In practice, when developers submit a patch, they often implement the BRT alongside the fix. Our experience deploying agentic APR reveals that developers similarly desire a BRT within AI-generated patches to increase their confidence. However, canonical APR systems tend to generate BRTs and fixes separately, or focus on producing only the fix in the final patch. In this paper, we study agentic APR in the context of cogeneration, where the APR agent is instructed to generate both a fix and a BRT in the same patch. We evaluate the effectiveness of different cogeneration strategies on 120 human-reported bugs at Google and characterize these strategies by their influence on APR agent behavior. We develop and evaluate patch selectors that account for test change information to select patches with plausible fixes (and plausible BRTs). Finally, we analyze the root causes of failed cogeneration trajectories. Importantly, we show that cogeneration allows the APR agent to generate BRTs for at least as many bugs as a dedicated BRT agent, without compromising the generation rate of plausible fixes, thereby reducing the engineering effort of maintaining and coordinating separate generation pipelines for fixes and BRTs at scale.