Agentic Automated Program Repair (APR) is increasingly tackling complex, repository-level bugs in industry, but a human must still review each patch before it is committed to ensure that it addresses the bug. Showing patches that are unlikely to be accepted creates substantial noise, wasting valuable developer time and eroding trust in automated code changes. We introduce two complementary LLM-based policies to reduce such noise: bug abstention and patch validation. Bug abstention excludes bugs that the agentic APR system is unlikely to fix. Patch validation rejects patches that are unlikely to be a good fix for the given bug. We evaluate both policies on three sets of bugs from Google's codebase and on the candidate patches that an internal agentic APR system generated for them. On a set of 174 human-reported bugs, removing the bugs and patches rejected by our policies can raise success rates by up to 13 and 15 percentage points, respectively, and by up to 39 percentage points in combination. On null pointer exceptions and sanitizer-reported bugs with machine-generated bug reports, patch validation also improves average single-sample success rates. This two-policy approach provides a practical path to the reliable, industrial-scale deployment of agentic APR systems.
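The way the two policies combine can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Patch` record, the `apply_policies` and `success_rate` helpers, the stand-in `abstain` and `reject` functions (the policies in this work are LLM-based, not rule-based), and the toy data, which are not results from the evaluation. The sketch also applies abstention as an offline filter over existing patches, and it measures success as the fraction of shown patches that are acceptable, which may differ from the metric used in the evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Patch:
    bug_id: str
    diff: str
    accepted: bool  # ground-truth label, available only in offline evaluation


def success_rate(patches: List[Patch]) -> Optional[float]:
    """Fraction of shown patches that are acceptable; None if nothing is shown."""
    if not patches:
        return None
    return sum(p.accepted for p in patches) / len(patches)


def apply_policies(
    patches: List[Patch],
    abstain_on_bug: Callable[[str], bool],
    reject_patch: Callable[[Patch], bool],
) -> List[Patch]:
    """Keep patches whose bug is not abstained on and that pass validation."""
    return [
        p
        for p in patches
        if not abstain_on_bug(p.bug_id) and not reject_patch(p)
    ]


# Toy data, purely illustrative (not taken from the evaluation).
patches = [
    Patch("bug-1", "fix A", True),
    Patch("bug-1", "fix B", False),
    Patch("bug-2", "fix C", False),
    Patch("bug-3", "fix D", True),
]


# Hypothetical stand-ins for the two LLM-based policies.
def abstain(bug_id: str) -> bool:
    return bug_id == "bug-2"


def reject(patch: Patch) -> bool:
    return patch.diff == "fix B"


baseline = success_rate(patches)
filtered = success_rate(apply_policies(patches, abstain, reject))
print(f"baseline={baseline:.2f}, filtered={filtered:.2f}")
```

In this toy setting, filtering shows fewer patches but a larger share of them are acceptable, which is the trade-off the two policies target. In a deployment, abstention would more plausibly run before patch generation so that no effort is spent on bugs the system is unlikely to fix.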