Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass once it has been resolved, are crucial for debugging, but they are rarely included in bug reports, in both open-source and industrial settings. Automatically generating BRTs from bug reports therefore has the potential to accelerate debugging and reduce time to repair. This paper investigates automated BRT generation in an industrial setting, specifically at Google, focusing on the challenges of a large-scale, proprietary codebase and considering real-world bugs extracted from Google's internal issue tracker. We adapt and evaluate a state-of-the-art BRT generation technique, LIBRO, and present our agent-based approach, BRT Agent, which uses a fine-tuned Large Language Model (LLM) for code editing. BRT Agent significantly outperforms LIBRO, achieving a 28% plausible BRT generation rate compared to 10% for LIBRO on 80 human-reported bugs from Google's internal issue tracker. We further investigate the practical value of generated BRTs by integrating them with an Automated Program Repair (APR) system at Google. Our results show that providing BRTs to the APR system yields plausible fixes for 30% more bugs. Additionally, we introduce Ensemble Pass Rate (EPR), a metric that leverages the generated BRTs to select the most promising fixes from all fixes generated by the APR system. Our evaluation of EPR for Top-K and threshold-based fix selection demonstrates promising results and trade-offs. For example, when selecting from a pool of 20 candidate fixes, EPR's top-1 choice is plausible in 70% of cases.
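The abstract does not spell out how EPR is computed, so the following is only a minimal Python sketch under a stated assumption: a candidate fix's EPR is taken to be the fraction of generated BRTs that pass when run against that fix, and fixes are then chosen either by Top-K ranking or by an EPR threshold. The plausibility check follows the BRT definition given above (fails with the bug present, passes once it is resolved). All function names and the `results` table layout are hypothetical and do not describe Google's actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical outcome table: results[fix_id][brt_id] is True if the generated
# BRT passes when run against the codebase with that candidate fix applied.
Results = Dict[str, Dict[str, bool]]


def is_plausible_brt(fails_on_buggy: bool, passes_on_fixed: bool) -> bool:
    """A BRT is plausible if it fails with the bug present and passes once fixed."""
    return fails_on_buggy and passes_on_fixed


def ensemble_pass_rate(outcomes: Dict[str, bool]) -> float:
    """Assumed EPR definition: fraction of generated BRTs that pass on one fix."""
    if not outcomes:
        return 0.0
    return sum(outcomes.values()) / len(outcomes)


def rank_fixes(results: Results) -> List[Tuple[str, float]]:
    """Score every candidate fix by EPR, highest first (ties broken by fix id)."""
    scored = [(fix_id, ensemble_pass_rate(o)) for fix_id, o in results.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def select_top_k(results: Results, k: int) -> List[str]:
    """Top-K selection: keep the k fixes with the highest EPR."""
    return [fix_id for fix_id, _ in rank_fixes(results)[:k]]


def select_above_threshold(results: Results, threshold: float) -> List[str]:
    """Threshold selection: keep every fix whose EPR is at least `threshold`."""
    return [fix_id for fix_id, score in rank_fixes(results) if score >= threshold]
```

For example, with three candidate fixes whose generated BRTs pass 3/3, 2/3, and 1/3 of the time, Top-1 selection returns only the first fix, while a threshold of 0.5 keeps the first two. Top-K bounds how many fixes a developer must review, whereas a threshold adapts the number of selected fixes to how strongly the BRTs agree, which is one way the trade-offs mentioned above can arise.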