[...] Since then, various APR approaches, especially those leveraging large language models (LLMs), have been rapidly developed to fix general software bugs. Unfortunately, the effectiveness of these advanced techniques on regression bugs remains largely unexplored. This gap motivates an empirical study of how well modern APR techniques fix real-world regression bugs. In this work, we conduct such a study. To support it, we introduce RegressionBug4APR, a high-quality benchmark of Java and Python regression bugs integrated into a framework designed to facilitate APR research. The current benchmark includes 200 regression bugs collected from widely used real-world GitHub repositories. We begin with an in-depth analysis of the benchmark, demonstrating its diversity and quality. Building on this foundation, we empirically evaluate how well APR fixes regression bugs by assessing both traditional APR tools and advanced LLM-based APR approaches. Our experimental results show that classical APR tools fail to repair any of the bugs, while LLM-based APR approaches show promising potential. Motivated by these results, we investigate the impact of incorporating bug-inducing change information into LLM-based APR approaches for fixing regression bugs. We further conduct an ablation study to disaggregate the contribution of each contextual element within the bug-inducing change information. Our results show that this context-aware enhancement significantly improves the performance of LLM-based APR, yielding 1.6x more successful repairs than LLM-based APR without such context. Moreover, these results are consistent across both the Java and Python benchmarks, providing preliminary evidence that our findings generalize.