Automated Program Repair (APR) attempts to fix software bugs without human intervention and plays a crucial role in software development and maintenance. With recent advances in Large Language Models (LLMs), a rapidly growing number of LLM-based APR techniques have been proposed, with remarkable performance. However, existing LLM-based APR techniques typically adopt trial-and-error strategies, which suffer from two major drawbacks: (1) inherently limited patch effectiveness due to local exploration, and (2) low search efficiency due to redundant exploration. In this paper, we propose APRMCTS, which uses iterative tree search to improve LLM-based APR. APRMCTS incorporates Monte Carlo Tree Search (MCTS) into patch searching by performing a global evaluation of the explored patches and selecting the most promising one for subsequent refinement and generation. APRMCTS effectively resolves the problem of falling into local optima and thus improves the efficiency of patch searching. Our experiments on 835 bugs from Defects4J demonstrate that, when integrated with GPT-3.5, APRMCTS fixes a total of 201 bugs, outperforming all state-of-the-art baselines. In addition, APRMCTS helps GPT-4o-mini, GPT-3.5, Yi-Coder-9B, and Qwen2.5-Coder-7B fix 30, 27, 37, and 28 more bugs, respectively. More importantly, APRMCTS achieves a significant performance advantage while using a small patch size (16 and 32 patches), notably fewer than the 500 and 10,000 patches adopted in previous studies. In terms of cost, APRMCTS incurs less than 20% of the time cost and less than 50% of the monetary cost of existing state-of-the-art LLM-based APR methods. Our extensive study demonstrates that APRMCTS is both effective and efficient, with particular advantages in addressing complex bugs.
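The idea of evaluating all explored patches globally and refining the most promising one can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection rule, the reward, or how patches are refined, so the sketch assumes standard UCT selection, a reward equal to the fraction of passing tests, and a hypothetical `refine` callback standing in for the LLM. The names `Node`, `mcts_patch_search`, `toy_evaluate`, and `make_toy_refine` are all invented for illustration, and a "patch" here is just an operator symbol.

```python
import itertools
import math
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One explored patch in the search tree."""
    patch: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float) -> float:
        # Assumed selection rule: mean reward plus an exploration bonus.
        if self.visits == 0:
            return math.inf
        mean = self.total_reward / self.visits
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return mean + bonus


def mcts_patch_search(buggy: str,
                      refine: Callable[[str], str],
                      evaluate: Callable[[str], float],
                      budget: int = 16,
                      c: float = 1.4,
                      width: int = 2) -> Optional[str]:
    """Return a patch with reward 1.0, or None once `budget` patches are evaluated."""
    root = Node(patch=buggy)
    for _ in range(budget):
        # Selection: descend through fully expanded nodes by UCT score.
        node = root
        while len(node.children) >= width:
            node = max(node.children, key=lambda n: n.uct(c))
        # Expansion: ask the (hypothetical) refiner for a new candidate patch.
        child = Node(patch=refine(node.patch), parent=node)
        node.children.append(child)
        # Evaluation: assumed reward is the fraction of tests passed.
        reward = evaluate(child.patch)
        if reward >= 1.0:
            return child.patch
        # Backpropagation: update statistics from the new node up to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_reward += reward
            cur = cur.parent
    return None


# Toy stand-ins: the "program" is `a <op> b` and should compute a + b.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
TESTS = [(1, 2, 3), (2, 2, 4), (0, 5, 5)]


def toy_evaluate(patch: str) -> float:
    passed = sum(OPS[patch](a, b) == expected for a, b, expected in TESTS)
    return passed / len(TESTS)


def make_toy_refine() -> Callable[[str], str]:
    # Deterministic stand-in for an LLM: proposes operators in a fixed order
    # and ignores the parent patch, which a real refiner would condition on.
    candidates = itertools.cycle(["*", "/", "+"])
    return lambda parent_patch: next(candidates)


if __name__ == "__main__":
    print(mcts_patch_search("-", make_toy_refine(), toy_evaluate, budget=16))
```

In this toy run the search evaluates `*` and `/` under the root, prefers the `*` branch because it passes one of the three tests, and expands it next. The `budget` parameter plays the role of the patch size mentioned in the abstract, capping how many candidate patches are evaluated per bug.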