Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then queries the LLM to generate patches. While the LLM-based APR tools are able to achieve state-of-the-art results, it still follows the classic Generate and Validate repair paradigm of first generating lots of patches and then validating each one afterwards. This not only leads to many repeated patches that are incorrect but also miss the crucial information in test failures as well as in plausible patches. To address these limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same mistakes. For earlier patches that passed all the tests, we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM -- ChatGPT. By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugs for \$0.42 each!
翻译:自动程序修复(APR)旨在自动生成有缺陷程序的补丁。近期APR研究聚焦于利用现代大型语言模型(LLM)直接生成补丁。此类基于LLM的APR工具通过首先构建基于原始错误代码的输入提示,然后查询LLM生成补丁来工作。尽管基于LLM的APR工具能够取得最先进的结果,但它仍遵循经典的生成与验证修复范式:先生成大量补丁,再逐一验证。这不仅导致许多重复的错误补丁,还忽略了测试失败及合理补丁中的关键信息。为解决这些局限,我们提出ChatRepair,这是首个全自动对话驱动的APR方法,它将补丁生成与即时反馈交错进行,以对话风格执行APR。ChatRepair首先向LLM输入相关测试失败信息,然后从同一错误的早期修补尝试的成功与失败中学习,以实现更强大的APR。对于先前未能通过所有测试的补丁,我们将错误补丁与其对应的相关测试失败信息结合,构建新的提示,引导LLM生成下一个补丁。这样,我们可以避免重复犯错。对于先前通过所有测试的补丁,我们进一步要求LLM生成原始合理补丁的替代变体。通过这种方式,我们可以基于早期成功进一步构建和学习,生成更多合理补丁,从而增加生成正确补丁的机会。虽然我们的方法具有通用性,但我们是使用基于最先进对话型LLM——ChatGPT来实现ChatRepair。经计算访问ChatGPT的成本,我们能够以每个错误0.42美元的代价修复337个错误中的162个!