Automatic Program Repair (APR) aims to autonomously resolve issues in software projects, and generally covers three categories of tasks: bug fixing, new feature development, and feature enhancement. Despite extensive research proposing various methods, their effectiveness on real issues remains unsatisfactory. Engineers typically form design rationales (DR), that is, planned solutions and the set of reasons underlying them, before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools such as Jira. This raises a compelling question: how can we leverage the DR scattered across issue logs to enhance APR efficiently? To investigate, we introduce DRCodePilot, an approach that augments GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. Furthermore, because GPT-4 has a limited grasp of the broader project context and occasionally fails to generate precise identifiers, we devise a feedback-based self-reflective framework in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We build a benchmark of 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. In our experiments, DRCodePilot achieves a full-match ratio 4.7x higher than that of GPT-4 used directly, and its CodeBLEU scores also improve. Moreover, applying DR on its own can yield an increase in the full-match ratio for CodeLlama, GPT-3.5, and GPT-4 on our benchmark. We believe that DRCodePilot opens a novel human-in-the-loop avenue for advancing the field of APR.
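The two-stage idea described in the abstract (a DR-augmented prompt, then a feedback-based reflection pass) might be sketched roughly as follows. This is an illustrative sketch only, not DRCodePilot's actual implementation: the prompt wording, the function names (`build_repair_prompt`, `build_reflection_prompt`, `repair`), and the `llm` callable are all assumptions introduced here, and the source does not specify how the provided patch or suggested identifiers are obtained.

```python
from typing import Callable, List


def build_repair_prompt(issue: str, rationale: str, code: str) -> str:
    """Stage 1 (hypothetical wording): put the design rationale in the prompt."""
    return (
        "You are fixing an issue in an open-source project.\n"
        f"Issue description:\n{issue}\n\n"
        f"Design rationale from the issue log:\n{rationale}\n\n"
        f"Code to patch:\n{code}\n\n"
        "Produce the patched code."
    )


def build_reflection_prompt(
    draft_patch: str, reference_patch: str, identifiers: List[str]
) -> str:
    """Stage 2 (hypothetical wording): ask the model to refine its own draft."""
    return (
        f"Your previous patch:\n{draft_patch}\n\n"
        f"A provided patch for reference:\n{reference_patch}\n\n"
        f"Suggested identifiers: {', '.join(identifiers)}\n\n"
        "Reconsider your patch and return a refined version."
    )


def repair(
    issue: str,
    rationale: str,
    code: str,
    reference_patch: str,
    identifiers: List[str],
    llm: Callable[[str], str],
) -> str:
    """Run both stages; `llm` is any function mapping a prompt to a completion."""
    draft = llm(build_repair_prompt(issue, rationale, code))
    return llm(build_reflection_prompt(draft, reference_patch, identifiers))
```

Passing the model as a plain callable keeps the sketch independent of any particular client library, so the same flow could in principle be tried with GPT-4, GPT-3.5, or CodeLlama behind a thin wrapper.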