Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement

As large language models (LLMs) advance in the programming domain, intelligent programming coaching systems have attracted widespread attention. However, most existing research focuses on repairing learners' buggy code without explaining the underlying causes of the bugs. To address this gap, we introduce a new task, \textbf{LPR} (\textbf{L}earner-Tailored \textbf{P}rogram \textbf{R}epair). We then propose \textbf{\textsc{\MethodName{}}} (\textbf{L}earner-Tailored \textbf{S}olution \textbf{G}enerator), a framework that improves program repair while providing bug descriptions for the buggy code. In the first stage, we use a repair solution retrieval framework to construct a solution retrieval database and then apply an edit-driven code retrieval approach to retrieve valuable solutions, which guide LLMs in identifying and fixing the bugs. In the second stage, we propose a solution-guided program repair method that fixes the code and provides explanations under the guidance of the retrieved solutions. Moreover, we propose an Iterative Retrieval Enhancement method that uses the evaluation results of the generated code to iteratively refine the retrieval direction and explore more suitable repair strategies, improving performance in practical programming coaching scenarios. Experimental results show that our approach outperforms a set of baselines by a large margin, validating the effectiveness of our framework for the newly proposed LPR task.
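
To make the retrieve-then-repair loop concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. All names (`edit_signature`, `retrieve`, `iterative_repair`, and the `generate` and `passes_tests` callbacks) are hypothetical. Character-level similarity from Python's `difflib` stands in for whatever retrieval model the framework actually uses, and the sketch assumes that the retrieval query is the edit between the buggy code and the current draft fix.

```python
import difflib
from typing import Callable, List, Optional, Tuple

# Hypothetical database record: (buggy_code, fixed_code, bug_description).
Solution = Tuple[str, str, str]


def edit_signature(before: str, after: str) -> str:
    """Summarise the edit between two programs as its added/removed lines."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # Keep only '+' and '-' lines; drop unchanged lines and '?' hint lines.
    return "\n".join(d[0] + d[2:].strip() for d in diff if d[0] in "+-")


def retrieve(db: List[Solution], buggy: str, draft: str, k: int = 1) -> List[Solution]:
    """Return the k stored solutions whose edits most resemble buggy -> draft."""
    query = edit_signature(buggy, draft)

    def score(sol: Solution) -> float:
        stored_edit = edit_signature(sol[0], sol[1])
        return difflib.SequenceMatcher(None, query, stored_edit).ratio()

    return sorted(db, key=score, reverse=True)[:k]


def iterative_repair(
    buggy: str,
    db: List[Solution],
    generate: Callable[[str, List[Solution]], str],  # stand-in for an LLM call
    passes_tests: Callable[[str], bool],             # stand-in for an evaluator
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate between evaluating a draft and re-retrieving guidance for it."""
    draft = generate(buggy, [])  # first attempt, no retrieved guidance
    for _ in range(max_rounds):
        if passes_tests(draft):
            return draft
        # A failed draft still indicates where the edit lands; use that edit
        # as the query for the next retrieval round.
        hints = retrieve(db, buggy, draft)
        draft = generate(buggy, hints)
    return draft if passes_tests(draft) else None


# Toy usage with stubbed components (illustrative data only).
db = [
    ("for i in range(n):\n    total += i",
     "for i in range(n + 1):\n    total += i",
     "off-by-one in loop bound"),
    ("print(a - b)", "print(a + b)", "wrong operator"),
]
buggy = "for i in range(m):\n    s += i"


def generate(code: str, hints: List[Solution]) -> str:
    # Stub: only produces the right fix when guided by the off-by-one hint.
    if any("off-by-one" in h[2] for h in hints):
        return code.replace("range(m)", "range(m + 1)")
    return code.replace("range(m)", "range(m - 1)")


def passes_tests(code: str) -> bool:
    return "range(m + 1)" in code  # stub evaluator


fixed = iterative_repair(buggy, db, generate, passes_tests)
```

In this toy run the unguided first draft fails the stub evaluator, the edit it made retrieves the off-by-one record, and the guided second draft passes. A real system would replace both stubs with an LLM call and a test-suite evaluator, and would return the retrieved bug description alongside the fix.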
