Code editing is an essential step towards reliable program synthesis, as it automatically corrects critical errors in code generated by code LLMs. Recent studies have demonstrated that closed-source LLMs (e.g., ChatGPT and GPT-4) are capable of generating corrective feedback to edit erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these models tend to adhere to the superficial formats of feedback and provide feedback with misleading information. Hence, the focus of our work is to leverage open-source code LLMs to generate helpful feedback with correct guidance for code editing. To this end, we present Coffee, a collected dataset specifically designed for code fixing with feedback. Using this dataset, we construct CoffeePots, a framework for COde Fixing with FEEdback via Preference-Optimized Tuning and Selection. The proposed framework aims to automatically generate helpful feedback for code editing while minimizing the potential risk of superficial feedback. Together, Coffee and CoffeePots achieve state-of-the-art performance on the HumanEvalFix benchmark. Code and model checkpoints are publicly available at https://github.com/Lune-Blue/COFFEE.