We present Code Comparison Tuning (CCT), a simple and effective tuning method that helps code large language models (Code LLMs) handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning at both the token and sequence levels, enabling the model to discern even the slightest deviations in code. At the token level, we compare the original code with an erroneous version containing manually added errors, using a token-level preference loss for fine-grained comparison. At the sequence level, we combine code segments to create new instruction tuning samples, which enhances the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 by up to 4 points across diverse Code LLMs, and extensive analysis demonstrates the effectiveness of our method.
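To make the two levels of comparison concrete, the following is a minimal sketch and not the exact CCT formulation, which the abstract does not spell out. It assumes that the correct and erroneous token sequences are aligned and of equal length, that the token-level preference loss takes a logistic form over the positions where they differ, and that a sequence-level sample pairs the erroneous code with the original as the target. The function names, the loss form, and the prompt wording are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def differing_positions(correct: Sequence[str], buggy: Sequence[str]) -> List[int]:
    """Return the indices where two aligned token sequences disagree."""
    if len(correct) != len(buggy):
        raise ValueError("this sketch assumes aligned, equal-length sequences")
    return [i for i, (c, b) in enumerate(zip(correct, buggy)) if c != b]


def token_preference_loss(
    logp_correct: Sequence[float],
    logp_buggy: Sequence[float],
    positions: Sequence[int],
) -> float:
    """Mean of -log(sigmoid(logp_correct - logp_buggy)) over the given positions.

    Assumed loss form (hypothetical): it is small when the model assigns a
    higher log-probability to the correct token than to the erroneous one.
    """
    if not positions:
        return 0.0
    total = 0.0
    for i in positions:
        margin = logp_correct[i] - logp_buggy[i]
        # softplus(-margin) equals -log(sigmoid(margin)); stable formulation.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(positions)


def build_comparison_sample(correct_code: str, buggy_code: str) -> Dict[str, str]:
    """Build one sequence-level sample; the prompt wording is hypothetical."""
    instruction = "Fix the bug in the following code.\n\n" + buggy_code
    return {"instruction": instruction, "response": correct_code}


# Toy example: a single operator is flipped in the erroneous version.
correct_tokens = ["return", "a", "+", "b"]
buggy_tokens = ["return", "a", "-", "b"]
positions = differing_positions(correct_tokens, buggy_tokens)

# Made-up per-token log-probabilities, standing in for model outputs.
logp_correct = [-0.1, -0.2, -0.3, -0.1]
logp_buggy = [-0.1, -0.2, -2.3, -0.1]
loss = token_preference_loss(logp_correct, logp_buggy, positions)

sample = build_comparison_sample("return a + b", "return a - b")
```

In this sketch only the differing positions contribute to the token-level term, so identical tokens add no gradient signal, while the sequence-level sample can be mixed into ordinary instruction tuning data.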