Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by reasoning about errors and generating a new program. However, code repair has primarily been studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair to low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability of a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages but performs comparably on high-resource languages. To investigate this behavior, we conduct a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously believed. We hypothesize that this weakness is magnified in low-resource settings, where base models lack deep knowledge of a programming language, leading to fluctuating benefits of code repair between high-resource and low-resource languages.
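The iterative code repair loop described above can be sketched as follows. This is a minimal illustration, not the paper's DistiLRR implementation: `ask_model_for_fix` is a hypothetical stand-in for an LLM call that receives the failing program plus its error feedback and returns a rationale and a revised program; here it is a toy stub so the loop is runnable.

```python
def run_tests(program_src, tests):
    """Execute a candidate program and return (passed, error_message)."""
    env = {}
    try:
        exec(program_src, env)
        for args, expected in tests:
            assert env["add"](*args) == expected, f"add{args} != {expected}"
        return True, ""
    except Exception as e:
        return False, repr(e)

def ask_model_for_fix(program_src, error):
    # Toy stub: a real system would prompt an LLM with the program and the
    # error feedback, asking it to reason about the failure (the rationale)
    # before generating a new program.
    rationale = f"The tests failed with {error}; the arithmetic operator looks wrong."
    fixed_src = program_src.replace("a - b", "a + b")
    return rationale, fixed_src

def iterative_repair(program_src, tests, max_rounds=3):
    """Repeatedly test the program and ask the model for a fix until it passes."""
    for _ in range(max_rounds):
        passed, error = run_tests(program_src, tests)
        if passed:
            return program_src
        _rationale, program_src = ask_model_for_fix(program_src, error)
    return program_src

buggy = "def add(a, b):\n    return a - b\n"
repaired = iterative_repair(buggy, tests=[((2, 3), 5)])
```

In a real repair framework the loop is identical in shape, but the fix comes from a model prompted with the code, the execution feedback, and an instruction to explain the error before rewriting the program.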