Research shows that grammatical mistakes in a sentence can be corrected by translating it into another language and back using neural machine translation with language models. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translating code from one programming language into another programming language or a natural language, and back. We hypothesize that RTT with LLMs restores the patterns most commonly seen in code during pre-training, i.e., performs a regression toward the mean, which removes bugs because they are a form of noise with respect to the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as the intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering.

Keywords: automated program repair, large language model, machine translation
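The round-trip translation idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `translate` callable, its `(text, source, target)` signature, the function names, and the test-based filtering step are all hypothetical, and `toy_translate` is a hard-coded stand-in for a real LLM call.

```python
from typing import Callable, List

# Hypothetical signature (an assumption, not from the paper):
# translate(text, source_language, target_language) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip_repair(
    buggy_code: str,
    translate: Translator,
    code_language: str = "Java",
    intermediate_language: str = "English",
    attempts: int = 1,
) -> List[str]:
    """Return candidate patches produced by round-trip translation."""
    candidates = []
    for _ in range(attempts):
        # Forward leg: buggy code -> intermediate (natural or programming) language.
        intermediate = translate(buggy_code, code_language, intermediate_language)
        # Backward leg: intermediate representation -> original programming language.
        candidates.append(translate(intermediate, intermediate_language, code_language))
    return candidates


def plausible_patches(
    candidates: List[str], passes_tests: Callable[[str], bool]
) -> List[str]:
    """Keep candidates accepted by a validation oracle (assumed here to be a test suite)."""
    return [candidate for candidate in candidates if passes_tests(candidate)]


def toy_translate(text: str, source: str, target: str) -> str:
    # Hard-coded stand-in for an LLM; a real model would generate these strings.
    if target == "English":
        return "Return the sum of a and b."
    return "int add(int a, int b) { return a + b; }"


buggy = "int add(int a, int b) { return a - b; }"
candidates = round_trip_repair(buggy, toy_translate, attempts=2)
```

In this toy run the English description drops the faulty `-` operator, so the backward translation yields the common `a + b` form. That mirrors the hypothesized regression toward the mean, although a real model gives no such guarantee and each candidate would still need to be validated.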