Large language models (LLMs) are increasingly being used for automated code translation, a task with important real-world applications. However, most existing approaches use only a program's source code as input to an LLM and do not consider the different kinds of specifications that can be extracted from the program. In this paper, we propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality invariants, test cases, and natural language descriptions from a given program, and then uses these along with the source code to improve the quality of LLM-generated translations. We evaluate SpecTra on two code translation tasks, C to Rust and C to Go, and show that it can enhance the performance of four popular LLMs on these tasks by up to 10 percentage points, a relative improvement of up to 23%. Our research suggests that generating high-quality specifications could be a promising and efficient way to improve the performance of LLMs for code translation.