Large language models (LLMs) are increasingly being used for automated code translation, a task with important real-world applications. However, most existing approaches use only a program's source code as input to an LLM and do not consider the different kinds of specifications that can be extracted from the program. In this paper, we propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality static specifications, test cases, and natural language descriptions from a given program, and then uses these along with the source code to improve the quality of LLM-generated translations. We evaluate SpecTra on three code translation tasks (C to Rust, C to Go, and JavaScript to TypeScript) and show that it can enhance the performance of six popular LLMs on these tasks by up to 10 percentage points, a relative improvement of 26\%. Our research suggests that generating high-quality specifications could be a promising and efficient way to improve the performance of LLMs for code translation. We make our code and data available, anonymized for review.