Large Language Models (LLMs) are increasingly applied to code-related tasks, including code translation. Prior studies have explored using LLMs to translate code between programming languages. Since LLMs handle natural language more effectively than code, using natural language as an intermediate representation in code translation is a promising approach. In this work, we investigate using NL-specification as an intermediate representation for code translation. We evaluate our method on three datasets, five popular programming languages, and 29 language-pair permutations. Our results show that using the NL-specification alone does not improve performance. However, when combined with the source code, it yields a slight improvement over the baseline for certain language pairs. Beyond translation performance, we also examine the quality of the translated code and provide insights into the issues it contains.