Code translation tools (transpilers) are developed for automatic source-to-source translation. Although learning-based transpilers have shown impressive improvements over rule-based counterparts, owing to their task-specific pre-training on extensive monolingual corpora, their current performance still remains unsatisfactory for practical deployment, and the associated training resources are prohibitively expensive. Large language models (LLMs) pre-trained on huge amounts of human-written code and text have shown remarkable performance in many code intelligence tasks thanks to their powerful generality, even without task-specific training. LLMs can thus potentially circumvent the above limitations, but they have not yet been exhaustively explored for this task. This paper investigates diverse LLMs and learning-based transpilers for automated code translation, finding that although certain LLMs outperform current transpilers, they still have accuracy issues: most failures stem from a lack of comprehension of source programs (38.51%), missing clear instructions on I/O types in translation (14.94%), and ignoring discrepancies between source and target programs (41.38%). Motivated by these findings, we propose UniTrans, a Unified code Translation framework applicable to various LLMs, to unleash their power in this field. Specifically, UniTrans first crafts a series of test cases for target programs with the assistance of source programs. Next, it harnesses these auto-generated test cases to augment the code translation and then evaluates the correctness of the translations via execution. Afterward, UniTrans further (iteratively) repairs incorrectly translated programs, prompted by the test-case execution results. Extensive experiments are conducted on six translation datasets between Python, Java, and C++. Three recent LLMs of diverse sizes are tested with UniTrans, and all achieve substantial improvements.
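The four stages described above (test-case crafting, test-augmented translation, execution-based evaluation, iterative repair) can be sketched as a small execution-feedback loop. This is a minimal illustration under stated assumptions, not UniTrans's actual implementation: `translate` and `repair` are hypothetical callables standing in for LLM prompts, the source program's behavior is modeled by a Python function (`source_fn`) instead of a compiled Java or C++ program, expected outputs are assumed to come from executing the source program on candidate inputs, and translations are assumed to be Python so they can be loaded with `exec`. All function names here are hypothetical. The usage at the bottom mimics a source/target discrepancy of the kind the abstract mentions: Java's integer `/` truncates toward zero, while Python's `//` floors.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a test case pairs an input tuple with the output the SOURCE program produced.
TestCase = Tuple[tuple, object]


def craft_test_cases(source_fn: Callable, inputs: Sequence[tuple]) -> List[TestCase]:
    """Stage 1 (assumed form): run the source program on candidate inputs to get expected outputs."""
    return [(args, source_fn(*args)) for args in inputs]


def run_tests(target_code: str, entry: str, tests: List[TestCase]) -> List[str]:
    """Stage 3: execute the translated program and collect readable failure reports."""
    namespace: dict = {}
    try:
        exec(target_code, namespace)  # a real system would sandbox untrusted code
        fn = namespace[entry]
    except Exception as exc:
        return [f"failed to load translation: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            actual = fn(*args)
        except Exception as exc:
            failures.append(f"input {args}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"input {args}: expected {expected!r}, got {actual!r}")
    return failures


def translate_and_repair(
    source_code: str,
    source_fn: Callable,
    inputs: Sequence[tuple],
    entry: str,
    translate: Callable[[str, List[TestCase]], str],
    repair: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    tests = craft_test_cases(source_fn, inputs)
    # Stage 2: the test cases are handed to the translator to augment its prompt.
    candidate = translate(source_code, tests)
    for _ in range(max_rounds):
        failures = run_tests(candidate, entry, tests)
        if not failures:
            return candidate, True
        # Stage 4: execution feedback is passed back to the (hypothetical) repairer.
        candidate = repair(candidate, failures)
    return candidate, not run_tests(candidate, entry, tests)


# --- Usage with stub "LLM" callables ---
def java_int_div(a: int, b: int) -> int:
    """Stand-in for running the Java source `a / b`, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


FIXED = (
    "def div(a, b):\n"
    "    q = abs(a) // abs(b)\n"
    "    return q if (a >= 0) == (b >= 0) else -q\n"
)


def fake_translate(src: str, tests: List[TestCase]) -> str:
    return "def div(a, b):\n    return a // b\n"  # floors, so wrong for negative operands


def fake_repair(code: str, failures: List[str]) -> str:
    return FIXED


code, ok = translate_and_repair(
    "static int div(int a, int b) { return a / b; }",
    java_int_div, [(7, 2), (-7, 2), (7, -2)], "div",
    fake_translate, fake_repair,
)
print(ok)  # → True
```

Here the first translation fails on the two negative-operand cases, the failure reports trigger one repair round, and the repaired candidate passes all crafted tests.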