Code translation tools (transpilers) are developed for automatic source-to-source translation. Although learning-based transpilers have shown impressive improvement over rule-based counterparts, owing to their task-specific pre-training on extensive monolingual corpora, their current performance remains unsatisfactory for practical deployment, and the associated training resources are prohibitively expensive. Large language models (LLMs) pre-trained on huge amounts of human-written code and text have shown remarkable performance in many code intelligence tasks due to their powerful generality, even without task-specific training. LLMs can thus potentially circumvent the above limitations, but they have not yet been exhaustively explored. This paper investigates diverse LLMs and learning-based transpilers on automated code translation tasks and finds that, although certain LLMs outperform current transpilers, they still have accuracy issues. Most of the failures are induced by a lack of comprehension of source programs, missing clear instructions on I/O types in translation, and ignoring discrepancies between source and target programs. Motivated by these findings, we further propose UniTrans, a Unified code Translation framework applicable to various LLMs, for unleashing their power in this field. Specifically, UniTrans first crafts a series of test cases for target programs with the assistance of source programs. Next, it harnesses these auto-generated test cases to augment the code translation and then evaluates the correctness of the translated programs via execution. Afterward, UniTrans iteratively repairs incorrectly translated programs, prompted by the test case execution results. Extensive experiments are conducted on six translation settings among Python, Java, and C++. Three recent LLMs of diverse sizes are tested with UniTrans, and all achieve substantial improvements.
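The three phases described above (test case generation, test-augmented translation with execution-based checking, and iterative repair) can be sketched as a small control loop. This is only a minimal illustration under stated assumptions, not the UniTrans implementation: the function names `translate_with_tests` and `run_tests`, the callables `make_tests`, `translate`, `repair`, and `execute`, and the `max_repair_rounds` parameter are all hypothetical placeholders. In a real system the first three would wrap LLM prompts and `execute` would compile and run the target program.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TestCase = Tuple[str, str]      # (input, expected output); representation is an assumption
Failure = Tuple[str, str, str]  # (input, expected output, actual output)


@dataclass
class TranslationResult:
    program: str  # final candidate target program
    passed: bool  # True if every test case passed
    rounds: int   # number of repair rounds actually used


def run_tests(program: str, tests: List[TestCase],
              execute: Callable[[str, str], str]) -> List[Failure]:
    """Execute the candidate on every test input and collect the mismatches."""
    failures = []
    for test_input, expected in tests:
        actual = execute(program, test_input)
        if actual != expected:
            failures.append((test_input, expected, actual))
    return failures


def translate_with_tests(source: str,
                         make_tests: Callable[[str], List[TestCase]],
                         translate: Callable[[str, List[TestCase]], str],
                         repair: Callable[[str, List[Failure]], str],
                         execute: Callable[[str, str], str],
                         max_repair_rounds: int = 2) -> TranslationResult:
    # Phase 1: craft test cases with the help of the source program.
    tests = make_tests(source)
    # Phase 2: translate with the test cases as extra context, then check by execution.
    candidate = translate(source, tests)
    failures = run_tests(candidate, tests, execute)
    # Phase 3: repair iteratively, feeding back the execution results.
    rounds = 0
    while failures and rounds < max_repair_rounds:
        candidate = repair(candidate, failures)
        failures = run_tests(candidate, tests, execute)
        rounds += 1
    return TranslationResult(candidate, not failures, rounds)


def demo() -> TranslationResult:
    # Toy stand-ins: "programs" are just keys into a table of Python callables.
    impls = {
        "buggy": lambda x: str(int(x) + 1),
        "fixed": lambda x: str(int(x) * 2),
    }
    return translate_with_tests(
        source="def double(x): return x * 2",
        make_tests=lambda src: [("2", "4"), ("5", "10")],
        translate=lambda src, tests: "buggy",    # first attempt is wrong
        repair=lambda prog, failures: "fixed",   # repair step returns a corrected program
        execute=lambda prog, test_input: impls[prog](test_input),
    )


if __name__ == "__main__":
    print(demo())
```

In the toy run, the first translation fails both test cases, one repair round produces a passing candidate, and the loop stops. Bounding the loop with `max_repair_rounds` is a design choice of this sketch; the abstract states only that repair may be iterative.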