In this paper, we present an LLM-based code translation method and an associated tool called CoTran, which translates whole programs from one high-level programming language to another. Current LLM-based code translation methods lack a training approach that ensures the translated code reliably compiles or is substantially functionally equivalent to the input code. In our work, we train an LLM via reinforcement learning, modifying the fine-tuning process to incorporate compiler feedback and symbolic execution (symexec)-based equivalence testing feedback, which checks for functional equivalence between the input and output programs. The idea is to guide an LLM-in-training, via compiler and symexec-based testing feedback, by letting it know how far it is from producing perfect translations. We report on extensive experiments comparing CoTran with 14 other code translation tools, including human-written transpilers, LLM-based translation tools, and ChatGPT, over a benchmark of more than 57,000 equivalent Java-Python program pairs, and we show that CoTran outperforms them on relevant metrics such as compilation accuracy (CompAcc) and functional equivalence accuracy (FEqAcc). For example, our tool achieves 48.68% FEqAcc and 76.98% CompAcc for Python-to-Java translation, whereas the nearest competing tool (PLBART-base) achieves only 38.26% and 75.77%, respectively. Moreover, relative to CodeT5, the model on which it is built, CoTran improves FEqAcc by +11.23% and +14.89% and CompAcc by +4.07% and +8.14% for Java-to-Python and Python-to-Java translation, respectively.
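To make the two metrics and the feedback idea concrete, the following is a minimal Python sketch, not CoTran's actual implementation. It assumes the translation target is Python, so that the built-in `compile` can stand in for the compiler check, and it assumes equivalence verdicts are supplied as booleans by some external test harness. The function names (`compiles`, `comp_acc`, `feq_acc`, `feedback_score`) and the scoring rule in `feedback_score` are hypothetical illustrations of "how far a translation is from perfect"; the abstract does not specify the exact feedback formula.

```python
from typing import Sequence


def compiles(python_source: str) -> bool:
    """Return True if the source byte-compiles as Python (nothing is executed)."""
    try:
        compile(python_source, "<translation>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def comp_acc(translations: Sequence[str]) -> float:
    """CompAcc: percentage of translated programs that compile."""
    if not translations:
        return 0.0
    return 100.0 * sum(compiles(t) for t in translations) / len(translations)


def feq_acc(equivalent: Sequence[bool]) -> float:
    """FEqAcc: percentage of translations judged functionally equivalent.

    The verdicts are assumed to come from an external equivalence-testing
    harness; this function only aggregates them.
    """
    if not equivalent:
        return 0.0
    return 100.0 * sum(equivalent) / len(equivalent)


def feedback_score(compiled: bool, tests_passed: int, tests_total: int) -> float:
    """Hypothetical scalar feedback in [0, 1] for one translation.

    Assumption: a translation that does not compile scores 0; otherwise the
    score is the fraction of equivalence tests it passes.
    """
    if not compiled or tests_total == 0:
        return 0.0
    return tests_passed / tests_total
```

A graded score of this kind, rather than a pass/fail flag, is one way a training loop could tell a nearly correct translation apart from one that fails every test.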