Code translation transforms code between programming languages while preserving functionality, which is critical in software development and maintenance. While traditional learning-based code translation methods have limited effectiveness due to the lack of sufficient parallel training data, Large Language Models (LLMs) have recently advanced this field with their strong code generation and comprehension capabilities. However, code translated by LLMs still suffers from diverse quality issues, such as syntax and semantic errors. In this work, we propose TransAGENT, a novel multi-agent system that fixes the syntax and semantic errors introduced during LLM-based code translation. The main insight of TransAGENT is to localize error-prone code blocks via fine-grained execution alignment between source and target code. To mitigate data leakage, we evaluate TransAGENT on a newly constructed benchmark of recent programming tasks. TransAGENT outperforms the latest UniTrans by up to 33.3% in translation accuracy and achieves an average improvement of 56.7% over Agentless in program repair performance. We also conduct an ablation study and evaluate TransAGENT across different LLMs, demonstrating its effectiveness and strong generalizability.
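To make the idea of execution alignment concrete, the following is a minimal sketch and not TransAGENT's actual implementation. It assumes that the source and target programs have already been split into blocks that correspond one to one, and it models each block as a Python function over a dictionary of variable values. All names (`first_divergent_block`, `SOURCE_BLOCKS`, `TARGET_BLOCKS`) are hypothetical. The sketch runs both programs block by block on the same input and reports the first block after which the variable states differ.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Block = Callable[[State], State]


def first_divergent_block(
    source_blocks: List[Block],
    target_blocks: List[Block],
    inputs: State,
) -> Optional[int]:
    """Return the index of the first aligned block whose output state differs.

    Assumes (hypothetically) that block i of the source corresponds to
    block i of the target. Returns None if no divergence is observed.
    """
    src_state = dict(inputs)
    tgt_state = dict(inputs)
    for index, (src_block, tgt_block) in enumerate(zip(source_blocks, target_blocks)):
        # Each block receives a copy so that it cannot alias the other side's state.
        src_state = src_block(dict(src_state))
        tgt_state = tgt_block(dict(tgt_state))
        if src_state != tgt_state:
            return index
    return None


# Stand-in for a source program that uses integer division.
def src_sum(state: State) -> State:
    state["total"] = sum(state["nums"])
    return state


def src_avg(state: State) -> State:
    state["avg"] = state["total"] // len(state["nums"])
    return state


def src_scale(state: State) -> State:
    state["result"] = state["avg"] * 2
    return state


# Stand-in for a faulty translation that uses true division instead.
def tgt_avg(state: State) -> State:
    state["avg"] = state["total"] / len(state["nums"])
    return state


SOURCE_BLOCKS = [src_sum, src_avg, src_scale]
TARGET_BLOCKS = [src_sum, tgt_avg, src_scale]

print(first_divergent_block(SOURCE_BLOCKS, TARGET_BLOCKS, {"nums": [1, 2, 4]}))  # 1
print(first_divergent_block(SOURCE_BLOCKS, TARGET_BLOCKS, {"nums": [2, 4]}))     # None
```

The second call shows a limitation of any execution-based comparison: an input for which both divisions yield equal values does not expose the faulty block, so the choice of test inputs matters.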