Although large language models (LLMs) have greatly improved the functional correctness of automated code translation, the runtime efficiency of the translated programs has received comparatively little attention. As Moore's law wanes, runtime efficiency has become an increasingly important aspect of program quality, alongside functional correctness. Our preliminary study shows that LLM-translated programs often run slower than their human-written counterparts, and that prompt engineering alone cannot remedy this. We therefore propose SwiftTrans, a code translation framework with two key stages: (1) Multi-Perspective Exploration, in which MpTranslator uses parallel in-context learning (ICL) to generate diverse translation candidates; and (2) Difference-Aware Selection, in which DiffSelector picks the best candidate by explicitly comparing the differences between translations. We further introduce Hierarchical Guidance for MpTranslator and Ordinal Guidance for DiffSelector, which help LLMs adapt to these two core components. To evaluate the runtime efficiency of translated programs, we extend two existing benchmarks, CodeNet and F2SBench, and introduce a new benchmark, SwiftBench. Experiments on all three benchmarks show that SwiftTrans consistently improves both correctness and runtime efficiency.
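The abstract does not describe how MpTranslator and DiffSelector are implemented. Purely as an illustration of the general explore-then-select shape of the two stages, here is a minimal sketch. It assumes two hypothetical callables: `translate(source, example)`, standing in for one LLM call conditioned on a single in-context example, and `prefer(a, b)`, standing in for a pairwise judgment of whether candidate `a` beats candidate `b`. The linear knockout over pairwise comparisons is a swapped-in simplification. Hierarchical Guidance and Ordinal Guidance are not modeled. This is not the authors' method.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def explore(source: str, examples: Sequence[str],
            translate: Callable[[str, str], str]) -> list[str]:
    """Stage 1 sketch: one translation per in-context example, run in parallel.

    `translate` is a hypothetical stand-in for an LLM call.
    """
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda ex: translate(source, ex), examples))
    # Drop exact duplicate candidates while preserving their original order.
    return list(dict.fromkeys(candidates))


def select(candidates: Sequence[str],
           prefer: Callable[[str, str], bool]) -> str:
    """Stage 2 sketch: linear knockout driven by pairwise comparisons.

    `prefer(a, b)` is a hypothetical judge returning True if `a` beats `b`.
    """
    if not candidates:
        raise ValueError("no candidates to select from")
    best = candidates[0]
    for challenger in candidates[1:]:
        if prefer(challenger, best):
            best = challenger
    return best


if __name__ == "__main__":
    # Toy stand-ins for the LLM calls (hypothetical, for illustration only):
    # each "example" yields a candidate with a made-up relative cost.
    fake_costs = {"loop": 3, "builtin": 1, "recursion": 5}

    def toy_translate(src: str, example: str) -> str:
        return f"{src}-via-{example}"

    def toy_prefer(a: str, b: str) -> bool:
        return fake_costs[a.split("-via-")[1]] < fake_costs[b.split("-via-")[1]]

    cands = explore("sum_list", list(fake_costs), toy_translate)
    print(select(cands, toy_prefer))  # sum_list-via-builtin
```

In a real system, `prefer` would need to be robust to noisy judgments. A single knockout pass trusts each comparison outright, which is one reason a framework might add extra guidance to the selector.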