Transpilation, or code translation, aims to convert source code from one programming language (PL) to another. It benefits many downstream applications, from modernizing large legacy codebases to augmenting training data for low-resource PLs. Recent large language model (LLM)-based approaches have demonstrated immense potential for code translation. Among these approaches, training-based methods are particularly important because, without targeted training, LLMs adapt poorly to domain-specific settings where relevant knowledge is scarce; this limitation is evident in transpilation tasks involving low-resource PLs. However, existing training-based approaches rely on a pairwise transpilation paradigm, making it impractical to support a diverse range of PLs. This limitation is particularly acute for low-resource PLs, given the scarcity of parallel training data. Furthermore, these methods suffer from suboptimal reinforcement learning (RL) reward formulations. To address these limitations, we propose CodePivot, a training framework that leverages Python as an intermediate representation (IR), augmented by a novel RL reward mechanism, the Aggressive-Partial-Functional reward, to bootstrap the model's multilingual transpilation ability without requiring parallel corpora. Experiments involving 10 PLs show that the resulting 7B model, trained only on Python-to-Others tasks, consistently improves performance on both general and low-resource transpilation tasks. It outperforms substantially larger mainstream models with hundreds of billions of parameters, surpassing Deepseek-R1 on Python-to-Others tasks and Qwen3-235B-A22B-Instruct-2507 on Others-to-All tasks. In addition, it outperforms its counterpart trained directly on Any-to-Any tasks on general transpilation tasks. The code and data are available at https://github.com/lishangyu-hkust/CodePivot.
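The core idea of routing every translation through Python can be sketched in miniature. The structure below is an illustrative assumption, not CodePivot's actual implementation: the stub translators stand in for LLM calls, and the point is that only X-to-Python and Python-to-Y directions are needed, rather than a quadratic number of pairwise translators.

```python
# Hypothetical sketch of pivot-based transpilation: instead of maintaining
# O(n^2) pairwise X->Y translators, every translation routes through Python.
# The toy string-rewrite "translators" below stand in for LLM calls; the
# names and structure are illustrative, not CodePivot's API.

PIVOT = "python"

# Only two directions per language are required: to the pivot and from it.
TO_PIVOT = {
    "java": lambda src: src.replace("System.out.println", "print"),
}
FROM_PIVOT = {
    "ruby": lambda src: src.replace("print", "puts"),
}

def transpile(src: str, src_lang: str, dst_lang: str) -> str:
    """Translate src_lang -> Python (pivot IR) -> dst_lang."""
    pivot_code = src if src_lang == PIVOT else TO_PIVOT[src_lang](src)
    return pivot_code if dst_lang == PIVOT else FROM_PIVOT[dst_lang](pivot_code)

print(transpile('System.out.println("hi")', "java", "ruby"))  # puts("hi")
```

Adding an (n+1)-th language under this scheme means training two new directions rather than 2n, which is why the pivot paradigm scales to diverse and low-resource PLs.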
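The abstract names an Aggressive-Partial-Functional reward without defining it here. As a purely illustrative stand-in (not the paper's formulation), a partial functional reward can score a candidate translation by the fraction of unit tests it passes, giving the RL policy a denser learning signal than a binary pass/fail reward:

```python
# Illustrative sketch only: a partial functional reward in [0, 1] computed as
# the fraction of test cases a candidate translation satisfies. This is a
# generic stand-in, not CodePivot's Aggressive-Partial-Functional reward.

from typing import Callable, List, Tuple

def partial_functional_reward(candidate: Callable,
                              tests: List[Tuple[tuple, object]]) -> float:
    """Fraction of (args, expected) cases where candidate(*args) == expected."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case earns no credit but does not zero the reward
    return passed / len(tests)

# A buggy "translation" of abs() that mishandles negative inputs:
buggy_abs = lambda x: x
tests = [((3,), 3), ((-3,), 3), ((0,), 0)]
print(partial_functional_reward(buggy_abs, tests))  # 2 of 3 tests pass
```

Under a binary reward, this candidate would score 0 despite being mostly correct; partial credit rewards incremental progress during RL training.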