Modern language models demonstrate impressive coding capabilities in common programming languages (PLs), such as C++ and Python, but their performance in lower-resource PLs is often limited by the availability of training data. In principle, however, most programming skills are universal across PLs, so capabilities acquired in one PL should transfer to others. In this work, we propose the task of zero-shot cross-programming-language transfer for code RL. We find that, for Llama-3.1, RL training for code generation in a source PL fails to improve, and sometimes even degrades, performance on other target PLs. To address this, we hypothesize that effective RL transfer requires a generalizable SFT initialization before RL. We thus propose **Parallel-SFT**, an SFT strategy that incorporates "parallel programs" (functionally equivalent code implemented in multiple PLs) into the data mixture. We demonstrate that this improves transferability: when we subsequently perform RL on our Parallel-SFT model, we observe better generalization to unseen PLs. Analysis of the model's internal representations reveals that Parallel-SFT leads to a more functionality-centric latent space, in which equivalent programs across PLs are more tightly clustered; we hypothesize that this contributes to the improved transferability.
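The claim that equivalent programs are "more tightly clustered" across PLs can be made concrete with a simple metric. The sketch below is a hypothetical illustration, not the analysis used in this work. It assumes each program is represented by a single embedding vector (for example, a pooled hidden state) and scores a latent space by how much more similar parallel programs (same functionality, different PLs) are than programs implementing different functionality. The names `cosine` and `clustering_gap` and the toy vectors are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clustering_gap(embeddings, task_ids):
    """Mean same-task similarity minus mean cross-task similarity.

    embeddings: one vector per program (hypothetically, a pooled hidden state).
    task_ids: the functionality each program implements; programs sharing an
    id are treated as parallel implementations in different PLs.
    A larger positive gap indicates a more functionality-centric space.
    """
    same, diff = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (same if task_ids[i] == task_ids[j] else diff).append(sim)
    if not same or not diff:
        raise ValueError("need at least one same-task and one cross-task pair")
    return sum(same) / len(same) - sum(diff) / len(diff)


# Toy data: tasks A and B, each implemented in Python and C++ (in that order).
ids = ["A", "A", "B", "B"]
# Functionality-centric: each task's two implementations point the same way.
func_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Language-centric: programs group by PL instead of by task.
lang_emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print("functionality-centric gap:", clustering_gap(func_emb, ids))
print("language-centric gap:", clustering_gap(lang_emb, ids))
```

In this toy setup the functionality-centric space yields a positive gap and the language-centric space a negative one. A real analysis would use many tasks and PLs and might prefer other measures (e.g., nearest-neighbor retrieval accuracy across PLs).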