Programming language (PL) models pre-trained on large-scale code corpora have shown considerable potential for automating software engineering, streamlining code generation tasks such as code completion, code translation, and program synthesis. However, current approaches rely mainly on supervised fine-tuning objectives borrowed from text generation and neglect sequence-level characteristics unique to code, including compilability and syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that combines pre-trained PL models with Proximal Policy Optimization (PPO), a widely used deep reinforcement learning technique. By using non-differentiable feedback from code execution and structure alignment, PPOCoder integrates external code-specific knowledge into model optimization. PPOCoder is task-agnostic and model-agnostic, so it can be applied across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of the proposed approach compared to state-of-the-art (SOTA) methods, with significant improvements in compilation success rates and functional correctness across different PLs.
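To make the idea of non-differentiable feedback concrete, the following is a minimal sketch, not PPOCoder's actual implementation. It assumes the generated candidates are Python source, uses Python's built-in `compile()` as a stand-in compiler check, and evaluates the standard PPO clipped surrogate for a single sample. The function names (`compile_reward`, `ppo_clipped_objective`), the reward values of +1.0 and -1.0, and the clipping threshold of 0.2 are illustrative assumptions; the sketch covers only a compile signal and omits structure alignment.

```python
import math


def compile_reward(source: str) -> float:
    """Hypothetical reward: +1.0 if the candidate compiles as Python, else -1.0.

    The signal comes from running a compiler, so it is non-differentiable
    and cannot be used directly as a supervised loss.
    """
    try:
        compile(source, "<candidate>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        return -1.0


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized).

    logp_new / logp_old are log-probabilities of the sampled output under
    the current and the sampling policy; clip_eps is an assumed default.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"
    reward = compile_reward(candidate)
    # Using the raw reward as the advantage is a simplification; a real
    # setup would subtract a baseline such as a learned value estimate.
    print(ppo_clipped_objective(logp_new=-1.0, logp_old=-1.2, advantage=reward))
```

In a full training loop the objective would be averaged over sampled programs and back-propagated through the log-probabilities of the policy model; only the reward itself stays outside the gradient path.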