Large Language Models for Code (Code LLMs) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on code generation. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, including supervised fine-tuning, instruction tuning, and reinforcement learning. In this paper, we propose a novel framework, RRTF (Rank Responses to align Test&Teacher Feedback), which effectively and efficiently boosts pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on the CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
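For readers unfamiliar with the pass@1 metric quoted above, the following is a minimal sketch of the unbiased pass@k estimator commonly used with HumanEval, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The abstract does not state how its pass@1 figure was computed (for example, how many samples per problem were drawn), so applying this estimator here is an assumption. The function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts in the usage note are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: sample budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs, one pair per problem."""
    if not results:
        raise ValueError("results must contain at least one problem")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, the fraction of samples that pass. For instance, `benchmark_pass_at_k([(10, 3), (10, 10), (10, 0)], k=1)` averages 0.3, 1.0, and 0.0 over three made-up problems.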