Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models has advanced the state of the art in program synthesis, but limited training resources and data impede open access to such models. To democratize access, we train and release CODEGEN, a family of large language models of up to 16.1B parameters trained on natural language and programming language data, and we open-source the training library JAXFORMER. We show the utility of the trained model by demonstrating that it is competitive with the previous state of the art on zero-shot Python code generation on HumanEval. We further investigate the multi-step paradigm for program synthesis, in which a single program is factorized into multiple prompts specifying subproblems. To this end, we construct an open benchmark, the Multi-Turn Programming Benchmark (MTPB), consisting of 115 diverse problem sets that are factorized into multi-turn prompts. Our analysis on MTPB shows that the same intent provided to CODEGEN in a multi-turn fashion significantly improves program synthesis over that provided as a single turn. We make the training library JAXFORMER and the model checkpoints available as an open-source contribution: https://github.com/salesforce/CodeGen.
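To make the multi-turn paradigm concrete, the following minimal sketch shows one way a program could be built up from a sequence of subproblem prompts, with each turn conditioned on all earlier prompts and the code generated for them. It is an illustration under stated assumptions, not the CODEGEN or MTPB implementation: the function `synthesize_multi_turn`, the comment-style prompt format, and the stub `toy_generate` (which stands in for a real model call) are all hypothetical.

```python
from typing import Callable, List


def synthesize_multi_turn(
    turn_prompts: List[str],
    generate: Callable[[str], str],
) -> str:
    """Build one program by answering each subproblem prompt in order.

    `generate` is a hypothetical stand-in for a model call: it receives the
    context so far and returns the code completion for the latest prompt.
    """
    context = ""
    for prompt in turn_prompts:
        # Each turn sees every earlier prompt and the code generated for it.
        context += f"# {prompt}\n"
        completion = generate(context)
        context += completion.rstrip("\n") + "\n"
    return context


def toy_generate(context: str) -> str:
    """Stub model (assumption): returns canned code for the last prompt."""
    last_prompt = [ln for ln in context.splitlines() if ln.startswith("# ")][-1]
    canned = {
        "# Assign the list [3, 1, 2] to xs": "xs = [3, 1, 2]",
        "# Sort xs in ascending order": "xs = sorted(xs)",
        "# Store the largest element in result": "result = xs[-1]",
    }
    return canned[last_prompt]


# Hypothetical problem, factorized into three turns.
prompts = [
    "Assign the list [3, 1, 2] to xs",
    "Sort xs in ascending order",
    "Store the largest element in result",
]
program = synthesize_multi_turn(prompts, toy_generate)

# Run the assembled program in a fresh namespace and inspect the outcome.
namespace: dict = {}
exec(program, namespace)
print(program)
print("result =", namespace["result"])
```

The single-turn counterpart would merge the three prompts into one specification and call the model once. The sketch only fixes the interface between turns, so a real model can be substituted for `toy_generate` without changing the loop.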