Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec, and addressing other code-specific issues and requirements. Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks. In this work, we propose a new approach to code generation by LLMs, which we call AlphaCodium - a test-based, multi-stage, code-oriented iterative flow, that improves the performances of LLMs on code problems. We tested AlphaCodium on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks. Full implementation is available at: https://github.com/Codium-ai/AlphaCodium

翻译：代码生成问题与常见的自然语言问题不同——它需要匹配目标语言的精确语法、识别正向路径和边界情况、关注问题规范中的大量细节，并解决其他代码特有的问题和要求。因此，许多在自然语言生成中成功的优化技巧和策略可能对代码任务并不有效。在这项工作中，我们提出了一种新的基于大语言模型的代码生成方法，称为AlphaCodium——一种基于测试、多阶段、面向代码的迭代流程，可提升LLM在代码问题上的性能。我们在名为CodeContests的具有挑战性的代码生成数据集上测试了AlphaCodium，该数据集包含来自Codeforces等平台的竞赛编程问题。所提出的流程始终且显著地提升了结果。例如，在验证集上，GPT-4的准确率（pass@5）从单一精心设计的直接提示的19%提升至AlphaCodium流程的44%。我们相信，本工作中获得的许多原则和最佳实践广泛适用于一般的代码生成任务。完整实现见：https://github.com/Codium-ai/AlphaCodium

相关内容

Engineering

关注 7

《工程》是中国工程院（CAE）于2015年推出的国际开放存取期刊。其目的是提供一个高水平的平台，传播和分享工程研发的前沿进展、当前主要研究成果和关键成果；报告工程科学的进展，讨论工程发展的热点、兴趣领域、挑战和前景，在工程中考虑人与环境的福祉和伦理道德，鼓励具有深远经济和社会意义的工程突破和创新，使之达到国际先进水平，成为新的生产力，从而改变世界，造福人类，创造新的未来。期刊链接：https://www.sciencedirect.com/journal/engineering

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日