Transformer-based large language models (LLMs) have significantly advanced natural language processing (NLP). These models have reshaped NLP tasks, particularly code generation, where they help developers write software more efficiently. Despite this progress, balancing code snippet generation with effective test case generation and execution remains a challenge. To address this, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a multi-agent framework with three specialized agents: the programmer agent, the test designer agent, and the test executor agent. During coding, the programmer agent generates code and refines it based on the test executor agent's feedback. The test designer agent generates test cases for the generated code, and the test executor agent runs the code against those test cases and returns feedback to the programmer agent. This collaboration makes code generation more robust and overcomes limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches show that AgentCoder outperforms existing code generation models and prompt engineering techniques across various benchmarks. For example, with GPT-3.5, AgentCoder achieves 77.4% and 89.1% pass@1 on HumanEval-ET and MBPP-ET, respectively, while SOTA baselines obtain only 69.5% and 63.0%.
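The interaction among the three agents described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `agent_loop`, `execute_tests`, `TestCase`, and `max_rounds`, as well as the stub agents, are hypothetical and introduced only for illustration. In a real system the programmer and test designer would be LLM calls. Here they are plain Python callables, and the test designer is assumed to work from the task description alone.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class TestCase:
    args: tuple      # positional arguments passed to the function under test
    expected: Any    # expected return value


def execute_tests(code: str, entry_point: str, tests: List[TestCase]) -> Optional[str]:
    """Test executor role: run the code on the tests.

    Returns None if every test passes, otherwise a feedback message
    describing the first failure.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # illustration only; real systems should sandbox this
        func = namespace[entry_point]
    except Exception as exc:
        return f"code failed to load: {exc!r}"
    for test in tests:
        try:
            got = func(*test.args)
        except Exception as exc:
            return f"{entry_point}{test.args} raised {exc!r}"
        if got != test.expected:
            return f"{entry_point}{test.args} returned {got!r}, expected {test.expected!r}"
    return None


def agent_loop(
    task: str,
    entry_point: str,
    programmer: Callable[[str, Optional[str]], str],
    test_designer: Callable[[str], List[TestCase]],
    max_rounds: int = 3,  # hypothetical iteration budget
) -> Tuple[str, bool, int]:
    """Generate code, test it, and feed failures back to the programmer."""
    tests = test_designer(task)
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = programmer(task, feedback)   # generate, or refine using feedback
        feedback = execute_tests(code, entry_point, tests)
        if feedback is None:                # all tests passed
            return code, True, round_no
    return code, False, max_rounds


# Stub agents standing in for LLM calls (assumptions for the demo).
def stub_programmer(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b\n"      # refined after feedback


def stub_test_designer(task: str) -> List[TestCase]:
    return [TestCase((1, 2), 3), TestCase((0, 0), 0)]


if __name__ == "__main__":
    final_code, passed, rounds = agent_loop(
        "add two numbers", "add", stub_programmer, stub_test_designer
    )
    print(passed, rounds)
    print(final_code)
```

In this sketch the tests are produced by a component separate from the one that writes the code, so a bug in the first draft surfaces as executor feedback and triggers one refinement round.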