CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents

Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) are reshaping the field of Software Engineering (SE). Existing LLM-based multi-agent systems have successfully resolved simple dialogue tasks. However, the potential of LLMs for more complex tasks, such as automated code generation for large and complex projects, has been explored in only a few existing works. This paper introduces CodePori, a novel model designed to automate code generation for extensive and complex software projects based on natural language prompts. We employ multiple LLM-based AI agents to handle creative and challenging tasks in autonomous software development. Each agent handles a specific task: system design, code development, code review, code verification, or test engineering. We show that CodePori can generate running code for large-scale projects, completing the entire software development process in minutes rather than hours and at a cost of a few dollars. It identifies and mitigates potential security vulnerabilities and corrects errors while maintaining a solid level of code performance. We also evaluated CodePori against existing solutions on the HumanEval and Mostly Basic Python Problems (MBPP) benchmarks. The results indicate that CodePori outperforms existing solutions in code accuracy, efficiency, and overall performance. For example, CodePori reaches a pass@1 score of 87.5% on HumanEval and 86.5% on MBPP, a clear improvement over existing models. We also assessed CodePori through practitioner evaluations, in which 91% of participants expressed satisfaction with the model's performance.
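The abstract reports pass@1 scores but does not say how they are computed. As a minimal sketch, assuming the commonly used unbiased pass@k estimator introduced with HumanEval, where each problem has n generated samples of which c pass all unit tests, the metric could be computed as follows. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from CodePori.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper:
    # four problems, one sample each, three of which pass.
    toy_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(mean_pass_at_k(toy_results, k=1))
```

With a single sample per problem, pass@1 reduces to the fraction of problems whose one generated solution passes its tests, which is the natural reading of benchmark scores such as those quoted above.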

