Context: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering (SE). Existing LLM-based multi-agent models have successfully addressed basic dialogue tasks. However, the potential of LLMs for more challenging tasks, such as automated code generation for large and complex projects, has been investigated in only a few works. Objective: This paper investigates the potential of LLM-based agents in the software industry, particularly for enhancing productivity and reducing time-to-market for complex software solutions. Our primary objective is to gain insight into how these agents can fundamentally transform the development of large-scale software. Methods: We introduce CodePori, a novel system designed to automate code generation for large and complex software projects based on functional and non-functional requirements defined by stakeholders. To assess the proposed system's performance, we used the HumanEval benchmark and also tested CodePori manually: we provided 20 different project descriptions as input and evaluated the accuracy of the generated code by executing it by hand. Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process. The HumanEval benchmark results indicate that CodePori improves code accuracy by 89%. A manual assessment conducted by the first author shows that the CodePori system achieved an accuracy rate of 85%. Conclusion: Based on these results, we conclude that the proposed system demonstrates the transformative potential of LLM-based agents in SE, highlighting their practical applications and opening new opportunities for broader adoption in both industry and academia. Our project is publicly available at https://github.com/GPT-Laboratory/CodePori.