Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation that employs LLMs, exemplified by ChatGPT. Specifically, role instructions are used to 1) have multiple LLM agents act as distinct 'experts', each responsible for a specific subtask within a complex task; and 2) specify how the agents collaborate and interact, so that the different roles form a virtual team whose members facilitate each other's work. The virtual team ultimately addresses code-generation tasks collaboratively, without human intervention. To organize and manage this virtual team effectively, we incorporate software-development methodology into the framework. Accordingly, we assemble an elementary team of three LLM roles (i.e., analyst, coder, and tester) responsible for the analysis, coding, and testing stages of software development. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation yields a relative Pass@1 improvement of 29.9% to 47.1% over the base LLM agent. Moreover, we show that self-collaboration could potentially enable LLMs to efficiently handle complex repository-level tasks that a single LLM agent cannot readily solve.
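To make the analyst, coder, and tester division of labour concrete, here is a minimal sketch of such a virtual team. It assumes that every role is the same base LLM conditioned on a different role instruction and that the tester's report is fed back to the coder for revision. The names `Role`, `VirtualTeam`, and `fake_llm`, the prompt wording, the `PASS` reply convention, and the `max_rounds` cap are all hypothetical illustrations, not the framework's actual prompts or implementation; `fake_llm` is a deterministic stub standing in for a real ChatGPT call so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call: prompt in, completion out.
LLM = Callable[[str], str]


@dataclass
class Role:
    """An LLM agent specialised into an 'expert' by a role instruction."""
    name: str
    instruction: str
    llm: LLM

    def act(self, message: str) -> str:
        # Prepending the role instruction lets one base LLM play different experts.
        return self.llm(f"{self.instruction}\n\n{message}")


@dataclass
class VirtualTeam:
    analyst: Role
    coder: Role
    tester: Role
    max_rounds: int = 3  # hypothetical cap on coder/tester iterations
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def solve(self, requirement: str) -> str:
        # Analysis stage: decompose the requirement into a plan.
        plan = self.analyst.act(f"Requirement:\n{requirement}")
        self.transcript.append(("analyst", plan))
        # Coding stage: implement the plan.
        code = self.coder.act(f"Requirement:\n{requirement}\nPlan:\n{plan}")
        self.transcript.append(("coder", code))
        for _ in range(self.max_rounds):
            # Testing stage: review the code and report defects.
            report = self.tester.act(f"Requirement:\n{requirement}\nCode:\n{code}")
            self.transcript.append(("tester", report))
            if report.strip() == "PASS":  # assumed convention for "no defects found"
                break
            # Hand the tester's report back to the coder for revision.
            code = self.coder.act(f"Revise the code.\nCode:\n{code}\nReport:\n{report}")
            self.transcript.append(("coder", code))
        return code


def fake_llm(prompt: str) -> str:
    # Deterministic stub; a real system would call an LLM API here.
    if prompt.startswith("You are the analyst"):
        return "1. Take two numbers. 2. Return their sum."
    if prompt.startswith("You are the tester"):
        return "PASS" if "return a + b" in prompt else "Missing return statement."
    if "Report:" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    pass"  # deliberately buggy first draft


team = VirtualTeam(
    analyst=Role("analyst", "You are the analyst. Decompose the requirement into steps.", fake_llm),
    coder=Role("coder", "You are the coder. Write Python code.", fake_llm),
    tester=Role("tester", "You are the tester. Reply PASS or describe defects.", fake_llm),
)
final_code = team.solve("Add two numbers.")
print(final_code)
```

With the stub, the coder's first draft fails review, the tester's report triggers one revision, and the second review passes, so the transcript reads analyst, coder, tester, coder, tester. Keeping each role as a thin wrapper around a shared `llm` callable mirrors the idea that the "experts" differ only in their role instructions.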