Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that effectively controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation with LLMs, exemplified by ChatGPT. Specifically, through role instructions, 1) multiple LLMs act as distinct "experts", each responsible for a specific subtask within a complex task; and 2) the way they collaborate and interact is specified, so that the different roles form a virtual team that facilitates each member's work and ultimately addresses code-generation tasks collaboratively without human intervention. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework. Accordingly, we assemble an elementary team of three ChatGPT roles (i.e., analyst, coder, and tester), responsible for the analysis, coding, and testing stages of software development, respectively. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation yields a relative Pass@1 improvement of 29.9% to 47.1% over direct code generation, reaching state-of-the-art performance and even surpassing GPT-4. Moreover, a case study shows that self-collaboration could potentially enable LLMs to efficiently handle complex real-world tasks that are not readily solved by direct code generation.
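To make the division of labor concrete, the analyst, coder and tester stages can be sketched as a simple loop in which one chat model is invoked under different role instructions. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable signature, the role prompt texts, the "PASS" convention for the tester, and the `max_rounds` budget are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any chat model wrapped as (role_instruction, message) -> reply.
LLM = Callable[[str, str], str]

# Illustrative role instructions (assumptions, not the paper's actual prompts).
ANALYST = "You are an analyst. Decompose the requirement into a high-level plan."
CODER = "You are a coder. Write Python code that follows the plan, or revise it using the test report."
TESTER = "You are a tester. Review the code and reply 'PASS' or describe the problems found."


@dataclass
class TeamResult:
    code: str
    transcript: List[str] = field(default_factory=list)


def self_collaborate(requirement: str, llm: LLM, max_rounds: int = 3) -> TeamResult:
    """Run an analyst -> coder -> tester loop with one model playing every role."""
    transcript: List[str] = []

    # Analysis stage: the analyst turns the requirement into a plan.
    plan = llm(ANALYST, requirement)
    transcript.append(f"analyst: {plan}")

    # Coding stage: the coder drafts code from the requirement and the plan.
    code = llm(CODER, f"Requirement:\n{requirement}\nPlan:\n{plan}")
    transcript.append(f"coder: {code}")

    # Testing stage: the tester reviews the code; the coder revises it
    # until the tester replies PASS or the (hypothetical) round budget runs out.
    for _ in range(max_rounds):
        report = llm(TESTER, f"Requirement:\n{requirement}\nCode:\n{code}")
        transcript.append(f"tester: {report}")
        if report.strip().upper().startswith("PASS"):
            break
        code = llm(CODER, f"Code:\n{code}\nTest report:\n{report}")
        transcript.append(f"coder: {code}")

    return TeamResult(code=code, transcript=transcript)


if __name__ == "__main__":
    # Stub model standing in for ChatGPT, so the sketch runs offline.
    def fake_llm(role: str, message: str) -> str:
        if role == ANALYST:
            return "1. Define add(a, b). 2. Return a + b."
        if role == CODER:
            return "def add(a, b):\n    return a + b"
        return "PASS"

    result = self_collaborate("Write add(a, b).", fake_llm)
    print(result.code)
    print("\n".join(result.transcript))
```

Because each role is just a different instruction sent to the same callable, swapping in a real chat-model client only requires wrapping it to match the hypothetical `LLM` signature; extending the team (for example, adding further roles) amounts to adding another stage to the loop.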