The strong performance of large language models (LLMs) on code-related tasks points to the potential of fully automated software development. In light of this, we introduce a new software engineering task, Natural Language to code Repository (NL2Repo), which aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet effective framework, CodeS, which decomposes NL2Repo into multiple sub-tasks through a multi-layer sketch. Specifically, CodeS comprises three modules: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher first generates a repository's directory structure from the given requirements; FileSketcher then generates a file sketch for each file in the generated structure; SketchFiller finally fills in the details of each function in the generated file sketch. To rigorously assess CodeS on the NL2Repo task, we evaluate it through both automated benchmarking and manual feedback analysis. For benchmark-based evaluation, we craft a repository-oriented benchmark, SketchEval, and design an evaluation metric, SketchBLEU. For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in empirical studies. Extensive experiments demonstrate the effectiveness and practicality of CodeS on the NL2Repo task.
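The three-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt tags (`REPO_SKETCH`, `FILE_SKETCH`, `FILL`), and the `generate` callable standing in for an LLM are all hypothetical, and the filler stage is simplified to one call per file rather than one per function.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Generate = Callable[[str], str]


def repo_sketcher(requirements: str, generate: Generate) -> List[str]:
    """Stage 1: propose the repository's file paths, one per line."""
    raw = generate(f"REPO_SKETCH\n{requirements}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def file_sketcher(requirements: str, paths: List[str], path: str,
                  generate: Generate) -> str:
    """Stage 2: produce a sketch (signatures, placeholder bodies) for one file."""
    tree = "\n".join(paths)
    return generate(f"FILE_SKETCH\n{requirements}\n{tree}\n{path}")


def sketch_filler(requirements: str, path: str, sketch: str,
                  generate: Generate) -> str:
    """Stage 3: fill in the sketch (simplified here to one call per file)."""
    return generate(f"FILL\n{requirements}\n{path}\n{sketch}")


def nl2repo(requirements: str, generate: Generate) -> Dict[str, str]:
    """Run the three stages in order and return a path -> source mapping."""
    paths = repo_sketcher(requirements, generate)
    repo: Dict[str, str] = {}
    for path in paths:
        sketch = file_sketcher(requirements, paths, path, generate)
        repo[path] = sketch_filler(requirements, path, sketch, generate)
    return repo


def fake_generate(prompt: str) -> str:
    """Canned responses keyed on the (hypothetical) stage tag, for a dry run."""
    stage = prompt.split("\n", 1)[0]
    if stage == "REPO_SKETCH":
        return "README.md\nsrc/main.py\n"
    if stage == "FILE_SKETCH":
        return "def main():\n    pass  # TODO\n"
    return "def main():\n    print('hello')\n"


if __name__ == "__main__":
    for path, source in nl2repo("a hello-world CLI", fake_generate).items():
        print(path)
        print(source)
```

Passing the model as a plain callable keeps the pipeline testable without any model access; a real system would replace `fake_generate` with an actual model call and would likely give each stage richer context than shown here.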