Assisting non-expert users in developing complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to generate only frontend web pages, using flashy visual effects to mask their lack of genuine full-stack data processing and storage. Constructing production-level full-stack web applications is far more challenging than generating frontend pages alone. It demands careful control of data flow, a thorough understanding of constantly evolving packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities; (2) FullStack-Learn, a data-scaling and self-improvement method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev; and (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend, and database functionality of generated websites. FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases, respectively. Additionally, through self-improvement, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the same three sets of test cases, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.