ARC: Compiling Hundreds of Requirement Scenarios into a Runnable Web System

Large language models (LLMs) have improved programming efficiency, but their performance degrades significantly as requirements scale: when faced with multi-modal documents containing hundreds of scenarios, LLMs often produce incorrect implementations or omit constraints. We propose Agentic Requirement Compilation (ARC), a technique that moves beyond simple code generation to requirement compilation, producing runnable web systems directly from multi-modal domain-specific language (DSL) documents. Beyond source code, ARC generates modular designs for the UI, API, and database layers, enriched test suites (unit, modular, and integration), and detailed traceability to support software maintenance. Our approach employs a bidirectional, test-driven agentic loop: a top-down architecture phase decomposes requirements into verifiable interfaces, and a bottom-up implementation phase then has agents generate code that satisfies the tests for those interfaces. ARC maintains strict traceability across requirements, design, and code to facilitate intelligent asset reuse. We evaluated ARC by generating six runnable web systems from documents spanning 50 to 200 multi-modal scenarios. Compared to state-of-the-art baselines, ARC-generated systems pass 50.6% more GUI tests on average. A user study with 21 participants showed that novice users can successfully write DSL documents for complex systems, such as a 10K-line ticket-booking system, in an average of 5.6 hours. These results demonstrate that ARC effectively transforms non-trivial requirement specifications into maintainable, runnable software.
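The bidirectional loop described above can be illustrated with a toy sketch. This is not ARC's implementation, which the abstract does not detail. Every name here (`Scenario`, `InterfaceSpec`, `TraceRecord`, `decompose`, `stub_agent`, `implement`, `compile_requirements`) is hypothetical, the top-down step is hand-written rather than produced by an agent, and the "agent" is a stub that returns canned candidates. The sketch only shows the shape of the idea: requirements become tested interfaces, candidates are retried until the test passes, and each result is recorded against the scenarios it came from.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Impl = Callable[[int], int]  # toy interface: seats available -> seats left


@dataclass
class Scenario:
    sid: str
    text: str


@dataclass
class InterfaceSpec:
    name: str
    scenario_ids: List[str]
    test: Callable[[Impl], bool]  # executable check derived from the scenarios


@dataclass
class TraceRecord:
    scenario_ids: List[str]
    interface: str
    implemented: bool
    attempts: int


def decompose(scenarios: List[Scenario]) -> List[InterfaceSpec]:
    """Top-down phase (hand-written stand-in): one tested interface for all scenarios."""
    def test_book_seat(impl: Impl) -> bool:
        return impl(5) == 4 and impl(0) == 0
    return [InterfaceSpec("book_seat", [s.sid for s in scenarios], test_book_seat)]


def stub_agent(spec: InterfaceSpec, attempt: int) -> Impl:
    """Stand-in for a code-generating agent; its first draft misses the sold-out case."""
    if attempt == 1:
        return lambda available: available - 1
    return lambda available: max(available - 1, 0)


def implement(spec: InterfaceSpec,
              propose: Callable[[InterfaceSpec, int], Impl],
              max_attempts: int = 3) -> Tuple[Optional[Impl], int]:
    """Bottom-up phase: request candidates until the interface test passes."""
    for attempt in range(1, max_attempts + 1):
        candidate = propose(spec, attempt)
        if spec.test(candidate):
            return candidate, attempt
    return None, max_attempts


def compile_requirements(scenarios: List[Scenario],
                         propose: Callable[[InterfaceSpec, int], Impl]) -> List[TraceRecord]:
    """Run both phases and keep a scenario-to-interface trace of the outcome."""
    trace = []
    for spec in decompose(scenarios):
        impl, attempts = implement(spec, propose)
        trace.append(TraceRecord(spec.scenario_ids, spec.name, impl is not None, attempts))
    return trace


scenarios = [
    Scenario("S1", "Booking a seat reduces availability by one."),
    Scenario("S2", "A sold-out event never drops below zero seats."),
]
trace = compile_requirements(scenarios, stub_agent)
print(trace)
```

In this toy run the first candidate fails the sold-out check and the second passes, so the trace records two attempts against scenarios S1 and S2. A real system would presumably replace both `decompose` and `stub_agent` with model-driven components and carry a much richer trace.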
