We present app.build (https://github.com/neondatabase/appdotbuild-agent), an open-source framework that improves LLM-based application generation through systematic validation and structured environments. Our approach combines multi-layered validation pipelines, stack-specific orchestration, and a model-agnostic architecture, implemented across three reference stacks. In an evaluation on 30 generation tasks, comprehensive validation achieves a 73.3% viability rate, with 30% reaching perfect quality scores, while open-weights models reach 80.8% of closed-model performance when given structured environments. The framework has been adopted by the community, with over 3,000 applications generated to date. This work shows that scaling reliable AI agents requires scaling environments, not just models, and it provides empirical insights and complete reference implementations for production-oriented agent systems.