Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

Software architecture design is a critical yet inherently complex and knowledge-intensive phase that requires balancing competing quality attributes and adapting to evolving requirements. Traditionally, this process has been time-consuming, labor-intensive, and heavily reliant on architects, which often limits the exploration of alternative architectural decompositions and styles, especially under the pressures of agile development. While LLM-based agents have shown promising performance across various software engineering tasks, their application to architecture design remains scarce and has not been explored systematically. To address these challenges, we propose MAAD (Multi-Agent Architecture Design), a knowledge-driven framework that orchestrates four specialized agents (i.e., Analyst, Modeler, Designer, and Evaluator) to autonomously and collaboratively transform requirements specifications into comprehensive, multi-view architectural blueprints with quality attribute assessments. MAAD incorporates retrieval-augmented generation (RAG) to inject recognized architectural standards and patterns into the workflow, and it leverages a hierarchical memory mechanism that captures design history for iterative refinement. We evaluated MAAD through comparative experiments against MetaGPT, using quantitative architecture-level metrics across 10 case studies and qualitative feedback from industry architects on 10 real-world specifications. Results show that MAAD generates more complete, modular, and traceable architectures than the baseline, and that its dedicated Evaluator agent autonomously produces structured quality evaluation reports that significantly reduce manual validation effort. Furthermore, we found that the quality of the generated architecture depends heavily on the reasoning capacity of the underlying LLM, with GPT-5.2 and Qwen3.5 outperforming other models across most evaluation settings.
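As a rough illustration of the workflow summarized above, the following Python sketch chains four role-specific agents, gives each one retrieved knowledge snippets, and records their outputs in a two-level memory across refinement iterations. It is not MAAD's implementation: the names `retrieve`, `HierarchicalMemory`, `make_agent`, `run_pipeline`, and `echo_llm` are hypothetical, the keyword-overlap retriever merely stands in for a real RAG component, the two memory levels are an assumption about what "hierarchical" might mean, and the LLM is replaced by a deterministic placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical knowledge base standing in for a corpus of architectural patterns.
KNOWLEDGE = [
    "layered style separates presentation, business logic, and data access",
    "publish-subscribe pattern decouples event producers from consumers",
    "client-server style centralizes shared resources behind a service",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank entries by the number of words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(KNOWLEDGE, key=lambda doc: -len(words & set(doc.lower().split())))
    return ranked[:k]


@dataclass
class HierarchicalMemory:
    """Assumed two levels: working notes for one iteration, plus a design history."""
    working: dict[str, str] = field(default_factory=dict)
    history: list[dict[str, str]] = field(default_factory=list)

    def commit(self) -> None:
        """Archive the current iteration and start a fresh working level."""
        self.history.append(dict(self.working))
        self.working = {}


Agent = Callable[[str, HierarchicalMemory], str]


def make_agent(role: str, llm: Callable[[str], str]) -> Agent:
    """Build an agent that prompts the model with its role, task, and context."""
    def run(task: str, memory: HierarchicalMemory) -> str:
        knowledge = retrieve(task)
        prompt = (
            f"role={role}; task={task}; knowledge={knowledge}; "
            f"past_iterations={len(memory.history)}"
        )
        output = llm(prompt)
        memory.working[role] = output  # record this role's artifact
        return output
    return run


def run_pipeline(requirements: str, llm: Callable[[str], str],
                 iterations: int = 2) -> HierarchicalMemory:
    """Pass the artifact through the four roles in order, once per iteration."""
    memory = HierarchicalMemory()
    roles = ["Analyst", "Modeler", "Designer", "Evaluator"]
    agents = [make_agent(role, llm) for role in roles]
    for _ in range(iterations):
        artifact = requirements
        for agent in agents:
            artifact = agent(artifact, memory)
        memory.commit()
    return memory


def echo_llm(prompt: str) -> str:
    """Placeholder model: returns a short tag naming the role in the prompt."""
    role = prompt.split(";")[0].removeprefix("role=")
    return f"{role} output"


if __name__ == "__main__":
    mem = run_pipeline("users publish events to subscribers", echo_llm)
    for i, snapshot in enumerate(mem.history, start=1):
        print(i, snapshot)
```

In this sketch each agent receives only the previous agent's output as its task; a real orchestration would likely pass richer structured artifacts and feed the Evaluator's report back into the next iteration.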
