Software architecture documentation is essential for system comprehension, yet it is often unavailable or incomplete. While recent techniques based on Large Language Models (LLMs) can generate documentation from code, they typically address local artifacts rather than producing coherent, system-level architectural descriptions. This paper presents a structured process for automatically generating system-level architectural documentation directly from GitHub repositories using LLMs. The process, called CIAO (Code In Architecture Out), defines an LLM-based workflow that takes a repository as input and produces system-level architectural documentation following a template derived from ISO/IEC/IEEE 42010, SEI Views & Beyond, and the C4 model. The resulting documentation can be directly added to the target repository. We evaluated the process through a study with 22 developers, each reviewing the documentation generated for a repository they had contributed to. The evaluation shows that developers generally perceive the produced documentation as valuable, comprehensible, and broadly accurate with respect to the source code, while also highlighting limitations in diagram quality, high-level context modeling, and deployment views. We also assessed the operational cost of the process, finding that generating a complete architectural document requires only a few minutes and is inexpensive to run. Overall, the results indicate that a structured, standards-oriented approach can effectively guide LLMs in producing system-level architectural documentation that is both usable and cost-effective.
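The repository-in, document-out workflow described above can be illustrated with a minimal sketch. This is not the CIAO implementation: the section names, the file filter, the prompt wording, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions introduced here for illustration only.

```python
from pathlib import Path
from typing import Callable, Dict, Sequence

# Hypothetical section names, loosely inspired by C4-style views.
# They are NOT the template used in the paper.
TEMPLATE_SECTIONS = ("System Context", "Containers", "Components", "Deployment")


def collect_sources(repo_root: str,
                    suffixes: Sequence[str] = (".py", ".md"),
                    max_chars: int = 2000) -> Dict[str, str]:
    """Return {relative_path: truncated_text} for files with a matching suffix."""
    root = Path(repo_root)
    sources: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            # Truncate each file so the prompt stays bounded (assumed limit).
            sources[path.relative_to(root).as_posix()] = text[:max_chars]
    return sources


def build_prompt(section: str, sources: Dict[str, str]) -> str:
    """Assemble one prompt per template section from the collected files."""
    listing = "\n\n".join(f"### {name}\n{body}" for name, body in sources.items())
    return (f"Write the '{section}' section of an architecture document "
            f"for this repository.\n\n{listing}")


def generate_document(repo_root: str,
                      llm: Callable[[str], str],
                      sections: Sequence[str] = TEMPLATE_SECTIONS) -> str:
    """Query the model once per section and join the answers into one document."""
    sources = collect_sources(repo_root)
    parts = ["# Architecture"]
    for section in sections:
        answer = llm(build_prompt(section, sources)).strip()
        parts.append(f"## {section}\n\n{answer}")
    return "\n\n".join(parts) + "\n"
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; the returned string could then be written to a file inside the target repository.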