For large and evolving codebases, automatically generating holistic, architecture-aware documentation that captures not only individual functions but also cross-file, cross-module, and system-level interactions remains an open challenge. Comprehensive documentation is essential for long-term software maintenance and collaboration, yet current automated approaches fail to model the rich semantic dependencies and architectural structures that define real-world software systems. We present \textbf{CodeWiki}, a unified framework for automated repository-level documentation across seven programming languages. CodeWiki introduces three key innovations: (i) hierarchical decomposition that preserves architectural context across multiple levels of granularity, (ii) recursive multi-agent processing with dynamic task delegation for scalable generation, and (iii) multi-modal synthesis that integrates textual descriptions with visual artifacts such as architecture diagrams and data-flow representations. To enable rigorous evaluation, we introduce \textbf{CodeWikiBench}, a comprehensive benchmark featuring multi-dimensional rubrics and LLM-based assessment protocols. Experimental results show that CodeWiki achieves a 68.79\% quality score with proprietary models, outperforming the closed-source DeepWiki baseline (64.06\%) by 4.73 percentage points, with particularly strong improvements on high-level scripting languages (+10.47\%). We open-source CodeWiki to foster future research and community adoption.