Repository summarization is an important research problem in software development and maintenance. Existing repository summarization techniques primarily summarize code along the directory tree, which is insufficient for tracing high-level features to the methods that collaboratively implement them. To address this limitation, we propose RepoSummary, a feature-oriented code repository summarization approach that automatically generates repository documentation. It also establishes more accurate traceability links from functional features to the corresponding code elements, enabling developers to rapidly locate relevant methods and files during code comprehension and maintenance. Comprehensive experiments against the state-of-the-art baseline (HGEN) demonstrate that RepoSummary achieves higher feature coverage and more accurate traceability. On average, it increases the share of features from the manual documentation that are completely covered from 61.2% to 71.1% and improves file-level traceability recall from 29.9% to 53.0%. It also generates documentation that is more conceptually consistent, easier to understand, and better formatted than that produced by existing approaches.
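To make the file-level traceability recall metric concrete, the following is a minimal sketch of one plausible way to compute it. It assumes recall is micro-averaged over (feature, file) links, which the abstract does not specify, so the definition used in the actual evaluation may differ. The function name `file_level_recall`, the feature names, and the file paths are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def file_level_recall(
    predicted: Dict[str, Set[str]],
    truth: Dict[str, Set[str]],
) -> float:
    """Micro-averaged recall over (feature, file) traceability links.

    Assumption: each feature maps to the set of files that implement it,
    and recall = recovered ground-truth links / all ground-truth links.
    """
    total_links = sum(len(files) for files in truth.values())
    if total_links == 0:
        return 0.0  # no ground-truth links, so recall is defined as 0 here
    recovered = sum(
        len(predicted.get(feature, set()) & files)
        for feature, files in truth.items()
    )
    return recovered / total_links


# Hypothetical ground truth: features traced to the files implementing them.
truth = {
    "user login": {"auth/login.py", "auth/session.py"},
    "export report": {"report/export.py", "report/format.py"},
}

# Hypothetical links produced by a summarization tool.
predicted = {
    "user login": {"auth/login.py", "ui/forms.py"},
    "export report": {"report/export.py", "report/format.py"},
}

recall = file_level_recall(predicted, truth)
print(f"file-level recall: {recall:.2f}")  # 3 of 4 links recovered -> 0.75
```

In this toy example, three of the four ground-truth links are recovered. The spurious link to `ui/forms.py` does not affect recall; it would only lower precision.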