Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, it remains unclear what literature on mining architectural information is available. This may make it difficult for practitioners to understand and adopt state-of-the-art research results, for example, which approaches should be adopted to mine which architectural information in order to support which architecting activities. It also hinders researchers from becoming aware of the challenges and remedies related to the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in terms of the architectural information and sources mined, the architecting activities supported, the approaches and tools used, and the challenges faced. We conducted a systematic mapping study (SMS) on the literature published between January 2006 and December 2022. Across the 104 selected primary studies: 7 categories of architectural information have been mined, among which architectural description is the most frequently mined; 11 categories of sources have been leveraged for mining architectural information, among which version control systems are the most popular; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with future research directions and helps practitioners become aware of which approaches and tools can be used to mine which architectural information from which sources to support various architecting activities.