Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models and views) to support architecting activities, such as architecture recovery and understanding, has received significant attention in recent years. However, it remains unclear what literature on mining architectural information is available. This makes it difficult for practitioners to understand and adopt state-of-the-art research results, such as which approaches to adopt to mine which architectural information in support of architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. We conducted a Systematic Mapping Study (SMS) of the literature published between January 2006 and December 2022. In the 87 primary studies finally selected, 8 categories of architectural information have been mined, among which architectural description is the most mined; 12 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported; 89 approaches and 54 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with promising future directions and helps practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.