Software comprehension can be extremely time-consuming because codebases keep growing. Consequently, there is an increasing need to accelerate code comprehension in order to facilitate maintenance and reduce its cost. A crucial part of this process is understanding the code dependency structure and keeping its quality high. Although a variety of code structure models exist, surprisingly few of them both closely represent the source code and focus on software comprehension. As a result, there are no readily available, easy-to-use tools to assist with dependency comprehension, refactoring, and code quality monitoring. To address this gap, we propose the Semantic Code Graph (SCG), an information model that offers a detailed abstract representation of code dependencies while staying closely tied to the source code. To validate the usefulness of the SCG model for software comprehension, we compare it to nine other source code representation models. Additionally, we select 11 well-known and widely used open-source projects developed in Java and Scala and perform a range of software comprehension activities on them using three code representation models: the proposed SCG, the Call Graph (CG), and the Class Collaboration Network (CCN). These activities encompass project structure comprehension, identification of critical project entities, interactive visualization of code dependencies, and discovery of code similarities through software mining. We then qualitatively analyze the results to compare the software comprehension capabilities of the three models. Our findings demonstrate that the SCG enhances software comprehension capabilities compared to the prevailing CCN and CG models. We believe that this work is a step towards the next generation of tools that streamline code dependency comprehension and management.