Software comprehension can be extremely time-consuming because codebases keep growing. Consequently, there is an increasing need to accelerate code comprehension in order to facilitate maintenance and reduce its costs. A crucial aspect of this process is understanding the code dependency structure and preserving its quality. While a variety of code structure models already exist, there is a surprising lack of models that closely represent the source code and focus on software comprehension. As a result, there are no readily available, easy-to-use tools to assist with dependency comprehension, refactoring, and code quality monitoring. To address this gap, we propose the Semantic Code Graph (SCG), an information model that offers a detailed abstract representation of code dependencies while staying closely tied to the source code. To validate the SCG model's usefulness for software comprehension, we compare it with nine other source code representation models. Additionally, we select 11 well-known, widely used open-source projects developed in Java and Scala and perform a range of software comprehension activities on them using three code representation models: the proposed SCG, the Call Graph (CG), and the Class Collaboration Network (CCN). These activities encompass project structure comprehension, identification of critical project entities, interactive visualization of code dependencies, and discovery of code similarities through software mining. We then qualitatively analyze the results to compare the software comprehension capabilities of these models. Our findings demonstrate that the SCG enhances software comprehension compared to the prevailing CCN and CG models. We believe this work is a step toward the next generation of tools that streamline code dependency comprehension and management.