PairSmell: A Novel Perspective Inspecting Software Modular Structure

Enhancing the modular structure of existing systems has attracted substantial research interest, focusing on two main methods: (1) software modularization and (2) identifying design issues (e.g., smells) as refactoring opportunities. However, re-modularization solutions often require extensive modifications to the original modules, and the design issues identified are generally too coarse to guide refactoring strategies. Combining these two methods, this paper introduces a novel concept, PairSmell, which exploits modularization to pinpoint design issues that need refactoring. We concentrate on a fine-grained but fundamental aspect of modularity principles: the modular relation (MR), i.e., whether a pair of entities is separated or collocated. The main assumption is that if the actual MR of a pair violates its "apt MR", i.e., the MR agreed on by multiple modularization tools (acting as raters), the pair likely reflects a flawed architectural decision that warrants further examination. To quantify and evaluate PairSmell, we conduct an empirical study on 20 C/C++ and Java projects, using 4 established modularization tools to identify two forms of PairSmell: inapt separated pairs (InSep) and inapt collocated pairs (InCol). Our study of 260,003 instances reveals that their architectural impacts are substantial: (1) on average, 14.60% and 20.44% of software entities are involved in InSep and InCol MRs, respectively; (2) InSep pairs are associated with 190% more co-changes than properly separated pairs, while InCol pairs are associated with 35% fewer co-changes than properly collocated pairs, both indicating that PairSmell successfully identifies modular structures detrimental to software quality; and (3) both forms of PairSmell persist across software evolution.
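
To make the notion of an inapt modular relation concrete, the following is a minimal sketch of how InSep and InCol pairs could be flagged. It is not the paper's implementation. It assumes that each modularization result is a mapping from entity name to module label, and that "agreed on by multiple modularization tools" means unanimous agreement among all tools supplied; the abstract does not state the exact agreement rule. The function names `modular_relation` and `find_pair_smells` and the toy data are hypothetical.

```python
from itertools import combinations


def modular_relation(assignment, a, b):
    """Return the MR of a pair under one entity -> module mapping."""
    return "collocated" if assignment[a] == assignment[b] else "separated"


def find_pair_smells(actual, tool_results):
    """Flag pairs whose actual MR contradicts the MR all tools agree on.

    Assumption: the 'apt MR' requires unanimous agreement among the tools.
    Pairs on which the tools disagree have no apt MR and are skipped.
    """
    smells = {"InSep": [], "InCol": []}
    for a, b in combinations(sorted(actual), 2):
        votes = {modular_relation(tool, a, b) for tool in tool_results}
        if len(votes) != 1:
            continue  # no consensus among the tools
        apt = votes.pop()
        current = modular_relation(actual, a, b)
        if current == apt:
            continue  # the actual MR matches the apt MR
        # Separated but should be collocated -> InSep; the reverse -> InCol.
        kind = "InSep" if current == "separated" else "InCol"
        smells[kind].append((a, b))
    return smells


# Hypothetical toy data: the actual structure and three tool outputs.
actual = {"A": "m1", "B": "m2", "C": "m2", "D": "m1"}
tools = [
    {"A": "x", "B": "x", "C": "y", "D": "y"},
    {"A": "p", "B": "p", "C": "q", "D": "q"},
    {"A": "u", "B": "u", "C": "u", "D": "v"},
]
result = find_pair_smells(actual, tools)
print(result)
```

In this toy input, A and B are separated in the actual structure although every tool collocates them, so the pair is reported as InSep; A and D are collocated although every tool separates them, so the pair is reported as InCol. Pairs on which the tools disagree are left unflagged.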
