Code smells indicate latent design or implementation flaws that may degrade software maintainability and evolvability, and removing them is a major challenge in software refactoring. Over the past decades, a variety of refactoring approaches have been proposed, which can be broadly classified into metrics-based, rule-based, and machine learning-based approaches. In recent years, deep learning-based approaches have also attracted widespread attention. However, existing techniques have limitations: metrics- and rule-based approaches rely heavily on manually defined heuristics and thresholds, whereas deep learning-based approaches are often constrained by dataset availability and model design. In this study, we propose a graph-based deep learning approach for code smell refactoring. Specifically, we design two types of input graphs (class-level and method-level) and employ both graph classification and node classification tasks to address the refactoring of three representative code smells: long method, large class, and feature envy. For our experiments, we propose a semi-automated dataset generation approach that can generate a large-scale dataset with minimal manual effort. We implement the proposed approach with three classical graph neural network (GNN) architectures, GCN, GraphSAGE, and GAT, and evaluate its performance against both traditional and state-of-the-art deep learning approaches. The results demonstrate that the proposed approach achieves superior refactoring performance.
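To make the distinction between graph classification and node classification concrete, the following is a minimal sketch of a single GCN-style propagation step with two readouts: one prediction per node and one prediction for the whole graph. It is not the implementation evaluated in this study. The toy graph, the feature values, the weights, the undirected-edge treatment, and the mean-pooling readout are all assumptions chosen for illustration, and the code uses plain Python rather than a GNN library.

```python
import math

def normalize_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list-of-lists matrix."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each node keeps its own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0  # edges treated as undirected (assumption)
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One GCN propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def graph_readout(h, w_out):
    """Graph classification head: mean-pool node embeddings, then project."""
    n = len(h)
    pooled = [[sum(row[j] for row in h) / n for j in range(len(h[0]))]]
    return matmul(pooled, w_out)[0]

# Hypothetical toy graph: 4 nodes (e.g. program elements) joined in a chain.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]  # made-up features
w_hidden = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]              # made-up weights
w_out = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]               # made-up weights

a_hat = normalize_adjacency(len(features), edges)
hidden = gcn_layer(a_hat, features, w_hidden)

per_node = matmul(hidden, w_out)          # node classification: one logit row per node
per_graph = graph_readout(hidden, w_out)  # graph classification: one logit row per graph
```

Both heads share the same node embeddings and differ only in the readout: the node-level head scores each node separately, while the graph-level head aggregates all nodes before scoring. In practice the GCN step would be replaced by a trained GCN, GraphSAGE, or GAT model from a GNN library.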