Behind the Intent of Extract Method Refactoring: A Systematic Literature Review

Code refactoring is widely recognized as an essential software engineering practice for improving the understandability and maintainability of source code. Extract Method is considered the "Swiss army knife" of refactorings, as developers often apply it to improve code quality. In recent years, several studies have proposed approaches for recommending Extract Method refactorings, enabling the collection, analysis, and surfacing of actionable, data-driven insights into refactoring practices within software projects. In this paper, we review the current body of knowledge on Extract Method refactoring research and explore its limitations and potential opportunities for improvement in future research, so that researchers and practitioners can become aware of the state of the art and identify new research opportunities in this area. We conduct this review in the form of a systematic literature review (SLR). From an initial pool of 1,367 papers, a systematic selection process yielded a final pool of 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria, including methodology, applicability, and degree of automation. The result is a catalog of 83 Extract Method approaches, indicating that many techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) several Extract Method tools involve the developer in the decision-making process when applying method extraction; and (iii) existing benchmarks are heterogeneous and do not contain the same types of information, which makes it difficult to standardize them for benchmarking purposes.
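
For readers less familiar with the refactoring itself, the following is a minimal, hypothetical Python sketch of Extract Method applied to a duplicated fragment (a code clone, the focus of many of the reviewed studies). All function names and data (`compute_total`, `render_invoice_*`, `discounted_total_*`, the sample items) are illustrative assumptions and are not taken from any of the 83 primary studies. The sketch shows the duplicated loop being moved into one named method that both former clone sites call, with behaviour preserved.

```python
# Before: the same total-computing fragment appears twice (a code clone).
def render_invoice_before(items):
    total = 0.0
    for _, price, qty in items:
        total += price * qty
    return f"Invoice total: {total:.2f}"


def discounted_total_before(items, rate):
    total = 0.0
    for _, price, qty in items:
        total += price * qty
    return total * (1 - rate)


# After: the duplicated fragment is extracted into one named method...
def compute_total(items):
    """Extracted method: sum of price * quantity over all line items."""
    total = 0.0
    for _, price, qty in items:
        total += price * qty
    return total


# ...and both former clone sites now call it.
def render_invoice_after(items):
    return f"Invoice total: {compute_total(items):.2f}"


def discounted_total_after(items, rate):
    return compute_total(items) * (1 - rate)


if __name__ == "__main__":
    items = [("pen", 1.50, 4), ("notebook", 3.25, 2)]
    # Extract Method must preserve behaviour: old and new versions agree.
    assert render_invoice_before(items) == render_invoice_after(items)
    assert discounted_total_before(items, 0.1) == discounted_total_after(items, 0.1)
    print(render_invoice_after(items))  # prints "Invoice total: 12.50"
```

The extracted method gives the duplicated logic a single name and location, so a later change to how totals are computed is made once rather than at every clone site.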

翻译：代码重构被广泛认为是提升源代码可理解性与可维护性的关键软件工程实践。提取方法重构被视为重构中的“瑞士军刀”，开发者常通过它改善代码质量。近年来，多项研究尝试推荐提取方法重构，通过收集、分析并揭示软件项目中重构实践的可操作数据驱动洞见。本文旨在回顾当前提取方法重构研究的知识体系，探讨其局限性及未来研究的潜在改进方向，从而帮助研究人员和实践者了解最新进展并识别该领域的新研究机遇。我们以系统文献综述（SLR）的形式梳理了提取方法重构的相关知识。在初步收集1,367篇论文后，经过系统性筛选，最终纳入83项主要研究。我们定义了三组研究问题，并根据方法学、适用性和自动化程度等多重标准系统性地构建并细化分类框架。研究结果构建了一个包含83种提取方法方案的目录，表明文献中已提出多种技术。主要发现包括：(i) 38.6%的提取方法重构研究主要关注代码克隆问题；(ii) 多数提取方法工具在实施方法提取时融入开发者的决策参与；(iii) 现有基准数据集异质性显著，所含信息类型不一致，导致难以对其进行标准化基准测试。