Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Given that Large Language Models (LLMs) have been trained on large code corpora, if we harness their familiarity with the way developers form functions, we could suggest refactorings that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM) refactoring. Our formative study on 1752 EM scenarios revealed that LLMs are very effective at giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks the suggestions using static analysis techniques from program slicing, and finally leverages the IDE to execute the refactoring correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects and found that it outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, improving over the 39.4% recall rate of previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers, suggesting refactorings on their recent commits; 81.3% of them agreed with the recommendations provided by EM-Assist.
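To make the refactoring concrete, the following is a minimal, hypothetical illustration (not taken from the paper's corpus) of what an Extract Method suggestion looks like: statements that computed a total inside a longer method are moved into their own method, leaving the original with a single responsibility. The class and method names are invented for this sketch.

```java
// Hypothetical example of an Extract Method (EM) refactoring result.
// Before the refactoring, printInvoice both built the header and
// summed the amounts; the summation statements were extracted below.
class InvoicePrinter {
    static String printInvoice(String customer, double[] amounts) {
        String header = "Invoice for " + customer;
        // The extracted method is now called instead of inlined statements.
        return header + "\nTotal: " + computeTotal(amounts);
    }

    // Extracted method: the statements that summed the amounts moved here.
    static double computeTotal(double[] amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(printInvoice("Ada", new double[]{10.0, 5.5}));
    }
}
```

A tool like EM-Assist must decide which contiguous (and semantically extractable) statements to pull out; the paper's contribution is using LLM suggestions, filtered and ranked via program slicing, and then delegating the mechanical transformation to the IDE.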