Recent advances in large language models (LLMs) make it potentially feasible to automatically refactor source code with LLMs. However, it remains unclear how well LLMs perform, compared to human experts, in conducting refactorings automatically and accurately. To fill this gap, in this paper we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on this dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7, respectively, of the 180 refactoring opportunities. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased ChatGPT's success rate from 15.6% to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts. However, 13 of the 176 solutions suggested by ChatGPT and 9 of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risks of LLM-based refactoring. To this end, we propose a detect-and-reapply tactic, called RefactoringMirror, to avoid such unsafe refactorings. By reapplying the identified refactorings to the original code using thoroughly tested refactoring engines, RefactoringMirror effectively mitigates the risks associated with LLM-based automated refactoring while still leveraging LLMs' intelligence to obtain valuable refactoring recommendations.