Software development agents such as Claude Code, GitHub Copilot, Cursor Agent, Devin, and OpenAI Codex are increasingly integrated into developer workflows. While prior work has evaluated agent capabilities for code completion and task automation, little work has investigated how these agents perform Java refactoring in practice, what types of changes they make, and how those changes affect code quality. In this study, we present the first analysis of agentic refactoring pull requests in Java, comparing them to developer refactorings across 86 projects per group. Using RefactoringMiner and DesigniteJava 3.0, we identify refactoring types and detect code smells before and after refactoring commits. Our results show that agent refactorings are dominated by annotation changes (the five most common refactoring types performed by agents are all annotation-related), in contrast to the diverse structural improvements typical of developers. Despite these differences in refactoring types, we find that Cursor is the only agent to show a statistically significant increase in refactoring smells.