Refactoring for Novices in Java: An Eye Tracking Study on the Extract vs. Inline Methods

Developers often extract methods to improve readability, comprehension, and reuse, whereas inlining keeps the logic in a single block. Prior work based on static metrics has not shown clear differences between these practices, and the human side of comprehension and navigation remains underexplored. We investigate the Inline Method and Extract Method refactorings using a dynamic approach: eye tracking while participants read code and solve tasks. We analyze key code areas and compare visual effort and reading behavior (fixation duration and count, regressions, revisits), alongside time and number of attempts. We ran a controlled experiment with 32 Java novices, followed by short interviews. Each participant solved eight simple tasks: four programs presented in an inlined version and four in an extracted version. We also surveyed 58 additional novices to gather complementary quantitative and qualitative data. Results show that the effects depend on task difficulty. In two tasks, method extraction improved performance and reduced visual effort, with time decreasing by up to 78.8% and regressions by 84.6%. For simpler tasks (e.g., square area), extraction hurt performance: time increased by up to 166.9% and regressions by 200%. Even when methods had meaningful names, novices often switched back and forth between call sites and extracted methods, increasing navigation and cognitive load. Stated preferences frequently favored extraction for readability and reuse, but did not always match measured performance. These findings suggest that educators should be cautious about premature modularization for novices, and they highlight eye tracking as a useful complement to static metrics.
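To make the two variants concrete, the following is a minimal sketch of what an inlined and an extracted version of a simple task such as the square area could look like. The class and method names (`SquareAreaDemo`, `areaInlined`, `areaExtracted`, `square`) are hypothetical and chosen only for illustration; they are not the programs used in the experiment.

```java
// Illustrative sketch only: all names here are hypothetical,
// not the study's actual task code.
class SquareAreaDemo {

    // Inlined version: the whole computation sits in one block,
    // so the reader never has to leave this method.
    static int areaInlined(int side) {
        int area = side * side;
        return area;
    }

    // Extracted version: the computation is moved into a named helper,
    // so the reader must jump from the call site to square(...) and back.
    static int areaExtracted(int side) {
        return square(side);
    }

    // Helper produced by the Extract Method refactoring.
    static int square(int value) {
        return value * value;
    }

    public static void main(String[] args) {
        System.out.println(areaInlined(4));   // prints 16
        System.out.println(areaExtracted(4)); // prints 16
    }
}
```

Both versions compute the same result. The difference lies in how much navigation a reader needs: for a task this small, the extra jump to the helper may cost more than the method name saves, which is in line with the pattern reported for the simpler tasks.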