[Context] AI assistants, like GitHub Copilot and Cursor, are transforming software engineering. While several studies highlight productivity improvements, their impact on maintainability requires further investigation. [Objective] This study investigates whether co-development with AI assistants affects software maintainability, specifically how easily other developers can evolve the resulting source code. [Method] We conducted a two-phase controlled experiment involving 151 participants, 95% of whom were professional developers. In Phase 1, participants added a new feature to a Java web application, with or without AI assistance. In Phase 2, a randomized controlled trial, new participants evolved these solutions without AI assistance. [Results] Phase 2 revealed no significant differences in subsequent evolution with respect to completion time or code quality. Bayesian analysis suggests that any speed or quality improvements from AI use were at most small and highly uncertain. Observational results from Phase 1 corroborate prior research: using an AI assistant yielded a 30.7% median reduction in completion time, and habitual AI users showed an estimated 55.9% speedup. [Conclusions] Overall, we did not detect systematic maintainability advantages or disadvantages when other developers evolved code co-developed with AI assistants. Within the scope of our tasks and measures, we observed no consistent warning signs of degraded code-level maintainability. Future work should examine risks such as code bloat from excessive code generation and cognitive debt as developers offload more mental effort to assistants.