A meta-analysis of the effect of generative AI on productivity and learning in programming

Generative artificial intelligence (GenAI) is increasingly used for programming, yet it remains unclear when and where GenAI tools lead to productivity gains. Evidence on the effects of GenAI on the long-term development of programming skills is likewise mixed. Here, we present a meta-analysis of $n = 23$ studies reporting $k = 27$ effect sizes to quantify the effect of GenAI-powered coding assistants on productivity and learning. We systematically searched (i) ACM, (ii) arXiv, (iii) Scopus, and (iv) Web of Science for studies published between 2019 and 2025. Studies were required to compare GenAI-assisted with unassisted programming using quantitative measures of (1) productivity (i.e., task completion time, commits, and lines of code) and (2) learning (i.e., exam performance). We assessed the risk of bias using RoB2 and ROBINS-I and expressed effect sizes as standardized mean differences (Hedges' $g$). We find a statistically significant but moderate positive effect of GenAI assistance on developer productivity ($g = 0.33$, $95\%$ CI: $[0.09, 0.58]$), albeit with substantial heterogeneity across settings. Notably, productivity gains tend to be larger in controlled experimental settings, while effects are smaller in open-source and enterprise contexts. In contrast, we find no statistically significant effect of GenAI assistance on learning outcomes ($g = 0.14$, $95\%$ CI: $[-0.18, 0.47]$). Overall, these results show that GenAI coding assistants can increase developer productivity, although these gains depend strongly on context. In educational settings, however, the use of GenAI does not consistently translate into improved learning or skill development, underscoring the need for careful integration of GenAI into computer science education.
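
The abstract reports pooled Hedges' $g$ values with 95% confidence intervals and notes substantial heterogeneity. The sketch below illustrates how such estimates are commonly computed from per-study summary statistics. It assumes a DerSimonian-Laird random-effects model and a normal-approximation confidence interval (the abstract does not name the estimator), and it assumes that outcomes where lower is better, such as task completion time, are sign-flipped so that positive $g$ favours GenAI assistance. The `Study` class, the function names, and all numbers in the usage block are hypothetical illustrations, not the authors' code or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical summary statistics for one two-group comparison."""
    mean_ai: float          # mean outcome with GenAI assistance
    sd_ai: float
    n_ai: int
    mean_ctrl: float        # mean outcome without assistance
    sd_ctrl: float
    n_ctrl: int
    higher_is_better: bool = True  # assumption: False for e.g. completion time


def hedges_g(s: Study) -> tuple[float, float]:
    """Return (g, variance of g) for an independent two-group design."""
    df = s.n_ai + s.n_ctrl - 2
    # Pooled standard deviation across both groups.
    sd_pooled = math.sqrt(
        ((s.n_ai - 1) * s.sd_ai**2 + (s.n_ctrl - 1) * s.sd_ctrl**2) / df
    )
    d = (s.mean_ai - s.mean_ctrl) / sd_pooled
    if not s.higher_is_better:
        d = -d  # orient so that positive values favour GenAI assistance
    # Small-sample correction J = 1 - 3 / (4(n1 + n2) - 9).
    j = 1 - 3 / (4 * df - 1)
    var_d = (s.n_ai + s.n_ctrl) / (s.n_ai * s.n_ctrl) + d**2 / (2 * (s.n_ai + s.n_ctrl))
    return j * d, j**2 * var_d


def random_effects(effects: list[tuple[float, float]]) -> dict[str, float]:
    """Pool (g, variance) pairs with the DerSimonian-Laird estimator."""
    if len(effects) < 2:
        raise ValueError("need at least two effect sizes")
    y = [g for g, _ in effects]
    w = [1 / v for _, v in effects]          # inverse-variance weights
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k_df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - k_df) / c)          # between-study variance
    w_star = [1 / (v + tau2) for _, v in effects]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - k_df) / q) if q > 0 else 0.0  # share of variation due to heterogeneity
    return {
        "g": mu,
        "ci_low": mu - 1.96 * se,
        "ci_high": mu + 1.96 * se,
        "tau2": tau2,
        "I2": i2,
    }


if __name__ == "__main__":
    # Invented numbers for illustration only.
    studies = [
        Study(52.0, 10.0, 30, 48.0, 11.0, 30),                          # test score
        Study(40.0, 12.0, 25, 47.0, 13.0, 25, higher_is_better=False),  # minutes
        Study(70.0, 15.0, 40, 69.0, 14.0, 40),                          # test score
    ]
    pooled = random_effects([hedges_g(s) for s in studies])
    print({name: round(value, 3) for name, value in pooled.items()})
```

A random-effects model is used here because the abstract emphasizes heterogeneity across settings; under that assumption, $\tau^2$ widens the confidence interval relative to a fixed-effect pooling.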

翻译：生成式人工智能（GenAI）在编程领域的应用日益广泛，然而关于GenAI工具在何种情境下能够提升生产力仍不明确。现有证据对GenAI在编程技能长期发展中的作用同样存在矛盾。本文对23项研究报告的27个效应量进行元分析，以量化GenAI辅助编程工具对生产力与学习的影响。我们系统检索了（i）ACM、（ii）arXiv、（iii）Scopus与（iv）Web of Science中2019至2025年间发表的研究。所选研究需通过量化指标比较GenAI辅助编程与无辅助编程的差异：（1）生产力（任务完成时间、提交次数与代码行数）和（2）学习效果（考试成绩）。我们采用RoB2与ROBINS-I工具评估偏倚风险，并利用Hedges' g比较标准化效应量。研究发现，GenAI辅助对开发者生产力具有统计学显著但适度的正向影响（g = 0.33，95%置信区间：[0.09, 0.58]），但不同情境间存在显著异质性。值得注意的是，生产力提升在受控实验环境下更为显著，而在开源与企业环境中效应较小。相反，GenAI辅助对学习效果未发现统计学显著影响（g = 0.14，95%置信区间：[-0.18, 0.47]）。整体而言，这些结果强调GenAI编码辅助工具能够提升开发者生产力，但其增益高度依赖具体情境。然而在教育环境中，GenAI的使用并未稳定转化为学习效果或技能提升的改善，这凸显了将GenAI审慎融入计算机科学教育的必要性。