GitHub Copilot and Developer Productivity: An Observational Dose-Response Analysis

Does GitHub Copilot (GHCP) make engineers more productive, or do the engineers who use it more differ from those who use it less? And even within a single engineer, are GHCP-heavy weeks just busy weeks in which more of everything gets done? We study these questions using 43 weeks of data from 16,223 software engineers across Microsoft's Cloud+AI organization. Engineer fixed effects address the first concern by comparing each engineer against themselves rather than against other engineers, eliminating time-invariant differences in skill, role, and team. Active coding time and browser time then enter a Poisson Pseudo-Maximum Likelihood model with two-way fixed effects to address the harder, within-engineer confound: that GHCP-heavy weeks coincide with high-effort weeks. This defines our estimand as an efficiency effect: more pull requests completed at equivalent levels of coding time. Engineers are estimated to complete 40.5% more PRs in their highest GHCP usage weeks relative to their zero-usage weeks, holding measured development effort constant. The gradient is monotonic with diminishing returns at high intensity. Seven robustness and falsification tests target the remaining plausible alternative explanations (non-coding AI engagement, team-level shocks, within-week task reallocation, cross-week contamination, PR slicing into smaller units, shifts toward easier task types, and sensitivity to how the treatment is operationalized). Under an explicitly stated conditional-independence assumption, the within-engineer design estimates a tool-specific efficiency effect that is consistent with all seven robustness tests.
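As a reading aid, the model described above can be sketched in one equation. The abstract does not give the exact specification, so everything below is an assumption made for illustration: the usage bins $k = 1, \dots, K$ with zero-usage weeks as the reference category, the symbols $\alpha_i$, $\gamma_t$, $\beta_k$, $\delta_1$, $\delta_2$, and the logarithmic form of the two time controls.

```latex
% Illustrative sketch only; bins, symbols and functional forms are assumptions,
% not the authors' stated specification.
\begin{align}
\mathbb{E}\!\left[\mathrm{PR}_{it} \mid \cdot\right]
  &= \exp\!\Big( \alpha_i + \gamma_t
     + \sum_{k=1}^{K} \beta_k \,\mathbf{1}\{\mathrm{GHCP}_{it} \in \text{bin } k\}
     + \delta_1 \log(1 + \mathrm{CodingTime}_{it})
     + \delta_2 \log(1 + \mathrm{BrowserTime}_{it}) \Big) \\
% Percent effect of the top bin relative to zero-usage weeks:
\%\Delta_K &= 100 \,\big( e^{\beta_K} - 1 \big), \qquad
\%\Delta_K = 40.5 \;\Longrightarrow\; \beta_K = \ln(1.405) \approx 0.340
\end{align}
```

Here $\alpha_i$ is the engineer fixed effect and $\gamma_t$ the week fixed effect, the two components of the two-way structure. Under this reading, the reported 40.5% would correspond to a top-bin coefficient of roughly 0.340 on the log scale, and a monotonic gradient with diminishing returns would show up as $\beta_1 < \beta_2 < \dots < \beta_K$ with shrinking increments between adjacent bins.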
