The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

Large Language Models (LLMs) have been shown to enhance individual productivity in guided settings. LLMs are also likely to transform innovation processes in collaborative work settings, but it is unclear what trajectory this transformation will follow. Innovation in these contexts encompasses both capability innovation, which explores new possibilities by acquiring new competencies in a project, and iterative innovation, which exploits existing foundations by enhancing established competencies and improving project quality. Whether, and to what extent, LLMs affect these two aspects of collaborative work is an open empirical question. Open-source development provides an ideal setting to examine LLM impacts on these innovation types, because the voluntary, open, and collaborative nature of its contributions provides the greatest opportunity for technological augmentation. We focus on open-source projects on GitHub, leveraging a natural experiment around the selective rollout of GitHub Copilot (a programming-focused LLM) in October 2021, when Copilot supported programming languages such as Python and Rust but not R or Haskell. We observe a significant jump in overall contributions, suggesting that LLMs effectively augment collaborative innovation in an unguided setting. Interestingly, Copilot's launch increased iterative innovation (maintenance-related or feature-refining contributions) significantly more than it increased capability innovation (code-development or feature-introducing commits). This disparity was more pronounced after the model upgrade in June 2022 and was evident in active projects with extensive coding activity, suggesting that as LLM capabilities and/or available contextual information improve, the gap between capability and iterative innovation may widen. We discuss practical and policy implications for incentivizing high-value innovative solutions.
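
The abstract does not name the estimator used. One common way to analyze a selective rollout of this kind is a difference-in-differences comparison: the change in contributions for projects in supported languages (for example Python or Rust) minus the change for projects in unsupported languages (for example R or Haskell). The sketch below illustrates that idea only. The function name `did_estimate`, the row layout, and every number in `rows` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def did_estimate(rows):
    """Return a simple difference-in-differences estimate.

    rows: iterable of (treated, post, outcome) tuples, where
      treated -- True if the project's language was supported at rollout (assumed grouping)
      post    -- True if the observation falls after the rollout date
      outcome -- a contribution measure, e.g. commits per month
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)

    # Mean outcome in each of the four (group, period) cells.
    m = {key: mean(values) for key, values in cells.items()}

    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Made-up monthly commit counts, purely for illustration.
rows = [
    (True, False, 10.0), (True, False, 12.0),   # supported language, before rollout
    (True, True, 16.0), (True, True, 18.0),     # supported language, after rollout
    (False, False, 8.0), (False, False, 10.0),  # unsupported language, before rollout
    (False, True, 9.0), (False, True, 11.0),    # unsupported language, after rollout
]

effect = did_estimate(rows)
print(effect)
```

Running the same comparison separately on maintenance-type and feature-introducing contributions would be one way to contrast iterative and capability innovation, again as an illustration rather than the authors' stated procedure.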

