The widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Because generated source code is virtually impossible to distinguish from manually written code, the real-world usage of these tools and their impact on open-source software development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into open-source software projects, we analyze a curated sample of more than 250,000 GitHub repositories and identify 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed-methods approach, we qualitatively code these 1,292 mentions and derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage. We then analyze 13 documents with policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in the new era of AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives on the impact of GenAI on software development.