Self-Admitted GenAI Usage in Open-Source Software

The widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Because generated source code is virtually impossible to distinguish from manually written code, the real-world usage of these tools and their impact on open-source software development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into open-source software projects, we analyze a curated sample of more than 250,000 GitHub repositories and identify 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed-methods approach, we qualitatively code these 1,292 mentions and derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage. We then analyze 13 documents containing policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in the new era of AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives about the impact of GenAI on software development.

