The advent of Large Language Models (LLMs) has introduced a new paradigm in Software Engineering (SE), with generative AI tools such as ChatGPT gaining widespread adoption among developers. While ChatGPT's potential has been discussed extensively, empirical evidence about how developers actually use LLM assistance in real-world practice remains limited. To bridge this gap, we conducted a large-scale empirical analysis of ChatGPT usage on GitHub and presented DevChat, a curated dataset of 2,547 publicly shared ChatGPT conversation links collected from GitHub between May 2023 and June 2024. Through a comprehensive analysis of DevChat, we characterized developer-ChatGPT interaction patterns and identified five key categories of purposes for which developers share these conversations during software development. We also investigated the dominant development-related activities in which ChatGPT is used and presented a mapping framework that links GitHub data sources, development-related activities, and SE tasks. The findings show that (1) interactions are typically short and task-focused (most are 1-3 turns); (2) developers share conversations mainly to delegate tasks, resolve problems, and acquire knowledge, with five purpose categories identified overall; (3) ChatGPT is most frequently engaged for Software Implementation and Maintenance & Evolution; and (4) ChatGPT supports 39 fine-grained SE tasks, with Code Generation & Completion and Code Modification & Optimization being the most prominent. Our study offers a comprehensive mapping of ChatGPT's applications in real-world software development scenarios and provides a foundation for understanding the practical roles of LLMs in software development.