This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT, a prominent large language model (LLM). The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets, and is linked to corresponding software development artifacts such as source code, commits, issues, pull requests, discussions, and Hacker News threads. This comprehensive dataset is derived from shared ChatGPT conversations collected from GitHub and Hacker News, providing a rich resource for understanding the dynamics of developer interactions with ChatGPT, the nature of their inquiries, and the impact of these interactions on their work. DevGPT enables the study of developer queries, the effectiveness of ChatGPT in code generation and problem solving, and the broader implications of AI-assisted programming. By providing this dataset, the paper paves the way for novel research avenues in software engineering, particularly in understanding and improving the use of LLMs like ChatGPT by developers.