An empirical study of ChatGPT-3.5 on question answering and code maintenance

Since the launch of ChatGPT in 2022, a growing concern has been whether ChatGPT will replace programmers and eliminate jobs. Motivated by this widespread concern, we conducted an empirical study to systematically compare ChatGPT against programmers in question answering and software maintenance. We reused a dataset introduced by prior work, which includes 130 StackOverflow (SO) discussion threads referenced by the Java developers of 357 GitHub projects. We investigated three research questions (RQs). First, how does ChatGPT compare with programmers when answering technical questions? Second, how do developers perceive the differences between ChatGPT's answers and SO answers? Third, how does ChatGPT compare with humans when revising code for maintenance requests? For RQ1, we provided the 130 SO questions to ChatGPT and manually compared its answers with the accepted or most popular SO answers in terms of relevance, readability, informativeness, comprehensiveness, and reusability. For RQ2, we conducted a user study with 30 developers, asking each developer to assess and compare 10 pairs of answers without knowing the information source (i.e., ChatGPT or SO). For RQ3, we distilled 48 software maintenance tasks from 48 GitHub projects citing the studied SO threads. We queried ChatGPT to revise a given Java file and to incorporate the code implementation for each prescribed maintenance requirement. Our study reveals several notable findings: for the majority of SO questions (97/130), ChatGPT provided better answers; in 203 of 300 ratings, developers preferred ChatGPT answers to SO answers; and ChatGPT revised code correctly for 22 of the 48 tasks. We expect this research to expand the community's understanding of ChatGPT's capabilities and to inform the future adoption of ChatGPT by the software industry.
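
The headline counts are easier to compare as proportions, and the study design itself implies the RQ2 total: 30 developers each rating 10 answer pairs gives 300 ratings. The short Python sketch below only restates the counts reported above and converts them to percentages. The names `RESULTS`, `DEVELOPERS`, `PAIRS_PER_DEVELOPER`, and `percentage` are illustrative labels introduced here, not identifiers from the paper.

```python
# Reported study design for RQ2: 30 developers, each comparing 10 answer pairs.
DEVELOPERS = 30
PAIRS_PER_DEVELOPER = 10

# Headline counts restated from the abstract as (description, hits, total).
# The keys and descriptions are illustrative labels, not identifiers from the paper.
RESULTS = {
    "RQ1": ("SO questions where ChatGPT's answer was judged better", 97, 130),
    "RQ2": ("ratings that preferred the ChatGPT answer", 203, 300),
    "RQ3": ("maintenance tasks ChatGPT revised correctly", 22, 48),
}


def percentage(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * hits / total, 1)


# Sanity check: the RQ2 rating total follows from the user-study design (30 x 10).
assert DEVELOPERS * PAIRS_PER_DEVELOPER == RESULTS["RQ2"][2]

for rq, (description, hits, total) in RESULTS.items():
    print(f"{rq}: {hits}/{total} {description} = {percentage(hits, total)}%")
```

Run as-is, it reports 74.6% for RQ1, 67.7% for RQ2, and 45.8% for RQ3. ChatGPT's answers were favored in roughly two thirds to three quarters of the comparisons, while fewer than half of the maintenance tasks were revised correctly.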
