Recently, generative AI based on large language models (LLMs) has been gaining momentum for its impressive performance in multiple domains, particularly after the release of ChatGPT. Many believe that LLMs have the potential to perform general-purpose problem-solving in software development and to replace human software developers. Nevertheless, there is a lack of serious investigation into the capability of these LLM techniques in fulfilling software development tasks. In a controlled 2 × 2 between-subjects experiment with 109 participants, we examined whether and to what degree working with ChatGPT was helpful in a coding task and in a typical software development task, and how people work with ChatGPT. We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks fell short. We also observed the interactions between participants and ChatGPT and identified relations between these interactions and the outcomes. Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers and motivates the need for novel interaction mechanisms that help developers work effectively with large language models to achieve desired outcomes.