Large Language Models (LLMs) are frequently discussed in academia and among the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who used ChatGPT in their jobs over a period of one week, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or to learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) the purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that future research can use our framework to further the academic discussion on LLM usage by software engineering practitioners, and that it can serve as a reference point for the design of future empirical LLM research in this domain.