Large Language Models (LLMs) are frequently discussed in academia and by the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently, there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who used ChatGPT in their jobs over a period of one week, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or to learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) the purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that future research can use our framework to further the academic discussion on LLM usage by software engineering practitioners, and that it can serve as a reference point for the design of future empirical LLM research in this domain.