Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, due to a limited understanding of these large-scale models, users struggle to use this technology effectively and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these methods focus on enhancing the model's performance on specific tasks, and little attention has been paid to how users deal with the dissatisfaction resulting from the model's responses. Therefore, with ChatGPT as the case study, we examine users' dissatisfaction along with the strategies they use to address it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of unsatisfactory ChatGPT responses from 107 users, along with their detailed recollections of these experiences, which we released as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, while they rate the severity of dissatisfaction related to accuracy the highest. We also identified four tactics users employ to address their dissatisfaction and assessed their effectiveness. We found that users often do not use any tactics to address their dissatisfaction, and even when tactics were used, 72% of the dissatisfaction remained unresolved. Moreover, we found that users with low knowledge of LLMs tend to face more dissatisfaction related to accuracy, while they often put minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLMs.