Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level

Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, because users have a limited understanding of these large-scale models, they struggle to use the technology effectively and experience various kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these methods focus on crafting a single prompt, and little is known about how to address the dissatisfaction users encounter during a conversation. Therefore, using ChatGPT as a case study, we examine end users' dissatisfaction along with the strategies they use to address it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of dissatisfactory ChatGPT responses from 107 users, together with their detailed recollections of these experiences, which we release as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, whereas they rate dissatisfaction related to accuracy as the most severe. We also identified four tactics users employ to address their dissatisfaction and assessed their effectiveness. We found that users often do not use any tactic to address their dissatisfaction, and even when they do, 72% of the dissatisfaction remains unresolved. Moreover, we found that users with low knowledge of LLMs tend to face more accuracy-related dissatisfaction, yet they often put minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLM services.

翻译：具备对话能力的的大语言模型，如ChatGPT，已广泛应用于各类工作流程中。然而，由于对这些大规模模型的理解有限，用户在使用该技术时常常感到不同形式的不满。研究者已提出提示工程等多种方法来改善模型回复，但这些方法侧重于设计单次提示，对于如何处理用户在对话过程中遇到的不满情绪却鲜有探究。因此，以ChatGPT为案例，我们研究了终端用户的不满情绪及其应对策略。基于文献综述将用户对大语言模型的不满归纳为七类后，我们从107名用户中收集了511个引发不满的ChatGPT回复实例及其详细回忆，并发布为公开数据集。分析表明，当ChatGPT未能理解用户意图时，不满情绪最为频繁；而与准确性相关的不满，其严重程度评价最高。我们还识别出用户应对不满的四种策略及其有效性。研究发现，用户往往不采取任何策略来解决不满，即便采取策略，仍有72%的不满未能解决。此外，对大语言模型知识水平较低的用户更易遭遇准确性相关的不满，且通常投入最少努力来应对不满。基于这些发现，我们提出设计建议，旨在最小化用户不满并提升对话型大语言模型服务的可用性。