Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, because users have a limited understanding of these large-scale models, they struggle to use the technology and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these methods focus on enhancing the model's performance on specific tasks, and little work has investigated how to deal with the user dissatisfaction that results from the model's responses. Therefore, with ChatGPT as a case study, we examine users' dissatisfaction and the strategies they use to address it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of unsatisfactory ChatGPT responses from 107 users, along with their detailed recollections of the unsatisfactory experiences, which we released as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, while they rate accuracy-related dissatisfaction as the most severe. We also identified four tactics users employ to address their dissatisfaction and how effective those tactics are. We found that users often do not use any tactics to address their dissatisfaction, and even when they did, 72% of dissatisfaction remained unresolved. Moreover, we found that users with low knowledge of LLMs tend to face more accuracy-related dissatisfaction, yet they often put minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLMs.