Large language models (LLMs) have brought breakthroughs in tasks including translation, summarization, information retrieval, and language generation, and have attracted growing interest in the CHI community. Meanwhile, the literature shows that researchers hold conflicting views about the efficacy, ethics, and intellectual abilities of LLMs. However, we do not know how people perceive the LLMs that are now pervasive in everyday tools, specifically how they experience LLMs with respect to bias, stereotypes, social norms, or safety. In this study, we conducted a systematic review to understand what empirical insights prior papers have gathered about people's perceptions of LLMs. Of the 231 papers retrieved, we reviewed the full text of 15 that recruited human evaluators to assess their experiences with LLMs. We report the biases and related concepts investigated by these studies, four broad LLM application areas, the evaluators' perceptions of LLM performance (including advantages, biases, and conflicting perceptions), the factors influencing these perceptions, and concerns about LLM applications.