Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and at following instructions given through user prompts. This study investigates whether the quality of LLM responses varies with the demographic profile of the user. Considering English's role as the global lingua franca, along with the diversity of its dialects among speakers of different native languages, we explore whether non-native English speakers receive lower-quality or even factually incorrect responses from LLMs more frequently. Our results show that performance discrepancies arise when LLMs are prompted by native versus non-native English speakers, and persist when comparing native speakers from Western countries with others. Additionally, we find a strong anchoring effect when the model recognizes, or is made aware of, the user's native-speaker status, which further degrades response quality in interactions with non-native speakers. Our analysis is based on a newly collected dataset with over 12,000 unique annotations from 124 annotators, including information on their native language and English proficiency.