Although large language models (LLMs) have shown exceptional performance in various natural language processing tasks, they are prone to hallucinations. State-of-the-art chatbots, such as the new Bing, attempt to mitigate this issue by gathering information directly from the internet to ground their answers. In this setting, the capacity to distinguish trustworthy sources is critical for giving users appropriate context about accuracy. Here we assess whether ChatGPT, a prominent LLM, can evaluate the credibility of news outlets. With appropriate instructions, ChatGPT can provide ratings for a diverse set of news outlets, including those in non-English languages and satirical sources, along with contextual explanations. Our results show that these ratings correlate with those from human experts (Spearman's $\rho=0.54, p<0.001$). These findings suggest that LLMs could be an affordable reference for credibility ratings in fact-checking applications. Future LLMs should enhance their alignment with human expert judgments of source credibility to improve information accuracy.
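As an illustration of the agreement measure reported above, the following is a minimal sketch of how a Spearman rank correlation between model-assigned and expert-assigned credibility ratings can be computed. The function names (`average_ranks`, `spearman_rho`) and the five rating pairs are hypothetical and chosen only for illustration; they are not the study's data or code, and the sketch does not compute a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings for five outlets (made-up values, not study data).
llm_ratings = [0.9, 0.2, 0.7, 0.4, 0.1]
expert_scores = [95, 30, 60, 80, 10]

rho = spearman_rho(llm_ratings, expert_scores)
print(f"Spearman's rho = {rho:.2f}")
```

Because the statistic depends only on ranks, the two sets of ratings do not need to share a scale; only the ordering of outlets matters.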