Hallucinations in generative AI, particularly in Large Language Models (LLMs), pose a significant challenge to the reliability of multilingual applications. Existing benchmarks for hallucination detection focus primarily on English and a few widely spoken languages, and lack the breadth needed to assess how model performance varies across diverse linguistic contexts. To address this gap, we introduce Poly-FEVER, a large-scale multilingual fact verification benchmark designed specifically for evaluating hallucination detection in LLMs. Poly-FEVER comprises 77,973 labeled factual claims spanning 11 languages, sourced from FEVER, Climate-FEVER, and SciFact. It provides the first large-scale dataset tailored for analyzing hallucination patterns across languages, enabling systematic evaluation of LLMs such as ChatGPT and the LLaMA series. Our analysis reveals how topic distribution and web resource availability influence hallucination frequency, uncovering language-specific biases that affect model accuracy. By offering a multilingual benchmark for fact verification, Poly-FEVER facilitates cross-linguistic comparisons of hallucination detection and contributes to the development of more reliable, language-inclusive AI systems. The dataset is publicly available at https://huggingface.co/datasets/HanzhiZhang/Poly-FEVER to advance research in responsible AI, fact-checking methodologies, and multilingual NLP, promoting greater transparency and robustness in LLM performance.