Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge. While extensive research has addressed this problem in English, little is known about the behavior of multilingual LLMs. This paper systematically evaluates multilingual LLMs' factual accuracy across languages and geographic regions. We introduce a novel pipeline for multilingual factuality evaluation, adapting FActScore (Min et al., 2023) to diverse languages. Our analysis across nine languages reveals that English consistently outperforms the other languages in both factual accuracy and the quantity of generated facts. Furthermore, multilingual models exhibit a bias toward factual information about Western regions. These findings highlight the need for improved multilingual factuality assessment and underscore geographical biases in LLMs' fact generation.