The veracity of a factoid is largely independent of the language it is written in. However, language models are inconsistent in their ability to answer the same factual question across languages. This raises questions about how LLMs represent a given fact across languages. We explore multilingual factual knowledge through two aspects: a model's ability to answer a query consistently across languages, and its ability to ``store'' answers in a representation shared across several languages. We propose a methodology that repurposes knowledge editing methods to measure the extent of representation sharing across languages. Using a new multilingual dataset, we examine LLMs with various multilingual configurations. We reveal that high consistency does not necessarily imply shared representation, particularly for languages with different scripts. Moreover, we find that script similarity is a dominant factor in representation sharing. Finally, we observe that if LLMs could fully share knowledge across languages, their accuracy in their best-performing language could increase by up to 150\% on average. These findings highlight the need for improved multilingual knowledge representation in LLMs and suggest a path toward more robust and consistent multilingual LLMs.