Large language models (LLMs) encode vast amounts of world knowledge acquired through training on large web-scale datasets crawled from the internet. However, these datasets typically exhibit a geographical bias towards English-speaking Western countries, which leads LLMs to produce biased or hallucinated responses to queries that require answers localized to other geographical regions. In this work, we introduce a new benchmark named LoFTI (Localization and Factuality Transfer to Indian Locales) that can be used to evaluate an LLM's localization and factual text transfer capabilities. LoFTI consists of factual statements about entities in source and target locations; the source locations are spread across the globe, while the target locations are all within India, with varying degrees of hyperlocality (country, states, cities). The entities span a wide variety of categories. We use LoFTI to evaluate Mixtral, GPT-4, and two other Mixtral-based approaches well-suited to the task of localized factual transfer. We demonstrate that LoFTI is a high-quality evaluation benchmark and that all the models, including GPT-4, produce skewed results across varying levels of hyperlocality.