Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently from accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, both at model level and at language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted in the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.
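The abstract does not spell out how RankC is computed. The Python below is a minimal sketch of one ranking-based consistency score, built on three assumptions. First, for each query the model ranks the same set of language-independent candidate answers in two languages. Second, the two rankings are compared with an exponentially weighted precision@j. Third, the per-query scores are averaged. The function names (`precision_at_j`, `query_consistency`, `rankc`) and the example candidate IDs are hypothetical illustrations, not the authors' code. The score does not depend on whether the top-ranked answer is correct, which matches the abstract's requirement that consistency be measured independently of accuracy.

```python
import math
from typing import Hashable, Sequence


def precision_at_j(rank_a: Sequence[Hashable], rank_b: Sequence[Hashable], j: int) -> float:
    """Fraction of the top-j candidates shared by the two rankings."""
    return len(set(rank_a[:j]) & set(rank_b[:j])) / j


def query_consistency(rank_a: Sequence[Hashable], rank_b: Sequence[Hashable]) -> float:
    """Weighted overlap of two rankings of the same candidate set (assumed weighting).

    Weights decay exponentially with rank depth j, so agreement near the top of
    the ranking dominates. exp(-j) normalised over j gives the same weights as
    exp(N - j) normalised over j, but cannot overflow for large N.
    """
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("both rankings must order the same candidates")
    n = len(rank_a)
    weights = [math.exp(-j) for j in range(1, n + 1)]
    total = sum(weights)
    return sum(
        (w / total) * precision_at_j(rank_a, rank_b, j)
        for j, w in zip(range(1, n + 1), weights)
    )


def rankc(rankings_a: Sequence[Sequence[Hashable]],
          rankings_b: Sequence[Sequence[Hashable]]) -> float:
    """Average per-query consistency between two languages (hypothetical helper)."""
    if not rankings_a or len(rankings_a) != len(rankings_b):
        raise ValueError("need the same non-zero number of queries in both languages")
    scores = [query_consistency(a, b) for a, b in zip(rankings_a, rankings_b)]
    return sum(scores) / len(scores)


# Hypothetical candidate rankings for two queries, using language-independent
# entity IDs so that answers in different languages can be compared directly.
en = [["Q90", "Q64", "Q84"], ["Q142", "Q183", "Q145"]]
fr = [["Q90", "Q84", "Q64"], ["Q183", "Q142", "Q145"]]
print(rankc(en, fr))
```

In this sketch, identical rankings score 1.0. Two rankings that disagree on the top candidate are penalised heavily even when the rest of each list overlaps, because the weights decay exponentially with rank depth.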