Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently of accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC at both the model level and the language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted into the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern: the new piece of knowledge transfers only to languages with which English has a high RankC score.
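The abstract does not spell out how RankC is computed. The sketch below is for illustration only and assumes one plausible ranking-based definition. For each factual query, the model ranks the same set of candidate answers in two languages. For every cut-off j, the overlap precision P@j of the two top-j lists is computed. These precisions are combined with weights that decay exponentially in j, w_j = e^{N-j} / Σ_k e^{N-k} for N candidates, so that agreement near the top of the ranking counts most. The per-query scores are then averaged over all queries. The function names `query_consistency` and `rank_c` are hypothetical. Candidates are assumed to be identified by language-independent IDs, so that rankings from different languages can be compared directly.

```python
import math
from typing import Sequence


def query_consistency(rank_a: Sequence[str], rank_b: Sequence[str]) -> float:
    """Weighted top-j overlap between two rankings of the same candidate set.

    ASSUMPTION: weights w_j = exp(N - j) / sum_k exp(N - k). They are computed
    here in the equivalent, overflow-safe form exp(-(j - 1)) / sum_k exp(-(k - 1)).
    """
    n = len(rank_a)
    if n == 0 or len(set(rank_a)) != n or len(rank_b) != n or set(rank_a) != set(rank_b):
        raise ValueError("rankings must be non-empty permutations of the same candidates")
    weights = [math.exp(-(j - 1)) for j in range(1, n + 1)]
    total = sum(weights)
    score = 0.0
    for j in range(1, n + 1):
        # P@j: fraction of the top-j candidates shared by both rankings.
        p_at_j = len(set(rank_a[:j]) & set(rank_b[:j])) / j
        score += (weights[j - 1] / total) * p_at_j
    return score


def rank_c(rankings_l1: Sequence[Sequence[str]],
           rankings_l2: Sequence[Sequence[str]]) -> float:
    """Mean per-query consistency over queries aligned across two languages."""
    if not rankings_l1 or len(rankings_l1) != len(rankings_l2):
        raise ValueError("need the same non-empty list of queries in both languages")
    scores = [query_consistency(a, b) for a, b in zip(rankings_l1, rankings_l2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical candidate IDs for one query, ranked by a model in two languages.
    en = [["Q90", "Q64", "Q84"]]
    nl = [["Q90", "Q84", "Q64"]]
    print(rank_c(en, nl))
```

Under this definition, the score compares the order of the model's preferences in the two languages and never checks whether the top-ranked answer is correct. This is what allows consistency to be measured separately from accuracy: two languages can both be wrong yet fully consistent.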