Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently of accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, both at the model level and at the language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted into the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.