Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently of accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, at both the model level and the language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted into the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.
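To make the idea of measuring consistency independently of accuracy concrete, the following is a minimal sketch of a ranking-overlap score between two languages. It is an illustration only: the abstract does not give the RankC formula, so the function names, the top-j overlap, and the exponentially decaying weights used here are assumptions, not the authors' definition. Each query is assumed to have the same candidate set in both languages, ranked from most to least probable by the model.

```python
import math
from typing import Hashable, Sequence


def query_consistency(rank_a: Sequence[Hashable], rank_b: Sequence[Hashable]) -> float:
    """Weighted top-j overlap between two rankings of the same candidates.

    Assumption for illustration: position j gets weight proportional to
    exp(1 - j), so agreement near the top of the ranking counts most.
    """
    if not rank_a or len(rank_a) != len(rank_b):
        raise ValueError("rankings must be non-empty and of equal length")
    n = len(rank_a)
    raw = [math.exp(1 - j) for j in range(1, n + 1)]
    total = sum(raw)
    seen_a, seen_b = set(), set()
    score = 0.0
    for j in range(1, n + 1):
        seen_a.add(rank_a[j - 1])
        seen_b.add(rank_b[j - 1])
        # Fraction of the top-j candidates shared by both languages.
        overlap = len(seen_a & seen_b) / j
        score += (raw[j - 1] / total) * overlap
    return score


def rank_overlap_consistency(
    rankings_a: Sequence[Sequence[Hashable]],
    rankings_b: Sequence[Sequence[Hashable]],
) -> float:
    """Average the per-query score over all queries of a language pair."""
    if not rankings_a or len(rankings_a) != len(rankings_b):
        raise ValueError("need the same non-zero number of queries per language")
    scores = [query_consistency(a, b) for a, b in zip(rankings_a, rankings_b)]
    return sum(scores) / len(scores)


# Hypothetical candidate rankings for one query in two languages.
same = rank_overlap_consistency([["Paris", "Rome", "Oslo"]], [["Paris", "Rome", "Oslo"]])
flipped = rank_overlap_consistency([["Paris", "Rome", "Oslo"]], [["Oslo", "Rome", "Paris"]])
print(same, flipped)
```

The score never consults a gold answer, so two languages that agree on a wrong answer still count as consistent, which is the sense in which such a measure is independent of accuracy.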