Language models (LMs) are capable of acquiring elements of human-like syntactic knowledge. Targeted syntactic evaluation tests have been employed to measure how well they form generalizations about syntactic phenomena in high-resource languages such as English. However, we still lack a thorough understanding of LMs' capacity for syntactic generalizations in low-resource languages, which are responsible for much of the diversity of syntactic patterns worldwide. In this study, we develop targeted syntactic evaluation tests for three low-resource languages (Basque, Hindi, and Swahili) and use them to evaluate five families of open-access multilingual Transformer LMs. We find that some syntactic tasks prove relatively easy for LMs while others (agreement in sentences containing indirect objects in Basque, agreement across a prepositional phrase in Swahili) are challenging. We additionally uncover issues with publicly available Transformers, including a bias toward the habitual aspect in Hindi in multilingual BERT and underperformance compared to similar-sized models in XGLM-4.5B.