Language models (LMs) are capable of acquiring elements of human-like syntactic knowledge. Targeted syntactic evaluation tests have been employed to measure how well they form generalizations about syntactic phenomena in high-resource languages such as English. However, we still lack a thorough understanding of LMs' capacity for syntactic generalizations in low-resource languages, which are responsible for much of the diversity of syntactic patterns worldwide. In this study, we develop targeted syntactic evaluation tests for three low-resource languages (Basque, Hindi, and Swahili) and use them to evaluate five families of open-access multilingual Transformer LMs. We find that some syntactic tasks prove relatively easy for LMs while others (agreement in sentences containing indirect objects in Basque, agreement across a prepositional phrase in Swahili) are challenging. We additionally uncover issues with publicly available Transformers, including a bias toward the habitual aspect in Hindi in multilingual BERT and underperformance compared to similar-sized models in XGLM-4.5B.