Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties are encoded by these models, and to what extent? How robust are these models in encoding linguistic properties when faced with perturbations in the input text? However, these studies have mainly focused on BERT and the English language. In this paper, we investigate similar questions regarding encoding capability and robustness for 8 linguistic properties across 13 different perturbations in 6 Indic languages, using 9 multilingual Transformer models (7 universal and 2 Indic-specific). To conduct this study, we introduce a novel multilingual benchmark dataset, IndicSentEval, containing approximately 47K sentences. Surprisingly, our probing analysis of surface, syntactic, and semantic properties reveals that while almost all multilingual models demonstrate consistent encoding performance for English, they show mixed results for Indic languages. As expected, Indic-specific multilingual models capture linguistic properties in Indic languages better than universal models. Intriguingly, universal models broadly exhibit better robustness than Indic-specific models, particularly under perturbations such as dropping both nouns and verbs, dropping only verbs, or keeping only nouns. Overall, this study provides valuable insights into the probing and perturbation-specific strengths and weaknesses of popular multilingual Transformer-based models for different Indic languages. We make our code and dataset publicly available [https://tinyurl.com/IndicSentEval].