Causal analysis techniques for language models illuminate how linguistic information is organized in LLMs. We use one such technique, AlterRep, a counterfactual probing method, to explore the internal structure of multilingual models (mBERT and XLM-R). We train a linear classifier on a binary language identity task, classifying tokens as belonging to Language X or Language Y. Following the counterfactual probing procedure, we use the classifier weights to project the embeddings onto the classifier's null space, and then push the resulting embeddings in the direction of either Language X or Language Y. We then evaluate the models on a masked language modeling task. We find that, given a template in Language X, pushing towards Language Y systematically increases the probability of Language Y words, more so than the probability of words in a third-party control language. However, it does not specifically push the model towards translation-equivalent words in Language Y. Pushing towards Language X (the language of the template) has a minimal effect, though it somewhat degrades model performance. Overall, we take these results as further evidence of the rich structure of massively multilingual language models, whose representations include both a language-specific and a language-general component. We also show that counterfactual probing can be fruitfully applied to multilingual models.
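The projection-and-push step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses synthetic vectors in place of mBERT or XLM-R embeddings, a single probe direction rather than whatever set of directions the full AlterRep procedure uses, and hypothetical names throughout (`train_linear_probe`, `push_representations`, `alpha`). The masked language modeling evaluation is omitted.

```python
import numpy as np


def train_linear_probe(H, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent.

    H: (n, d) token representations; y: (n,) labels, 0 = Language X, 1 = Language Y.
    Returns the weight vector w and bias b.
    """
    n, d = H.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted P(Language Y)
        grad = p - y                             # gradient of the log-loss w.r.t. logits
        w -= lr * (H.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def push_representations(H, w, alpha, toward_y):
    """Project H onto the null space of the probe direction, then push along it.

    Assumption: one direction only; alpha is a hypothetical step-size parameter.
    """
    u = w / np.linalg.norm(w)                # unit probe direction
    P = np.eye(len(u)) - np.outer(u, u)      # null-space projection (symmetric)
    sign = 1.0 if toward_y else -1.0
    return H @ P + sign * alpha * u          # remove language signal, then re-add it


# Synthetic stand-in for embeddings: two clusters separated along dimension 0.
rng = np.random.default_rng(0)
n, d = 200, 16
offset = np.zeros(d)
offset[0] = 2.0
H = np.vstack([rng.normal(size=(n, d)) - offset,   # "Language X" tokens
               rng.normal(size=(n, d)) + offset])  # "Language Y" tokens
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = train_linear_probe(H, y)
acc = (((H @ w + b) > 0) == (y == 1)).mean()       # probe accuracy on training data

alpha = 4.0
H_to_y = push_representations(H, w, alpha, toward_y=True)
H_to_x = push_representations(H, w, alpha, toward_y=False)
```

After the intervention, every vector has the same component along the probe direction (`+alpha` or `-alpha`), while all components orthogonal to it are unchanged. In the setting of the abstract, the modified vectors would then be passed through the remaining layers of the model to see how the masked-token predictions shift.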