The impressive achievements of transformers have prompted NLP researchers to investigate how these models represent the underlying structure of natural language. In this paper, we propose a novel perspective on this question: using typological similarities among languages to observe how their respective monolingual models encode structural information. We compare transformers for typologically similar languages layer by layer to determine whether these similarities emerge at particular layers. For this investigation, we propose using Centered Kernel Alignment (CKA) to measure similarity among weight matrices. We find that syntactic typological similarity is consistent with the similarity between the weights in the middle layers, which are the layers of pretrained BERT models to which syntax encoding is generally attributed. Moreover, we observe that domain adaptation on semantically equivalent texts enhances this similarity among weight matrices.
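
To make the similarity measure concrete, the following is a minimal sketch of the linear variant of Centered Kernel Alignment (CKA) applied to two matrices. It is an illustration under stated assumptions, not the exact procedure used in this work: the abstract does not say which kernel is used or how weight matrices are arranged, so the choice of linear CKA, the treatment of rows as samples, and the names `linear_cka`, `w_a`, `w_rot`, and `w_b` are all assumptions. Random matrices stand in for real layer weights.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two matrices that share the same number of rows.

    Assumption: rows play the role of samples and columns the role of
    features. How weight matrices should be oriented is a modelling choice
    that the abstract does not specify.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")

    # Center every column so that the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for the weight matrices of one layer in two models.
rng = np.random.default_rng(0)
w_a = rng.normal(size=(64, 32))

# An orthogonal transform of w_a: linear CKA is invariant to it.
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
w_rot = w_a @ q

# An unrelated random matrix of the same shape.
w_b = rng.normal(size=(64, 32))

print("same matrix:     ", linear_cka(w_a, w_a))    # close to 1.0
print("rotated copy:    ", linear_cka(w_a, w_rot))  # close to 1.0
print("unrelated matrix:", linear_cka(w_a, w_b))    # below 1.0
```

In a layer-wise comparison, a function of this kind would be called once per layer on the corresponding weight matrices of two monolingual models, giving one similarity score per layer. Linear CKA is invariant to orthogonal transformations and isotropic scaling, which is one reason it is a plausible choice for comparing independently trained models.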