Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step toward answering this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2, and Mamba. Unlike previous methods, MLEMs offer a transparent comparison by identifying the specific linguistic features responsible for similarities and differences. More generally, the method takes formal, symbolic descriptions of a domain and uses them to compare neural representations. As such, the approach can straightforwardly be extended to other domains, such as speech and vision, and to other neural systems, including human brains.
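To make the idea of a feature-based comparison concrete, the following is a minimal sketch and not the authors' implementation. It assumes a deliberately simplified stand-in for an MLEM: a linear least-squares fit that predicts pairwise distances between representations from pairwise mismatches in symbolic features. The three binary features, the synthetic "layers", and the function names (`feature_mismatch`, `representation_distances`, `feature_profile`) are all hypothetical and chosen for illustration only.

```python
import numpy as np


def feature_mismatch(features):
    """For every pair of stimuli, mark which symbolic features differ (1.0) or match (0.0)."""
    i, j = np.triu_indices(len(features), k=1)
    return (features[i] != features[j]).astype(float)


def representation_distances(reps):
    """Euclidean distance between the representations of every pair of stimuli."""
    i, j = np.triu_indices(len(reps), k=1)
    return np.linalg.norm(reps[i] - reps[j], axis=1)


def feature_profile(features, reps):
    """Least-squares weight of each feature mismatch in predicting representational distance.

    This is a simplified, linear stand-in for a learned metric (an assumption of this sketch).
    """
    X = feature_mismatch(features)
    X = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    y = representation_distances(reps)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights[:-1]  # drop the intercept


rng = np.random.default_rng(0)

# Hypothetical symbolic description: 60 sentences, 3 binary linguistic features.
features = rng.integers(0, 2, size=(60, 3))

# Two synthetic "layers": layer A is dominated by feature 0, layer B by feature 1.
reps_a = features * np.array([3.0, 0.5, 0.5]) + 0.05 * rng.standard_normal((60, 3))
reps_b = features * np.array([0.5, 3.0, 0.5]) + 0.05 * rng.standard_normal((60, 3))

profile_a = feature_profile(features, reps_a)
profile_b = feature_profile(features, reps_b)

# Comparing the two profiles shows which features drive the difference between layers.
print("layer A profile:", np.round(profile_a, 2))
print("layer B profile:", np.round(profile_b, 2))
print("profile difference:", np.round(profile_a - profile_b, 2))
```

In this toy setting the two profiles differ mainly on the first two features, which is the kind of transparent, feature-level account of similarities and differences that the abstract describes. A full metric-learning model would presumably learn a richer metric than these independent per-feature weights.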