Analyzing the similarity of internal representations within and across models has been an important technique for understanding the behavior of deep neural networks. Most existing methods for comparing high-dimensional representations, such as those based on Canonical Correlation Analysis (CCA) and the widely used Centered Kernel Alignment (CKA), rely on statistical properties of the representations over a set of data points. In this paper, we focus on transformer models and study the similarity of representations between the hidden layers of individual transformers. In this context, we show that a simple sample-wise cosine similarity metric captures the similarity and agrees with the more complex CKA. Our experiments on common transformers reveal that representations across layers are positively correlated, although the similarity decreases as layers grow farther apart. We then propose an aligned training approach to enhance the similarity between internal representations; the resulting models enjoy the following properties: (1) the last-layer classifier can be applied directly after any hidden layer, yielding intermediate-layer accuracies much higher than those obtained under standard training; (2) the layer-wise accuracies increase monotonically, revealing the minimal depth needed for a given task; (3) when used as multi-exit models, they achieve performance on par with standard multi-exit architectures, which require additional classifiers designed for early exiting at shallow layers. To our knowledge, our work is the first to show that a single common classifier suffices for multi-exit models. We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training.
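As a minimal sketch of the two metrics being compared, the snippet below contrasts a sample-wise cosine similarity (averaged over samples) with linear CKA on two synthetic "layer" representations. The function names and synthetic data are ours for illustration; sample-wise cosine similarity assumes the two layers share the same hidden dimension, which holds across the hidden layers of a single transformer.

```python
import numpy as np

def samplewise_cosine(X, Y):
    """Mean cosine similarity between corresponding rows (samples) of X and Y.

    X, Y: (n_samples, dim) hidden representations from two layers.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return float(np.mean(np.sum(Xn * Yn, axis=1)))

def linear_cka(X, Y):
    """Linear CKA between two representation matrices, for comparison."""
    X = X - X.mean(axis=0)  # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 64))            # "layer i" representations
Y = X + 0.1 * rng.standard_normal((128, 64))  # a nearby "layer j"
print(samplewise_cosine(X, Y), linear_cka(X, Y))
```

For representations of nearby layers (as simulated above), both scores are close to 1; as the perturbation grows, both decrease together, illustrating the kind of agreement the abstract describes.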