In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the contrast between encoders and decoders. Our findings reveal that the anisotropy profile across the layers of transformer decoders follows a distinct bell-shaped curve, with anisotropy peaking in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we find that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is followed by a compression phase towards the end of training, in which the dimensionality decreases, suggesting a refinement into more compact representations. Our results provide new insight into the embedding properties of encoders and decoders.
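The abstract does not state how anisotropy or intrinsic dimension are estimated, so the following is only a minimal sketch under assumed definitions: anisotropy is taken as the share of variance along the top singular direction of the centered embedding matrix, and intrinsic dimension is estimated with a TwoNN-style maximum-likelihood estimator based on the ratio of second to first nearest-neighbour distances. The function names `anisotropy` and `intrinsic_dimension` are hypothetical and chosen for illustration.

```python
import numpy as np


def anisotropy(embeddings: np.ndarray) -> float:
    """Assumed definition: sigma_1^2 / sum_i sigma_i^2 of the centered matrix.

    `embeddings` has shape (n_tokens, hidden_dim). Values near 1 mean the
    embeddings lie almost along a single direction; values near
    1 / hidden_dim mean they are close to isotropic.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return float(singular_values[0] ** 2 / np.sum(singular_values ** 2))


def intrinsic_dimension(embeddings: np.ndarray) -> float:
    """Assumed estimator: TwoNN-style MLE, d = N / sum(log(r2 / r1)).

    r1 and r2 are each point's distances to its first and second nearest
    neighbours. Points with a duplicate (r1 == 0) are skipped.
    """
    sq_norms = np.sum(embeddings ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix; clip tiny negatives.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    d2 = np.maximum(d2, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself
    # After partitioning, columns 0 and 1 hold the two smallest values in order.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    valid = nearest[:, 0] > 0
    # log(r2 / r1) = 0.5 * log(r2^2 / r1^2)
    log_ratio = 0.5 * np.log(nearest[valid, 1] / nearest[valid, 0])
    return float(valid.sum() / np.sum(log_ratio))
```

Applying both functions to the hidden states of each layer, at several training checkpoints, would give the kind of layer-wise anisotropy profile and training-time dimension curve described above. The quadratic memory cost of the distance matrix means a subsample of token embeddings is usually needed in practice.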