In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the contrast between encoders and decoders. Our findings reveal that the anisotropy profile across the layers of transformer decoders follows a distinct bell-shaped curve, with the highest anisotropy in the middle layers. This pattern differs from the more uniformly distributed anisotropy observed in encoders. In addition, we find that the intrinsic dimension of embeddings increases in the initial phase of training, indicating an expansion into a higher-dimensional space. This is followed by a compression phase towards the end of training, in which the dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
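To make the two quantities discussed above concrete, the following is a minimal sketch of how they might be measured on a matrix of embeddings (one row per token). The abstract does not state the estimators used, so both definitions here are assumptions: anisotropy is taken as the share of variance carried by the largest singular value of the centered embedding matrix, and intrinsic dimension is estimated with a maximum-likelihood variant of the two-nearest-neighbour (TwoNN) approach. The function names `anisotropy` and `two_nn_intrinsic_dimension` are illustrative only.

```python
import numpy as np


def anisotropy(embeddings):
    """Share of total variance along the dominant direction (assumed definition).

    Values near 1 mean the embeddings lie almost on a single line;
    values near 1/d mean they are spread evenly over d directions.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)      # center the point cloud
    s = np.linalg.svd(x, compute_uv=False)     # singular values, descending
    return float(s[0] ** 2 / np.sum(s ** 2))


def two_nn_intrinsic_dimension(embeddings):
    """Maximum-likelihood TwoNN-style estimate (assumed estimator).

    For each point, mu = r2 / r1 is the ratio of the distances to its
    second and first nearest neighbours. Under the TwoNN model, mu is
    Pareto-distributed with shape equal to the intrinsic dimension, whose
    maximum-likelihood estimate is N / sum(log(mu)).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared distances
    np.maximum(d2, 0.0, out=d2)                # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)               # exclude self-distances
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                              # drop exact duplicates
    log_mu = np.log(r2[keep] / r1[keep])
    return float(keep.sum() / np.sum(log_mu))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the embeddings of one layer: 500 tokens, 32 dimensions.
    layer_embeddings = rng.normal(size=(500, 32))
    print("anisotropy:", anisotropy(layer_embeddings))
    print("intrinsic dimension:", two_nn_intrinsic_dimension(layer_embeddings))
```

Applying `anisotropy` to the embeddings of each layer in turn would give a layer-wise profile of the kind described above, and applying `two_nn_intrinsic_dimension` to embeddings from successive training checkpoints would give the corresponding training-time curve. The pairwise-distance step uses memory quadratic in the number of points, so in practice a subsample of tokens would likely be needed.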