Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude higher than others. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a direction that depends only on the random seed. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global (inter-sequence) representation concentration, and (ii) self-attention further amplifies this effect through local (intra-sequence) aggregation. Together, these mechanisms align hidden representations along a direction determined solely by the random initialization, producing highly non-uniform next-token predictions. Beyond mechanistic insight, we demonstrate that these initialization-induced biases persist throughout training, forming a stable and intrinsic model identity. Leveraging this property, we introduce SeedPrint, a fingerprinting method that reliably distinguishes models differing only in their random initialization, even after extensive training and under substantial distribution shift. Finally, we identify a fundamental positional discrepancy, inherent to the attention mechanism's intra-sequence contraction, that is causally linked to the attention-sink phenomenon. This discovery provides a principled explanation for the emergence of attention sinks and offers a pathway toward controlling them.
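The inter-sequence concentration in mechanism (i) can be illustrated with a toy model. The sketch below is a hypothetical minimal setup, not the paper's experimental configuration: a tiny untrained residual ReLU-MLP stack (no attention, no normalization), with all dimensions and seeds chosen for illustration. Because ReLU outputs are nonnegative, every block pushes hidden states toward a weight-determined direction shared by all tokens, so the argmax next-token prediction collapses onto a seed-dependent token across independent random inputs.

```python
import random
import math

# Hypothetical toy setup (not the paper's models): a tiny untrained
# ReLU-MLP residual stack. There is no attention here, so this sketch
# isolates mechanism (i), the MLP-driven inter-sequence concentration.
random.seed(0)                 # the "model seed": all weights depend only on this
V, D, H, L = 100, 16, 48, 6    # vocab size, model dim, MLP width, depth

def randmat(rows, cols, std):
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

E = randmat(V, D, 1.0)                        # token embeddings
blocks = [(randmat(H, D, 1 / math.sqrt(D)),   # per-block MLP weights
           randmat(D, H, 1 / math.sqrt(H))) for _ in range(L)]
U = randmat(V, D, 1 / math.sqrt(D))           # unembedding

def final_hidden(token_id):
    h = list(E[token_id])
    for W1, W2 in blocks:
        a = [max(0.0, x) for x in matvec(W1, h)]         # asymmetric ReLU
        h = [x + dx for x, dx in zip(h, matvec(W2, a))]  # residual add
    return h

# Without attention, each token's representation depends only on its id,
# so we can precompute all V final hidden states once.
hidden = [final_hidden(t) for t in range(V)]

def predict(tokens):
    # Mean-pool token states as a crude stand-in for sequence aggregation.
    pooled = [sum(hidden[t][i] for t in tokens) / len(tokens) for i in range(D)]
    logits = matvec(U, pooled)
    return max(range(V), key=lambda t: logits[t])

# Feed many independent random input sequences to the same untrained model.
rng = random.Random(12345)  # separate seed for the data, not the model
preds = [predict([rng.randrange(V) for _ in range(32)]) for _ in range(60)]
modal = max(set(preds), key=preds.count)
frac = preds.count(modal) / len(preds)
print(f"modal predicted token {modal} wins on {frac:.0%} of random inputs (chance: 1%)")
```

Rerunning with a different model seed picks out a different preferred token, which is the intuition behind seed-based fingerprinting: the bias direction, and hence the preferred tokens, are a function of the initialization alone.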