Deep learning models, particularly Transformers, are often criticized as "black boxes" that lack interpretability. We propose Prism, a white-box attention-based architecture derived from the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$). By modeling the attention mechanism as a gradient ascent process on a distinct signal-noise manifold, we introduce a specific irrational frequency separation ($\pi$-RoPE) to enforce incoherence between the signal (semantic) and noise (syntactic) subspaces. We present empirical evidence that these geometric inductive biases alone can induce unsupervised functional disentanglement: Prism spontaneously specializes its attention heads into spectrally distinct regimes, with low-frequency heads capturing long-range causal dependencies (signal) and high-frequency heads handling local syntactic constraints and structural artifacts (noise). To provide theoretical grounding for these spectral phenomena, we draw an analogy between the attention mechanism and a Hamiltonian dynamical system and show that the standard geometric progression of Rotary Positional Embedding (RoPE) frequencies induces dense resonance networks (Arnold tongues), leading to feature rank collapse. Empirical validation on 124M-parameter models trained on OpenWebText demonstrates that Prism spontaneously isolates the Attention Sink pathology and maintains isentropic information flow across layers. Further, we propose KAM-RoPE, a physics-informed plug-and-play intervention for large language models (LLMs). Our results suggest that interpretability and performance can be unified through principled geometric construction, offering a theoretically grounded alternative to heuristic architectural modifications.
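For concreteness, standard RoPE assigns each pair of head dimensions a rotation frequency drawn from a geometric progression, $\theta_i = 10000^{-2i/d}$ for $i = 0, \dots, d/2 - 1$. A minimal sketch of the irrational separation referred to above, under the assumption that $\pi$-RoPE rescales the noise-subspace frequencies by the irrational factor $\pi$ (the exact construction is given in the body of the paper; the index sets $\mathcal{I}_{\text{signal}}, \mathcal{I}_{\text{noise}}$ here are illustrative), is
$$
\theta_i^{\pi\text{-RoPE}} =
\begin{cases}
10000^{-2i/d}, & i \in \mathcal{I}_{\text{signal}},\\[2pt]
\pi \cdot 10000^{-2i/d}, & i \in \mathcal{I}_{\text{noise}},
\end{cases}
\qquad i = 0, \dots, \tfrac{d}{2} - 1,
$$
so that no low-order rational relation couples the two frequency groups, which is the property the incoherence and non-resonance arguments rely on.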