Transformer models contain substantial internal redundancy, arising from coordinate-dependent representations in model space and from continuous symmetries in head space. While recent approaches address this redundancy by explicitly breaking the symmetry, we propose a complementary framework based on symmetry reduction. We reformulate representations, attention mechanisms, and optimization dynamics in terms of invariant relational quantities, eliminating the redundant degrees of freedom by construction. This perspective yields architectures that operate directly on relational structures and provides a principled geometric framework for reducing parameter redundancy and analyzing optimization.
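To make the head-space symmetry concrete, the following minimal sketch (our own illustration; the numpy setup, dimensions, and variable names are chosen for exposition and are not taken from the paper) verifies that standard dot-product attention logits are unchanged when the query and key projections are rotated by the same orthogonal matrix, so only the relational product W_Q W_K^T is observable.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_head, n_tokens = 16, 4, 5
X = rng.normal(size=(n_tokens, d_model))   # token representations (illustrative)
W_Q = rng.normal(size=(d_model, d_head))   # query projection
W_K = rng.normal(size=(d_model, d_head))   # key projection

# Attention logits depend on the projections only through W_Q @ W_K.T.
logits = (X @ W_Q) @ (X @ W_K).T / np.sqrt(d_head)

# Rotate both projections by the same orthogonal matrix (head-space symmetry).
rot, _ = np.linalg.qr(rng.normal(size=(d_head, d_head)))
logits_rotated = (X @ W_Q @ rot) @ (X @ W_K @ rot).T / np.sqrt(d_head)

# The logits coincide: the rotation is a redundant degree of freedom.
assert np.allclose(logits, logits_rotated)
print("max difference:", np.abs(logits - logits_rotated).max())
```

Working with invariant relational quantities such as this product, rather than the individual coordinate-dependent projections, is the kind of redundancy that symmetry reduction removes by construction.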