Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well known and is the primary motivation, we uncover an additional intriguing effect of SAM: a reduction of the feature rank that occurs at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures, such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives, such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we give a mechanistic account of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features.
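To make the link between pruned activations and feature rank concrete, here is a minimal sketch in plain Python. It is not taken from the linked repository: the helper names (`relu_features`, `active_units`, `matrix_rank`), the toy inputs, and the toy weights are all hypothetical, the hidden layer is assumed to use ReLU, and exact linear-algebraic rank is used as a simple stand-in for whatever rank measure the paper adopts. It only illustrates that a hidden unit which is zero on every input contributes an all-zero feature column, so the feature rank cannot exceed the number of units that remain active.

```python
def relu_features(inputs, weights):
    """Hidden-layer features of a toy two-layer ReLU network: max(0, x . w_j)."""
    return [
        [max(0.0, sum(x_i * w_i for x_i, w_i in zip(x, w))) for w in weights]
        for x in inputs
    ]


def active_units(features, tol=1e-12):
    """Indices of hidden units that are non-zero on at least one input."""
    n_units = len(features[0])
    return [j for j in range(n_units) if any(row[j] > tol for row in features)]


def matrix_rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    rows = [list(r) for r in matrix]  # work on a copy
    n_cols = len(rows[0])
    rank = 0
    for col in range(n_cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new direction
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
        if rank == len(rows):
            break
    return rank


# Toy data (hypothetical): four 2-D inputs and three hidden units.
# The third unit has negative pre-activation on every input, so ReLU zeroes it.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

features = relu_features(inputs, weights)
alive = active_units(features)
rank = matrix_rank(features)

print("active units:", alive)   # units 0 and 1; unit 2 is pruned
print("feature rank:", rank)    # 2, bounded by the number of active units
```

In this toy case the bound is tight: two active units, feature rank two. In general the rank can be lower than the number of active units, which is consistent with the abstract's remark that the overall rank reduction mechanism can be more complex in deep networks.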