Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries -- a broad subclass of continuous symmetries present in the model architecture or loss function -- interact with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a ``Noether flow'') of the parameters $\theta$ along the degenerate direction to a unique, initialization-independent fixed point $\theta^*$. We refer to these points as the {\it noise equilibria} because, at these points, noise contributions from different directions are balanced and aligned. We then show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks, and that they have practical implications for understanding techniques like representation normalization and warmup.
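As a minimal illustration of this symmetry class (the notation below is ours and is only a sketch, not the paper's full setting): a loss function $L$ has an exponential symmetry with generator $A$ if it is invariant along the one-parameter group $\theta \mapsto e^{\lambda A}\theta$,
\[
L\big(e^{\lambda A}\theta\big) = L(\theta) \quad \text{for all } \lambda \in \mathbb{R}.
\]
The rescaling symmetry of a single ReLU neuron is a simple instance: for $\theta = (u, w)$ with output $w\,\mathrm{ReLU}(u^\top x)$, positive homogeneity gives $e^{-\lambda} w\,\mathrm{ReLU}\big(e^{\lambda} u^\top x\big) = w\,\mathrm{ReLU}(u^\top x)$, so the generator $A = \mathrm{diag}(I, -1)$ rescales $u$ and $w$ in opposite directions while leaving the loss unchanged, and the orbit $\{e^{\lambda A}\theta\}$ is a degenerate direction of the kind the abstract refers to.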