Due to common architecture designs, symmetries exist extensively in contemporary neural networks. In this work, we unveil the importance of loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. We prove that every mirror-reflection symmetry in the loss function, with reflection surface $O$, leads to the emergence of a constraint on the model parameters $\theta$: $O^T\theta = 0$. This constraint is satisfied when either the weight decay or the gradient noise is large. Common instances of mirror symmetries in deep learning include rescaling, rotation, and permutation symmetries. As direct corollaries, we show that rescaling symmetry leads to sparsity, rotation symmetry leads to low rankness, and permutation symmetry leads to homogeneous ensembling. We then show that this theoretical framework can explain intriguing phenomena, such as the loss of plasticity and various collapse phenomena in neural networks, and suggest how symmetries can be used to design an elegant algorithm that enforces hard constraints in a differentiable way.
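To make the rescaling-symmetry corollary concrete, here is a minimal numerical sketch (not taken from the paper): a two-parameter toy loss $(uw - 1)^2$ with weight decay $\gamma$, minimized by plain gradient descent in NumPy. The specific loss, learning rate, and step count are illustrative assumptions; the sketch only demonstrates the qualitative claim that a rescaling-symmetric parameter pair collapses to the sparse solution $u = w = 0$ once the weight decay is large enough.

```python
import numpy as np

# Toy loss with rescaling symmetry (illustrative, not from the paper):
#   L(u, w) = (u*w - 1)^2 + gamma * (u^2 + w^2),
# invariant under (u, w) -> (lam*u, w/lam). The lam = -1 case is a mirror
# reflection whose surface O spans both coordinates, so the abstract's
# constraint O^T theta = 0 reads u = w = 0, i.e. a sparse solution.

def grad(theta, gamma):
    u, w = theta
    g_u = 2 * (u * w - 1) * w + 2 * gamma * u
    g_w = 2 * (u * w - 1) * u + 2 * gamma * w
    return np.array([g_u, g_w])

def minimize(gamma, lr=1e-2, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=2)  # generic (non-symmetric) initialization
    for _ in range(steps):
        theta -= lr * grad(theta, gamma)
    return theta

for gamma in [0.1, 2.0]:  # small vs. large weight decay
    theta = minimize(gamma)
    print(f"gamma={gamma:>4}: theta={np.round(theta, 4)}")

# Expected behavior under these assumptions: for small gamma the minimizer
# stays away from the origin, while for large gamma it collapses to (0, 0),
# matching the claim that rescaling symmetry plus strong weight decay
# induces sparsity.
```

The same experiment can be repeated with noisy gradients in place of weight decay to probe the gradient-noise side of the statement; the sketch above only covers the weight-decay case.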