Because of common architectural designs, symmetries are pervasive in contemporary neural networks. In this work, we show that symmetries of the loss function strongly affect, if not determine, the learning behavior of machine learning models. We prove that every mirror symmetry of the loss function leads to a structured constraint, which becomes a favored solution when either the weight decay or the gradient noise is large. As direct corollaries, we show that rescaling symmetry leads to sparsity, rotation symmetry leads to low-rankness, and permutation symmetry leads to homogeneous ensembling. We then show that this theoretical framework can explain the loss of plasticity and various collapse phenomena in neural networks, and we suggest how symmetries can be used to design algorithms that enforce hard constraints in a differentiable way.
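As a toy illustration of the rescaling-symmetry claim (not the construction used in the proofs), consider a two-parameter model whose data term depends only on the product u*w, so that it is invariant under (u, w) -> (c*u, w/c). The quadratic loss, the target `y`, the weight-decay strength `gamma`, the learning rate, the initial point, and the function name `train` are all assumptions made for this sketch. With small weight decay, gradient descent settles at a nonzero product; once `gamma` exceeds `y`, the symmetric point u = w = 0 becomes the attractor, which is the sparse solution.

```python
def train(gamma, y=1.0, lr=0.05, steps=2000, u0=0.8, w0=0.5):
    """Gradient descent on L(u, w) = (u*w - y)**2 + gamma * (u**2 + w**2).

    The data term (u*w - y)**2 is invariant under u -> c*u, w -> w/c.
    All hyperparameters are illustrative assumptions.
    """
    u, w = u0, w0
    for _ in range(steps):
        r = u * w - y                     # residual of the data term
        gu = 2 * r * w + 2 * gamma * u    # dL/du
        gw = 2 * r * u + 2 * gamma * w    # dL/dw
        u, w = u - lr * gu, w - lr * gw   # simultaneous update
    return u, w


# Small weight decay: the product u*w stays away from zero.
u_small, w_small = train(gamma=0.1)

# Large weight decay (gamma > y): both parameters collapse to zero.
u_large, w_large = train(gamma=2.0)

print("gamma=0.1 -> u*w =", u_small * w_small)
print("gamma=2.0 -> u*w =", u_large * w_large)
```

For this particular loss the minimizer can be checked by hand: on the line u = w the objective reduces to (p - y)^2 + 2*gamma*p with p = u^2, whose minimum is p = y - gamma when gamma < y and p = 0 otherwise, so the transition to the sparse solution occurs at gamma = y.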