Conservation laws are well established for Euclidean gradient flow dynamics, notably for the training of linear or ReLU neural networks. Yet their existence and governing principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the gradient flow case, we prove that the conservation laws of momentum-based dynamics depend on time. Moreover, we often observe a "conservation loss" when moving from gradient flow to momentum dynamics. Specifically, for linear networks, our framework identifies all momentum conservation laws, which are fewer than in the gradient flow case except in sufficiently over-parameterized regimes. For ReLU networks, no conservation law remains. The same phenomenon appears for non-Euclidean metrics, such as those used for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow setting, yet none persists in the momentum case.
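As a concrete illustration of the gradient flow versus momentum contrast, the following is a minimal sketch on a toy problem that is not taken from the paper. It assumes a scalar two-layer linear network f(x) = a*b*x with loss 0.5*(a*b - target)^2, for which h = a^2 - b^2 is the standard quantity conserved by Euclidean gradient flow. Gradient flow is approximated by small-step gradient descent, and momentum dynamics by a heavy-ball update. The function names, initial values, step size, and momentum coefficient are all hypothetical choices made for this example.

```python
def grad(a, b, target):
    """Gradient of 0.5 * (a*b - target)**2 with respect to (a, b)."""
    r = a * b - target
    return r * b, r * a


def drift(steps, lr, beta, a=1.5, b=0.5, target=2.0):
    """Run heavy-ball updates (beta = 0 gives plain gradient descent)
    and return |h_final - h_initial| for h = a**2 - b**2."""
    va = vb = 0.0
    h0 = a * a - b * b
    for _ in range(steps):
        ga, gb = grad(a, b, target)
        # Heavy-ball velocity update; an assumed discretization of momentum flow.
        va = beta * va - lr * ga
        vb = beta * vb - lr * gb
        a, b = a + va, b + vb
    return abs((a * a - b * b) - h0)


gd_drift = drift(steps=5000, lr=1e-3, beta=0.0)
momentum_drift = drift(steps=5000, lr=1e-3, beta=0.9)
print("drift of a^2 - b^2 under gradient descent:", gd_drift)
print("drift of a^2 - b^2 under heavy-ball momentum:", momentum_drift)
```

In this sketch the gradient descent drift comes only from discretization error and shrinks with the step size, whereas the heavy-ball run moves h by a visibly larger amount. This is consistent with, though far from a proof of, the claim that the time-independent gradient flow law does not carry over unchanged to momentum dynamics.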