We explore whether intrinsic symmetries of the training data lead to conserved quantities during gradient-flow training of neural networks. Assuming that the loss function is analytic and non-polynomial, we prove that data symmetries generically induce no additional integrals of motion. For the mean squared error (MSE) loss, by contrast, there are situations in which data augmentation yields extra conserved quantities. To describe this phenomenon, we build a framework based on \emph{tensorizable networks}: a family of architectures whose dependence on parameters and on inputs can be separated through an intermediate representation. This family includes linear and polynomial networks, as well as Lightning Attention.
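The abstract does not give the formal definition of a tensorizable network. As a minimal sketch, assuming the separation takes the form $f(x;\theta) = \langle T(\theta), \Phi(x) \rangle$ for a parameter-only tensor $T(\theta)$ and an input-only tensor $\Phi(x)$, a quadratic network $f(x) = \sum_i a_i (w_i \cdot x)^2$ fits the pattern with $T = \sum_i a_i w_i w_i^\top$ and $\Phi(x) = x x^\top$. The toy model and every function name below are hypothetical illustrations, not the paper's construction.

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def quadratic_net(x, a, W):
    """Direct evaluation of the hypothetical toy model f(x) = sum_i a_i (w_i . x)^2."""
    return sum(a_i * dot(w_i, x) ** 2 for a_i, w_i in zip(a, W))


def param_tensor(a, W):
    """Parameter-only representation T = sum_i a_i w_i w_i^T (no dependence on x)."""
    d = len(W[0])
    T = [[0.0] * d for _ in range(d)]
    for a_i, w_i in zip(a, W):
        for j in range(d):
            for k in range(d):
                T[j][k] += a_i * w_i[j] * w_i[k]
    return T


def input_tensor(x):
    """Input-only representation Phi(x) = x x^T (no dependence on parameters)."""
    return [[x_j * x_k for x_k in x] for x_j in x]


def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_{j,k} A_jk B_jk."""
    return sum(dot(row_a, row_b) for row_a, row_b in zip(A, B))


def tensorized_net(x, a, W):
    """Same output as quadratic_net, computed as <T(theta), Phi(x)>."""
    return frobenius(param_tensor(a, W), input_tensor(x))


if __name__ == "__main__":
    rng = random.Random(0)
    d, width = 3, 4
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # The two evaluations agree up to floating-point rounding.
    print(quadratic_net(x, a, W), tensorized_net(x, a, W))
```

In this sketch all input dependence enters through $\Phi(x)$ and all parameter dependence through $T(\theta)$, which is the kind of separation the abstract attributes to tensorizable networks.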