We develop a general mathematical framework to analyze scaling regimes and derive explicit analytic solutions for gradient flow (GF) in large learning problems. Our key innovation is a formal power series expansion of the loss evolution, with coefficients encoded by diagrams akin to Feynman diagrams. We show that this expansion has a well-defined large-size limit that can be used to reveal different learning phases and, in some cases, to obtain explicit solutions of the nonlinear GF. We focus on learning Canonical Polyadic (CP) decompositions of high-order tensors, and show that this model exhibits several distinct extreme lazy and rich GF regimes, such as free evolution, NTK, and under- and over-parameterized mean-field. We show that these regimes depend on the parameter scaling, tensor order, and symmetry of the model in a specific and subtle way. Moreover, we propose a general approach to summing the formal loss expansion by reducing it to a PDE; in a wide range of scenarios, this PDE turns out to be first-order and solvable by the method of characteristics. We observe very good agreement between our theoretical predictions and experiments.
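To make the setting concrete, here is a minimal sketch, not the paper's code, of the learning problem in question: Euler-discretized gradient flow on the squared loss of a rank-r CP model for an order-3 tensor. The rank-one planted target, the scale parameter `alpha`, the step size `dt`, and all names (`cp_model`, etc.) are illustrative assumptions, not taken from the paper; the paper's analysis concerns the continuous-time flow and its scaling limits.

```python
# Minimal illustrative sketch (assumptions noted above): gradient flow on the
# CP-decomposition loss, discretized by small Euler steps.
import jax
import jax.numpy as jnp

dim, rank = 8, 16      # side length n and model rank r (illustrative values)
alpha = 0.1            # initialization scale; scalings like this set the GF regime

def cp_model(factors):
    # Order-3 CP model: sum over k of a_k (outer) b_k (outer) c_k.
    A, B, C = factors
    return jnp.einsum('ik,jk,lk->ijl', A, B, C)

def loss(factors, target):
    # Squared reconstruction loss driving the gradient flow.
    return 0.5 * jnp.sum((cp_model(factors) - target) ** 2)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
# Rank-one planted target and small random initialization (assumptions).
u = jax.random.normal(k4, (dim,))
target = jnp.einsum('i,j,l->ijl', u, u, u)
factors = [alpha * jax.random.normal(k, (dim, rank)) for k in (k1, k2, k3)]

grad_fn = jax.jit(jax.grad(loss))
dt, steps = 1e-3, 5000  # small steps approximate continuous-time GF
for _ in range(steps):
    grads = grad_fn(factors, target)
    factors = [f - dt * g for f, g in zip(factors, grads)]

print('final loss:', loss(factors, target))
```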