Diagonal linear networks (DLNs) are a tractable model that captures several nontrivial behaviors in neural network training, such as initialization-dependent solutions and incremental learning. These phenomena are typically studied in isolation, leaving the overall dynamics insufficiently understood. In this work, we present a unified analysis of various phenomena in the gradient flow dynamics of DLNs. Using Dynamical Mean-Field Theory (DMFT), we derive a low-dimensional effective process that captures the asymptotic gradient flow dynamics in high dimensions. Analyzing this effective process yields new insights into DLN dynamics, including loss convergence rates and their trade-off with generalization, and systematically reproduces many of the previously observed phenomena. These findings deepen our understanding of DLNs and demonstrate the effectiveness of the DMFT approach in analyzing high-dimensional learning dynamics of neural networks.
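As a small illustration of the initialization-dependent solutions mentioned above, the following sketch trains a two-parameter DLN on a single data point with plain gradient descent at a small step size, used here as a stand-in for gradient flow. The parametrization w = u ⊙ u − v ⊙ v, the balanced initialization u = v = alpha, the toy data, and the names `train_dln` and `l1_norm` are all assumptions made for this example and are not taken from the paper. It uses no DMFT; it only shows the kind of behavior the analysis addresses.

```python
def train_dln(x, y, alpha, lr=1e-3, steps=20000):
    """Gradient descent on 0.5 * (x . w - y)^2 with w = u*u - v*v.

    Assumed setup (hypothetical, for illustration only):
    one training point (x, y), balanced initialization u = v = alpha,
    so the effective weights start at w = 0.
    """
    d = len(x)
    u = [alpha] * d
    v = [alpha] * d
    for _ in range(steps):
        # Residual of the single training point.
        r = sum(x[i] * (u[i] ** 2 - v[i] ** 2) for i in range(d)) - y
        for i in range(d):
            # dL/du_i = 2 * r * x_i * u_i and dL/dv_i = -2 * r * x_i * v_i
            g = 2.0 * r * x[i]
            u[i] -= lr * g * u[i]
            v[i] += lr * g * v[i]
    return [u[i] ** 2 - v[i] ** 2 for i in range(d)]


def l1_norm(w):
    return sum(abs(wi) for wi in w)


if __name__ == "__main__":
    x, y = [1.0, 2.0], 2.0  # underdetermined: one equation, two unknowns
    for alpha in (0.01, 2.0):
        w = train_dln(x, y, alpha)
        print(f"alpha={alpha}: w={w}, l1={l1_norm(w):.3f}")
```

Both runs fit the training point, but they reach different interpolating solutions: the small initialization ends near the sparse solution (0, 1), while the large initialization ends near the minimum-L2-norm solution (0.4, 0.8). The small-initialization run also spends many early steps with almost no change in the loss before the dominant coordinate grows, a simple instance of incremental learning.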