Deep learning models, such as wide neural networks, can be conceptualized as nonlinear dynamical physical systems characterized by a multitude of interacting degrees of freedom. In the infinite limit, such systems tend to exhibit simplified dynamics. This paper studies gradient descent-based learning algorithms that display a linear structure in their parameter dynamics, reminiscent of the neural tangent kernel. We establish that this apparent linearity arises from weak correlations between the first and higher-order derivatives of the hypothesis function with respect to the parameters, taken around their initial values. This insight suggests that these weak correlations could be the underlying reason for the observed linearization in such systems. As a case in point, we demonstrate this weak-correlation structure in neural networks in the large-width limit. Exploiting the relationship between linearity and weak correlations, we derive a bound on the deviations from linearity observed along the training trajectory of stochastic gradient descent. To facilitate our proof, we introduce a novel method to characterize the asymptotic behavior of random tensors.
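To make the notion of linearity concrete, the following is a minimal sketch of the standard Taylor expansion of a hypothesis function around its initial parameters. The symbols $F$, $x$, $\theta$, $\theta_0$, $F_{\mathrm{lin}}$, and $\Theta_0$ are notation assumed here for illustration and are not taken from the text above.

```latex
% Assumed notation: F(x;\theta) is the hypothesis function,
% \theta_0 the parameters at initialization.
\begin{align}
F(x;\theta) &= F(x;\theta_0)
  + \nabla_\theta F(x;\theta_0)^{\top} (\theta - \theta_0)
  + \tfrac{1}{2} (\theta - \theta_0)^{\top}
    \nabla_\theta^{2} F(x;\theta_0)\, (\theta - \theta_0)
  + \cdots \\
% Linearized model: keep only the zeroth- and first-order terms.
F_{\mathrm{lin}}(x;\theta) &= F(x;\theta_0)
  + \nabla_\theta F(x;\theta_0)^{\top} (\theta - \theta_0) \\
% Tangent kernel at initialization, built from first derivatives only.
\Theta_0(x, x') &= \nabla_\theta F(x;\theta_0)^{\top}
  \nabla_\theta F(x';\theta_0)
\end{align}
```

In this notation, the linear regime corresponds to the second- and higher-order terms remaining negligible along the training trajectory, so that $F \approx F_{\mathrm{lin}}$ and the dynamics are governed by $\Theta_0$.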