Neural tangent kernels (NTKs) are a powerful tool for analyzing deep, nonlinear neural networks. In the infinite-width limit, NTKs can be computed easily for most common architectures, yielding full analytic control over the training dynamics. However, important properties of training, such as NTK evolution and feature learning, are absent at infinite width. Finite-width effects can nevertheless be included by computing corrections to the Gaussian statistics of the infinite-width limit. We introduce Feynman diagrams for computing finite-width corrections to NTK statistics. These diagrams dramatically simplify the necessary algebraic manipulations and enable the computation of layer-wise recursion relations for arbitrary statistics involving preactivations, NTKs, and certain higher-derivative tensors (dNTK and ddNTK) required to predict the training dynamics at leading order. We demonstrate the feasibility of our framework by extending stability results for deep networks from preactivations to NTKs and by proving that, for scale-invariant nonlinearities such as ReLU, finite-width corrections are absent on the diagonal of the Gram matrix of the NTK. We numerically implement the complete set of equations necessary to compute the first-order corrections for arbitrary inputs and demonstrate that the results follow the statistics of sampled neural networks for widths $n\gtrsim 20$.
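To make the comparison between infinite-width predictions and sampled networks concrete, the following is a minimal NumPy sketch, not the implementation described above. It assumes a bias-free, scalar-output ReLU MLP in NTK parametrization with unit-variance Gaussian weights and a $\sqrt{2}$ factor on the activations; these choices and all function names (`empirical_ntk`, `infinite_width_ntk`, `mean_empirical_ntk`) are illustrative assumptions. It computes only the infinite-width NTK recursion and a Monte Carlo average of the finite-width NTK at initialization, and does not evaluate the diagrammatic finite-width corrections.

```python
import numpy as np


def empirical_ntk(X, weights):
    """NTK Gram matrix of one sampled bias-free ReLU MLP with scalar output.

    Assumed (hypothetical) parametrization: z^l = W^l h^{l-1} / sqrt(n_{l-1}),
    h^l = sqrt(2) * relu(z^l), output f = z^L.
    X has shape (m, d); weights[l] has shape (n_out, n_in).
    """
    m = X.shape[0]
    # Forward pass: keep each layer's input h^{l-1} and preactivation z^l.
    hs, zs = [X], []
    for l, W in enumerate(weights):
        z = hs[-1] @ W.T / np.sqrt(W.shape[1])
        zs.append(z)
        if l < len(weights) - 1:
            hs.append(np.sqrt(2.0) * np.maximum(z, 0.0))
    # Backward pass: delta^l = df/dz^l, starting from df/dz^L = 1.
    delta = np.ones((m, 1))
    ntk = np.zeros((m, m))
    for l in reversed(range(len(weights))):
        W = weights[l]
        # Contribution of W^l: (delta . delta') (h . h') / n_{l-1}.
        ntk += (delta @ delta.T) * (hs[l] @ hs[l].T) / W.shape[1]
        if l > 0:
            back = delta @ W / np.sqrt(W.shape[1])  # df/dh^{l-1}
            delta = back * np.sqrt(2.0) * (zs[l - 1] > 0)
    return ntk


def infinite_width_ntk(X, depth):
    """Infinite-width NTK for the same assumed architecture (arc-cosine formulas)."""
    K = X @ X.T / X.shape[1]
    theta = K.copy()
    for _ in range(depth - 1):
        norms = np.sqrt(np.outer(np.diag(K), np.diag(K)))
        cos = np.clip(K / norms, -1.0, 1.0)
        ang = np.arccos(cos)
        K_next = norms * (np.sin(ang) + (np.pi - ang) * cos) / np.pi
        K_dot = (np.pi - ang) / np.pi
        theta = K_next + K_dot * theta
        K = K_next
    return theta


def mean_empirical_ntk(X, width, depth, n_samples, seed=0):
    """Average the empirical NTK over independently sampled networks."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1]] + [width] * (depth - 1) + [1]
    acc = np.zeros((X.shape[0], X.shape[0]))
    for _ in range(n_samples):
        weights = [rng.standard_normal((sizes[i + 1], sizes[i]))
                   for i in range(depth)]
        acc += empirical_ntk(X, weights)
    return acc / n_samples


if __name__ == "__main__":
    X = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]])
    print("infinite width:\n", infinite_width_ntk(X, depth=3))
    print("sampled mean (n=128):\n",
          mean_empirical_ntk(X, width=128, depth=3, n_samples=200))
```

Under these assumptions, the diagonal of `infinite_width_ntk` equals `depth * |x|^2 / d`. Lowering `width` in `mean_empirical_ntk` gives a rough view of how the sampled off-diagonal entries drift away from the infinite-width values, which is the regime the finite-width corrections are meant to capture.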