Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of the initialisation, feature learning, and training of those networks, and has brought new practical techniques for finding appropriate hyperparameters, learning network weights, and performing inference. In this paper, we broaden this line of research by showing that the infinite-width analysis can be extended to the Jacobian of a deep neural network. We show that a multilayer perceptron (MLP) and its Jacobian at initialisation jointly converge to a Gaussian process (GP) as the widths of the MLP's hidden layers go to infinity, and we characterise this GP. We also prove that, in the infinite-width limit, the evolution of the MLP under so-called robust training (i.e., training with a regulariser on the Jacobian) is described by a linear first-order ordinary differential equation that is determined by a variant of the Neural Tangent Kernel. We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to gain insight into Jacobian regularisation.
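To make the notion of a regulariser on the Jacobian concrete, the following is a minimal sketch and not the paper's implementation. It assumes a one-hidden-layer MLP with tanh activation, i.i.d. standard Gaussian weights, and 1/sqrt(width) scaling. The squared-error loss, the penalty weight `lam`, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

def init_mlp(d, n, seed=0):
    """Draw i.i.d. standard Gaussian weights (an assumed initialisation)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # hidden weights, n x d
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]                      # output weights, length n
    return W, v

def forward_and_jacobian(W, v, x):
    """Return f(x) and the input Jacobian df/dx for the assumed network
    f(x) = (1/sqrt(n)) * sum_j v_j * tanh(w_j . x / sqrt(d))."""
    n, d = len(W), len(x)
    out = 0.0
    jac = [0.0] * d
    for w_j, v_j in zip(W, v):
        pre = sum(w * xi for w, xi in zip(w_j, x)) / math.sqrt(d)
        act = math.tanh(pre)
        out += v_j * act
        # Chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2.
        scale = v_j * (1.0 - act * act) / math.sqrt(d)
        for k in range(d):
            jac[k] += scale * w_j[k]
    s = 1.0 / math.sqrt(n)
    return out * s, [g * s for g in jac]

def robust_loss(W, v, x, y, lam=0.1):
    """Squared error plus lam times the squared norm of the input Jacobian."""
    f, jac = forward_and_jacobian(W, v, x)
    return (f - y) ** 2 + lam * sum(g * g for g in jac)
```

Calling `robust_loss(W, v, x, y, lam=0.0)` recovers the plain squared error, so `lam` controls how strongly sensitivity of the output to the input is penalised. The Jacobian here is computed in closed form only because the assumed network is small; an automatic differentiation library would normally do this step.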