Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective and show that it corresponds to incorporating observation noise in the NTK-GP. To address (ii), we propose a \textit{shifted network} that enables arbitrary prior means and allows the posterior mean to be obtained by gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and architectures, showing that this approach removes key obstacles to the practical use of the NTK-GP equivalence in applied Gaussian process modeling.
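For concreteness, the following is a minimal sketch of the kind of correspondence the abstract refers to, under assumed notation; the exact regularizer, the shifted-network construction, and the symbols $\Theta$, $\sigma^2$, $m$, and $\theta_0$ below are illustrative assumptions, not necessarily the paper's precise formulation.

% Hedged sketch, not the paper's exact construction: \Theta denotes the NTK,
% \theta_0 the parameters at initialization, m an assumed prior-mean function,
% and \sigma^2 the observation-noise variance.
\begin{align}
  % GP posterior mean under the NTK with prior mean m and noise variance \sigma^2
  \bar{f}(x_*) &= m(x_*) + \Theta(x_*, X)\,\bigl[\Theta(X, X) + \sigma^2 I\bigr]^{-1}\bigl(y - m(X)\bigr), \\
  % One natural candidate for the regularized objective in (i): an \ell_2 penalty
  % toward the initial parameters plays the role of observation noise
  \mathcal{L}(\theta) &= \tfrac{1}{2}\,\lVert f_\theta(X) - y \rVert^2
                        + \tfrac{\sigma^2}{2}\,\lVert \theta - \theta_0 \rVert^2, \\
  % One natural candidate for the shifted network in (ii): subtracting the
  % initial function and adding m yields prior mean m(x) at initialization
  \tilde{f}_\theta(x) &= f_\theta(x) - f_{\theta_0}(x) + m(x).
\end{align}

In the linearized (wide-network) regime, gradient descent on $\mathcal{L}$ applied to $\tilde{f}_\theta$ converges to the ridge solution in tangent-feature space, which coincides with $\bar{f}$ above; this is why an $\ell_2$ regularizer of this form can stand in for observation noise, and why the shift fixes the prior mean.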