In this paper, we study the quantitative convergence of shallow neural networks trained via gradient descent to their associated Gaussian processes in the infinite-width limit. While previous work has established qualitative convergence in broad settings, precise finite-width estimates remain limited, particularly during training. We provide explicit upper bounds on the quadratic Wasserstein distance between the network output and its Gaussian approximation at any training time $t \ge 0$, showing that the error decays polynomially in the network width. Our results quantify how architectural parameters, such as width and input dimension, influence convergence, and how the training dynamics affect the approximation error.
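Schematically, the bounds we prove are of the following form, where $f^{(n)}_t(x)$ denotes the output of the width-$n$ network at training time $t$, $G_t(x)$ its Gaussian approximation, and $\beta > 0$ and $C(t, d, x)$ are placeholders for the explicit exponent and constant (depending on the training time, the input dimension $d$, and the input) given in the main theorems:
\[
\mathcal{W}_2\!\left(f^{(n)}_t(x),\, G_t(x)\right) \;\le\; \frac{C(t, d, x)}{n^{\beta}}, \qquad t \ge 0 .
\]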