We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function $\Phi$. We also derive analogous concentration bounds in the sliced-Wasserstein distance $SW_1$, leading to dimension-free rates.
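For orientation, the notation above can be unpacked using the standard textbook definitions. This is only a sketch: the symbols $C$, $H$, $\sigma$, and $\theta_\#$ are assumptions introduced here, and the normalization used in the paper itself may differ. A probability measure $\mu$ on $\mathbb{R}^d$ is said to satisfy the transportation inequality $T_p(C)$ if, for every probability measure $\nu$,

$$
W_p(\nu, \mu) \le \sqrt{2C \, H(\nu \mid \mu)},
$$

where $W_p$ is the Wasserstein distance of order $p$ and $H(\nu \mid \mu)$ is the relative entropy of $\nu$ with respect to $\mu$. The sliced-Wasserstein distance averages one-dimensional Wasserstein distances over directions on the unit sphere:

$$
SW_1(\mu, \nu) = \int_{S^{d-1}} W_1(\theta_\# \mu, \theta_\# \nu) \, d\sigma(\theta),
$$

where $\theta_\# \mu$ denotes the pushforward of $\mu$ under the projection $x \mapsto \langle \theta, x \rangle$ and $\sigma$ is the uniform probability measure on $S^{d-1}$.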