We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with $d$ and decays at the rate $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established under similar conditions for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise. Our analysis offers quantitative insights into the effect of label noise.
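To make the contrast between the two noise mechanisms concrete, the following is a minimal sketch, not the algorithm as specified in the paper. It assumes a toy model $f(x; w) = \tanh(w^\top x)$ with squared loss. The function names, the label noise scale `sigma`, and the inverse temperature `beta` are all hypothetical choices made for illustration. In label noise SGD the randomness enters through perturbed labels, so the injected noise is scaled by the model gradient and therefore depends on the parameters. In SGLD, isotropic Gaussian noise is added directly to the update.

```python
import numpy as np

def predict(w, X):
    # Toy nonlinear model (an assumption for illustration): f(x; w) = tanh(w . x)
    return np.tanh(X @ w)

def grad_loss(w, X, y):
    # Gradient of the mean of 0.5 * (f(x; w) - y)^2 with respect to w
    f = predict(w, X)
    residual = f - y
    # d/dw tanh(w . x) = (1 - tanh^2(w . x)) * x
    return (X * (residual * (1.0 - f**2))[:, None]).mean(axis=0)

def label_noise_sgd_step(w, X, y, lr, sigma, batch_size, rng):
    # Perturb the minibatch labels with fresh Gaussian noise, then take a
    # plain gradient step; the resulting gradient noise depends on w.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    y_noisy = y[idx] + sigma * rng.standard_normal(batch_size)
    return w - lr * grad_loss(w, X[idx], y_noisy)

def sgld_step(w, X, y, lr, beta, batch_size, rng):
    # Gradient step on clean labels plus parameter-independent Gaussian noise
    idx = rng.choice(len(y), size=batch_size, replace=False)
    noise = np.sqrt(2.0 * lr / beta) * rng.standard_normal(w.shape)
    return w - lr * grad_loss(w, X[idx], y[idx]) + noise

# Small synthetic run with a constant learning rate
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.tanh(X @ np.ones(5))
w_ln = np.zeros(5)
w_sgld = np.zeros(5)
for _ in range(100):
    w_ln = label_noise_sgd_step(w_ln, X, y, lr=0.1, sigma=0.5, batch_size=16, rng=rng)
    w_sgld = sgld_step(w_sgld, X, y, lr=0.1, beta=100.0, batch_size=16, rng=rng)
```

With `sigma=0` the label noise step reduces to ordinary minibatch SGD, which gives a simple sanity check for the sketch.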