We provide novel information-theoretic generalization bounds for stochastic gradient Langevin dynamics (SGLD) under the assumptions of smoothness and dissipativity, which are widely used in studies of sampling and non-convex optimization. Our bounds are time-independent and decay to zero as the sample size increases, regardless of the number of iterations and of whether the step size is fixed. Unlike previous studies, we derive the generalization error bounds by tracking the time evolution of the Kullback--Leibler divergence, which is related to the stability of datasets and upper-bounds the mutual information between the output parameters and the input dataset. Additionally, by showing that the loss function of SGLD is sub-exponential, we establish the first information-theoretic generalization bound for the setting in which the training and test losses are the same. This bound is also time-independent and removes the problematic step-size dependence found in existing work. Combining it with existing non-convex optimization error bounds yields an improved excess risk bound.
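For readers less familiar with the algorithm being analyzed, the following is a minimal sketch of an SGLD run. It is not the authors' code. It assumes the common update form theta_{k+1} = theta_k - eta_k * g_k + sqrt(2 * eta_k / beta) * xi_k, where g_k is a minibatch gradient, beta is an inverse temperature, and xi_k is standard Gaussian noise. Smoothness is taken in the standard sense that the gradient is M-Lipschitz. Dissipativity is taken in the standard sense that <theta, grad f(theta)> >= m * ||theta||^2 - b for some m > 0 and b >= 0. The per-sample loss, the names `loss_grad` and `sgld`, and all parameter values are hypothetical. The loss was chosen only because it satisfies both assumptions: the log term has a bounded, Lipschitz gradient, and the ridge term makes the total loss dissipative. The step size can be fixed or given as a schedule, which mirrors the two regimes the bounds cover.

```python
import math
import random


def loss_grad(theta, x, y, lam):
    """Gradient of a hypothetical smooth, dissipative per-sample loss
    l(theta; x, y) = (lam / 2) * ||theta||^2 + log(1 + (<theta, x> - y)^2).
    |d/dr log(1 + r^2)| = |2r / (1 + r^2)| <= 1, so the log term has a bounded,
    Lipschitz gradient; the ridge term (lam > 0) makes the loss dissipative."""
    r = sum(t * xi for t, xi in zip(theta, x)) - y
    g = 2.0 * r / (1.0 + r * r)  # derivative of log(1 + r^2) w.r.t. r
    return [lam * t + g * xi for t, xi in zip(theta, x)]


def sgld(data, dim, steps, eta, beta, batch_size, lam=0.1, seed=0):
    """Run SGLD from theta = 0 with the update
    theta <- theta - eta_k * minibatch_grad + sqrt(2 * eta_k / beta) * N(0, I).
    `eta` is either a float (fixed step size) or a callable k -> eta_k (schedule)."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for k in range(steps):
        step = eta(k) if callable(eta) else eta
        # Average the per-sample gradients over a minibatch drawn without replacement.
        batch = rng.sample(data, batch_size)
        grad = [0.0] * dim
        for x, y in batch:
            for i, gi in enumerate(loss_grad(theta, x, y, lam)):
                grad[i] += gi / batch_size
        # Gradient step plus isotropic Gaussian noise with variance 2 * eta_k / beta.
        noise_scale = math.sqrt(2.0 * step / beta)
        theta = [t - step * g + noise_scale * rng.gauss(0.0, 1.0)
                 for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Hypothetical synthetic dataset, used only to exercise the sketch.
    rng = random.Random(1)
    data = [([rng.gauss(0, 1) for _ in range(3)], rng.gauss(0, 1)) for _ in range(50)]
    theta_fixed = sgld(data, dim=3, steps=200, eta=0.01, beta=100.0, batch_size=10)
    theta_decay = sgld(data, dim=3, steps=200, eta=lambda k: 0.05 / (1 + k),
                       beta=100.0, batch_size=10)
    print(theta_fixed)
    print(theta_decay)
```

Passing the step size either as a constant or as a callable keeps one function for both regimes. The random seed is fixed so that runs are reproducible.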