The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. However, all prior analyses assume the infinite-particle or continuous-time limit and cannot handle stochastic gradient updates. We provide a general framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and stochastic gradient approximation. To demonstrate the wide applicability of this framework, we establish quantitative convergence rate guarantees to the regularized global optimal solution under (i) a wide range of learning problems, such as neural networks in the mean-field regime and MMD minimization, and (ii) different gradient estimators, including SGD and SVRG. Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
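As a rough illustration of the three approximations named above (finite particles, time discretization, stochastic gradients), the following is a minimal sketch of noisy minibatch gradient descent on the neurons of a two-layer network in mean-field scaling. It is not the authors' implementation. The tanh activation, the squared loss, the quadratic regularizer, and every name in it (`mfld_sgd`, `predict`, `lam`, `l2`, `step`, `batch`) are assumptions made for this example only.

```python
import numpy as np

def mfld_sgd(Z, y, n_particles=200, n_steps=500, step=0.05,
             lam=1e-3, l2=1e-2, batch=32, seed=0):
    """Finite-particle, discrete-time MFLD with minibatch gradients.

    Assumed model (illustrative only): f(z) = mean_i tanh(x_i . z),
    squared loss, and regularizer (l2 / 2) * ||x||^2 per particle.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    X = rng.standard_normal((n_particles, d))      # particles x_1..x_N
    for _ in range(n_steps):
        # Stochastic gradient approximation: sample a minibatch.
        idx = rng.choice(n, size=min(batch, n), replace=False)
        Zb, yb = Z[idx], y[idx]
        act = np.tanh(Zb @ X.T)                    # shape (b, N)
        resid = act.mean(axis=1) - yb              # f(z_j) - y_j, shape (b,)
        # Gradient of the first variation of the loss at each particle;
        # it depends on all particles through resid (distribution-dependent drift).
        grad = ((resid[:, None] * (1.0 - act**2)).T @ Zb) / len(idx)
        grad += l2 * X                             # gradient of the regularizer
        # Time discretization: one Euler-Maruyama step with Gaussian noise
        # whose scale sqrt(2 * lam * step) matches entropy weight lam.
        noise = rng.standard_normal(X.shape)
        X = X - step * grad + np.sqrt(2.0 * lam * step) * noise
    return X

def predict(X, Z):
    """Mean-field network output: average of tanh(x_i . z) over particles."""
    return np.tanh(Z @ X.T).mean(axis=1)
```

Setting the drift to depend on each particle alone (rather than on the particle average through `resid`) would reduce this update to the standard stochastic gradient Langevin dynamics, which is the special case mentioned at the end of the abstract.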