The proximal algorithm is a powerful tool for minimizing nonlinear and nonsmooth functionals in a general metric space. Motivated by recent progress in understanding the training dynamics of noisy gradient descent on two-layer neural networks in the mean-field regime, in this paper we provide a simple and self-contained convergence analysis of the general-purpose Wasserstein proximal algorithm. The analysis does not assume geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, we establish that the proximal algorithm achieves an unbiased and linear convergence rate. This rate improves upon existing rates for the proximal algorithm applied to Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training converges faster than noisy gradient descent on mean-field neural networks.
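The following is a rough illustration only. It uses a Euclidean analogue of the proximal iteration, not the Wasserstein algorithm analyzed in the paper: x_{k+1} = argmin_y f(y) + |y − x_k|^2/(2τ). Suppose f satisfies the Euclidean Polyak-Łojasiewicz inequality |∇f(x)|^2 ≥ 2μ(f(x) − f*). Evaluating the prox objective at y = x_k gives f(x_{k+1}) + |x_{k+1} − x_k|^2/(2τ) ≤ f(x_k). The optimality condition ∇f(x_{k+1}) = −(x_{k+1} − x_k)/τ turns this into f(x_{k+1}) + (τ/2)|∇f(x_{k+1})|^2 ≤ f(x_k), and hence into (1 + μτ)(f(x_{k+1}) − f*) ≤ f(x_k) − f*. This is a linear rate that requires no convexity. The sketch makes several assumptions: the nonconvex test function f(x) = x^2 + 3 sin^2(x) (a standard example of a function satisfying a PL inequality), the step τ = 0.2, and plain gradient descent as the inner prox solver. All function names are hypothetical.

```python
import math


def f(x):
    # Nonconvex test function x^2 + 3 sin^2(x); its minimum value 0 is at x = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def prox_step(x, tau, inner_iters=500):
    """Approximate argmin_y f(y) + (y - x)^2 / (2 tau) by gradient descent.

    Since f''(y) = 2 + 6 cos(2y) lies in [-4, 8], the inner objective is
    strongly convex whenever tau < 1/4, with curvature at most 8 + 1/tau.
    """
    L = 8.0 + 1.0 / tau  # curvature upper bound -> safe step size 1/L
    y = x  # warm start at x, so the inner objective never exceeds f(x)
    for _ in range(inner_iters):
        g = grad_f(y) + (y - x) / tau
        y -= g / L
    return y


def proximal_point(x0, tau=0.2, steps=30):
    # Outer proximal iterations x_{k+1} = prox_{tau f}(x_k).
    xs = [x0]
    for _ in range(steps):
        xs.append(prox_step(xs[-1], tau))
    return xs


xs = proximal_point(3.0)
values = [f(x) for x in xs]
print(values[0], values[-1])
```

Warm-starting each inner solve at x_k ensures that every outer step decreases f, even though f itself is not convex.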