We consider the optimization problem of minimizing a functional defined over a family of probability distributions, where the objective functional is assumed to possess a variational form. Such distributional optimization problems arise widely in machine learning and statistics, with Monte Carlo sampling, variational inference, policy optimization, and generative adversarial networks as examples. For this problem, we propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent over the manifold of probability distributions by iteratively pushing a set of particles. Specifically, we prove that moving along the geodesic in the direction of the functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles. In each iteration of variational transport, we first solve the variational problem associated with the objective functional using the particles, whose solution yields the Wasserstein gradient direction. We then update the current distribution by pushing each particle along the direction specified by this solution. By characterizing both the statistical error incurred in estimating the Wasserstein gradient and the progress of the optimization algorithm, we prove that when the objective functional satisfies functional versions of the Polyak-Łojasiewicz (PL) condition (Polyak, 1963) and a smoothness condition, variational transport converges linearly to the global minimum of the objective functional up to a certain statistical error, which decays to zero sublinearly as the number of particles goes to infinity.
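The two-step iteration described in the abstract (solve the variational problem on the particles, then push each particle along the resulting direction) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective F(p) = 0.5 * ||E_p[x] - m||^2, which has the variational form sup over theta of theta . (E_p[x] - m) - 0.5 * ||theta||^2, the hypothetical function names, the step size, and the particle count. For this toy objective the variational problem has a closed-form solution, whereas in general it would presumably have to be solved numerically over a function class.

```python
import numpy as np


def solve_variational_problem(particles, target_mean):
    """Maximize theta @ (mean(particles) - target_mean) - 0.5 * ||theta||^2.

    For this toy objective (an assumption, not the paper's setting) the
    maximizer is available in closed form: theta* = mean(particles) - target_mean.
    """
    return particles.mean(axis=0) - target_mean


def variational_transport(particles, target_mean, step_size=0.5, num_iters=50):
    """Toy particle loop: estimate f*, then push each particle along -grad f*."""
    particles = particles.copy()
    history = []
    for _ in range(num_iters):
        theta = solve_variational_problem(particles, target_mean)
        # Objective value at the current particle distribution.
        history.append(0.5 * float(theta @ theta))
        # Here f*(x) = theta @ x, so grad f*(x) = theta for every particle.
        particles = particles - step_size * theta
    return particles, history


rng = np.random.default_rng(0)
init_particles = rng.normal(loc=3.0, scale=1.0, size=(200, 2))
target_mean = np.array([1.0, -1.0])
final_particles, objective_history = variational_transport(init_particles, target_mean)
print(objective_history[0], objective_history[-1])
```

In this sketch the gap between the particle mean and the target shrinks by a factor of (1 - step_size) per iteration, so the toy objective decreases geometrically. The example involves no estimation error in the variational step, so it does not exhibit the statistical error term discussed in the abstract.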