Random Coordinate Descent on the Wasserstein Space of Probability Measures

Optimization over the space of probability measures endowed with the Wasserstein-2 geometry is central to modern machine learning and mean-field modeling. However, traditional methods relying on full Wasserstein gradients often suffer from high computational overhead in high-dimensional or ill-conditioned settings. We propose a randomized coordinate descent framework specifically designed for the Wasserstein manifold, introducing both Random Wasserstein Coordinate Descent (RWCD) and Random Wasserstein Coordinate Proximal-Gradient (RWCP) for composite objectives. By exploiting coordinate-wise structures, our methods adapt to anisotropic objective landscapes where full-gradient approaches typically struggle. We provide a rigorous convergence analysis across various landscape geometries, establishing guarantees under non-convex, Polyak-Łojasiewicz, and geodesically convex conditions. Our theoretical results mirror the classic convergence properties found in Euclidean space, revealing a compelling symmetry between coordinate descent on vectors and on probability measures. The developed techniques are inherently adaptive to the Wasserstein geometry and offer a robust analytical template that can be extended to other optimization solvers within the space of measures. Numerical experiments on ill-conditioned energies demonstrate that our framework offers significant speedups over conventional full-gradient methods.

Related content

Coordinate descent

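Coordinate descent is a gradient-free optimization algorithm. At each iteration it performs a one-dimensional search along a single coordinate direction from the current point to find a local minimum of the function, cycling through the different coordinate directions over the course of the run. For non-separable functions, the algorithm may fail to reach the optimum within a small number of iterations. To speed up convergence, one can work in a better-suited coordinate system, for example a new system obtained through principal component analysis in which the coordinates are as uncorrelated as possible.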
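As an illustration of the cyclic scheme described above, here is a minimal sketch in plain Python for a quadratic objective f(x) = 0.5 * x^T A x - b^T x. This is the classical Euclidean algorithm, not the Wasserstein-space methods (RWCD, RWCP) from the abstract. The function name `coordinate_descent`, the matrix `A`, and the vector `b` are hypothetical choices made for this example. It assumes `A` is symmetric positive definite, so the one-dimensional minimization along each coordinate has a closed form; for a general objective a numerical line search would take its place.

```python
def coordinate_descent(A, b, x0, sweeps=50):
    """Cyclic coordinate descent for f(x) = 0.5 * x^T A x - b^T x.

    Assumes A (a list of lists) is symmetric positive definite, so the
    minimizer along each coordinate direction is available in closed form.
    """
    x = list(x0)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):  # cycle through the coordinate directions
            # Partial derivative of f with respect to x[i] at the current point.
            g_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
            # Exact 1-D minimization: along coordinate i, f is a parabola
            # with curvature A[i][i].
            x[i] -= g_i / A[i][i]
    return x


# Hypothetical ill-conditioned, non-separable test problem.
A = [[100.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0]
x = coordinate_descent(A, b, [0.0, 0.0])
```

Because the off-diagonal entries of `A` couple the two coordinates, a single sweep does not reach the minimizer, which is the non-separable behavior noted above. If `A` were diagonal, one sweep would suffice.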
