Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

We propose Mirror Descent Optimal Transport (MDOT), a novel method for solving discrete optimal transport (OT) problems with high precision by unifying temperature annealing in entropic-regularized OT (EOT) with mirror descent techniques. In this framework, temperature annealing produces a sequence of EOT dual problems whose solutions gradually approach the solution of the original OT problem. We solve each problem efficiently using a GPU-parallel nonlinear conjugate gradients algorithm (PNCG) that outperforms traditional Sinkhorn iterations under weak regularization. Our investigation also reveals that the theoretical convergence rate of Sinkhorn iterations can exceed existing non-asymptotic bounds when their stopping criterion is tuned in a manner analogous to MDOT. Our comprehensive ablation studies of MDOT-PNCG affirm its robustness across a wide range of algorithmic parameters. Benchmarking on 24 problem sets of size $n=4096$ in a GPU environment demonstrates that our method attains high-precision, feasible solutions significantly faster than a representative set of existing OT solvers (including accelerated gradient methods and advanced Sinkhorn variants) in both wall-clock time and number of operations. Empirical convergence rates range between $O(n^2 \varepsilon^{-1/4})$ and $O(n^2 \varepsilon^{-1})$, where $\varepsilon$ is the optimality gap. For problem sizes up to $n=16{,}384$, the empirical runtime scales as $\widetilde{O}(n^2)$ for moderate precision and as $\widetilde{O}(n^{5/2})$ at worst for high precision. These findings establish MDOT-PNCG as a compelling alternative to current OT solvers, particularly in challenging weak-regularization regimes.

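To make the idea of temperature annealing concrete, the following is a minimal sketch, not the MDOT-PNCG algorithm itself. It solves a sequence of EOT problems at decreasing temperature, warm-starting each one from the dual variables of the previous one. Two assumptions are swapped in for illustration: plain log-domain Sinkhorn iterations stand in for PNCG as the inner solver, and the temperature follows a simple geometric schedule. The function names and the parameters `gamma_init`, `gamma_final`, `decay`, and `tol` are hypothetical and do not come from the paper.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_log(C, r, c, gamma, u, v, tol, max_iters=10_000):
    """Log-domain Sinkhorn at temperature gamma, warm-started from duals (u, v).

    The plan is P[i, j] = exp((u[i] + v[j] - C[i, j]) / gamma).
    """
    log_r, log_c = np.log(r), np.log(c)
    for _ in range(max_iters):
        # Row update: choose u so that the row sums of P equal r.
        u = gamma * (log_r - logsumexp((v[None, :] - C) / gamma, axis=1))
        # Column update: choose v so that the column sums of P equal c.
        v = gamma * (log_c - logsumexp((u[:, None] - C) / gamma, axis=0))
        P = np.exp((u[:, None] + v[None, :] - C) / gamma)
        # Columns are exact after the update above, so check the row marginal.
        if np.abs(P.sum(axis=1) - r).sum() < tol:
            break
    return u, v, P


def annealed_sinkhorn(C, r, c, gamma_init=1.0, gamma_final=0.05,
                      decay=0.5, tol=1e-6):
    """Solve EOT at geometrically decreasing temperatures (assumed schedule)."""
    u, v = np.zeros(C.shape[0]), np.zeros(C.shape[1])
    gamma = gamma_init
    while True:
        u, v, P = sinkhorn_log(C, r, c, gamma, u, v, tol)
        if gamma <= gamma_final:
            return P
        gamma = max(gamma * decay, gamma_final)


# Small illustrative problem: two random point clouds with uniform marginals.
rng = np.random.default_rng(0)
x, y = rng.random((8, 2)), rng.random((8, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
C = C / C.max()  # normalize costs to [0, 1]
r = np.full(8, 1 / 8)
c = np.full(8, 1 / 8)
P = annealed_sinkhorn(C, r, c)
transport_cost = float((P * C).sum())
```

Lowering `gamma_final` brings the transport cost of `P` closer to the unregularized OT cost, but plain Sinkhorn then typically needs many more inner iterations. This is the weak-regularization regime in which the abstract reports that PNCG outperforms Sinkhorn.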