Domain decomposition has been shown to be a computationally efficient distributed method for solving large-scale entropic optimal transport problems. However, a naive implementation of the algorithm can freeze in the limit of very fine partition cells (i.e., it asymptotically becomes stationary and does not find the global minimizer), because information travels only slowly between cells. In practice, this can be avoided with a coarse-to-fine multiscale scheme. In this article we introduce flow updates as an alternative approach. Flow updates can be interpreted as a variant of the celebrated algorithm by Angenent, Haker, and Tannenbaum, and they can be combined canonically with domain decomposition. We prove that the resulting hybrid method converges to the global minimizer and provide a formal discussion of its continuity limit. We compare the hybrid method numerically with naive and multiscale domain decomposition and show that it does not suffer from freezing in the regime of very many cells. While the multiscale scheme is observed to be faster than the hybrid approach in general, the latter could be a viable alternative in cases where a good initial coupling is available. Our numerical experiments are based on a novel GPU implementation of domain decomposition that we describe in the appendix.
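To make the naive scheme concrete, the following is a minimal dense NumPy sketch of domain decomposition for entropic optimal transport on a small 1D grid. It is not the GPU implementation described in the appendix, and it contains neither flow updates nor the multiscale scheme. The names `sinkhorn`, `dd_sweep`, and `score`, the grid size, the regularization strength `eps`, and the two-point cells are all assumptions chosen for illustration. Each cell update re-solves a small entropic problem between the cell's share of the first marginal and the second marginal currently assigned to that cell, so mass moves between cells only when the two staggered partitions alternate.

```python
import numpy as np

def sinkhorn(a, b, K, n_iter=200):
    """Entropic OT between a and b with Gibbs kernel K (fixed iteration count)."""
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def dd_sweep(pi, mu, K, cells):
    """Re-solve the cell subproblem on every cell of one partition (in place)."""
    for rows in cells:
        nu_cell = pi[rows].sum(axis=0)  # second marginal currently held by the cell
        pi[rows] = sinkhorn(mu[rows], nu_cell, K[rows])
    return pi

def score(pi, C, eps):
    """Entropic objective: transport cost plus eps times the entropy term."""
    return np.sum(C * pi) + eps * np.sum(pi * (np.log(pi) - 1.0))

# Hypothetical toy problem: n grid points on [0, 1] with squared-distance cost.
n, eps = 16, 0.1
x = np.linspace(0.0, 1.0, n)
C = (x[:, None] - x[None, :]) ** 2
K = np.exp(-C / eps)

mu = np.full(n, 1.0 / n)
nu = np.linspace(1.0, 2.0, n)
nu /= nu.sum()

# Two staggered partitions of the first domain into cells of at most two points.
cells_a = [slice(i, i + 2) for i in range(0, n, 2)]
cells_b = [slice(0, 1)] + [slice(i, i + 2) for i in range(1, n - 1, 2)] + [slice(n - 1, n)]

pi = np.outer(mu, nu)  # feasible initial coupling (product measure)
score_before = score(pi, C, eps)
for _ in range(20):
    dd_sweep(pi, mu, K, cells_a)
    dd_sweep(pi, mu, K, cells_b)
score_after = score(pi, C, eps)
```

In this sketch every cell update keeps both marginals of `pi` fixed (up to the Sinkhorn tolerance) and cannot increase the entropic objective. With finer cells, each sweep moves mass by at most roughly one cell width, which is the slow information transfer behind the freezing behaviour described above.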