Although the optimal transport (OT) problem was originally formulated as a linear program, adding entropic regularization has proven beneficial, both computationally and statistically, in many applications. The Sinkhorn fixed-point algorithm is the most popular approach to solving this regularized problem and, as a result, multiple attempts have been made to reduce its runtime using, e.g., annealing of the regularization parameter, momentum, or acceleration. The premise of this work is that the initialization of the Sinkhorn algorithm has received comparatively little attention, possibly because of two preconceptions: first, because the regularized OT problem is convex, crafting a good initialization may not seem worthwhile, since any initialization is guaranteed to work; second, because the outputs of the Sinkhorn algorithm are often unrolled in end-to-end pipelines, a data-dependent initialization would bias Jacobian computations. We challenge this conventional wisdom and show that data-dependent initializers result in dramatic speed-ups, with no effect on differentiability as long as implicit differentiation is used. Our initializations rely on closed-form expressions for exact or approximate OT solutions that are known in the 1D, Gaussian, or GMM settings. They can be used with minimal tuning and result in consistent speed-ups for a wide variety of OT problems.
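To make the idea of a data-dependent initializer concrete, the following is a minimal NumPy sketch of the Gaussian case, not the authors' implementation. It assumes a squared-Euclidean cost, fits a Gaussian to each point cloud, and uses the closed-form (unregularized) Gaussian OT potential to warm-start a log-domain Sinkhorn loop. The names `sinkhorn`, `gaussian_init`, `sq_cost`, and `plan`, the tolerance, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))

def sq_cost(x, y):
    # Pairwise squared-Euclidean cost matrix (assumed cost for this sketch).
    return np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)

def plan(f, g, C, a, b, eps):
    # Coupling implied by the dual potentials f and g.
    return a[:, None] * b[None, :] * np.exp((f[:, None] + g[None, :] - C) / eps)

def sinkhorn(x, y, a, b, eps, f_init=None, tol=1e-8, max_iter=5000):
    """Log-domain Sinkhorn; returns (f, g, number of iterations used)."""
    C = sq_cost(x, y)
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a)) if f_init is None else np.array(f_init, dtype=float)
    for n_iter in range(1, max_iter + 1):
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        # Rows are exact after the f-update, so monitor the column marginal.
        err = np.abs(plan(f, g, C, a, b, eps).sum(axis=0) - b).sum()
        if err < tol:
            break
    return f, g, n_iter

def _sqrtm_psd(M):
    # Square root of a symmetric positive semi-definite matrix.
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_init(x, y, a, b):
    """Initial potential f from the closed-form OT map between fitted Gaussians."""
    m_a, m_b = a @ x, b @ y
    xc, yc = x - m_a, y - m_b
    S_a = (xc * a[:, None]).T @ xc  # weighted covariance of the source
    S_b = (yc * b[:, None]).T @ yc  # weighted covariance of the target
    Sa_half = _sqrtm_psd(S_a)
    Sa_inv_half = np.linalg.inv(Sa_half)
    # Linear part A of the Gaussian OT map T(x) = m_b + A (x - m_a).
    A = Sa_inv_half @ _sqrtm_psd(Sa_half @ S_b @ Sa_half) @ Sa_inv_half
    # f(x) = ||x||^2 - 2 phi(x), with phi(x) = 0.5 (x-m_a)^T A (x-m_a) + m_b^T x,
    # defined up to an additive constant that Sinkhorn absorbs in its first step.
    return np.sum(x ** 2, axis=1) - np.einsum("ni,ij,nj->n", xc, A, xc) - 2.0 * x @ m_b

# Synthetic example (assumed data): a shifted, rescaled Gaussian target.
rng = np.random.default_rng(0)
n, m, eps = 50, 60, 1.0
x = rng.normal(size=(n, 2))
y = rng.normal(size=(m, 2)) * np.array([1.5, 0.5]) + np.array([4.0, -3.0])
a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)

f0, g0, n_zero = sinkhorn(x, y, a, b, eps)
f1, g1, n_gauss = sinkhorn(x, y, a, b, eps, f_init=gaussian_init(x, y, a, b))
print("iterations from zero init:", n_zero, "| from Gaussian init:", n_gauss)
```

Both runs target the same regularized problem, so they should agree on the coupling; only the number of iterations needed to reach the tolerance is expected to differ. Because the initializer only changes the starting point, gradients obtained by implicit differentiation at the fixed point would not depend on it, which is the abstract's point about differentiability.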