The Monge Gap: A Regularizer to Learn All Transport Maps

Optimal transport (OT) theory has been used in machine learning to study and characterize maps that can efficiently push forward one probability measure onto another. Recent works have drawn inspiration from Brenier's theorem, which states that when the ground cost is the squared-Euclidean distance, the ``best'' map to morph a continuous measure in $\mathcal{P}(\Rd)$ into another must be the gradient of a convex function. To exploit that result, [Makkuva+2020, Korotin+2020] consider maps $T=\nabla f_\theta$, where $f_\theta$ is an input convex neural network (ICNN), as defined by Amos+2017, and fit $\theta$ with SGD using samples. Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges, notably the many constraints imposed on $\theta$, the need to approximate the conjugate of $f_\theta$, and the limitation that they only work for the squared-Euclidean cost. More generally, we question the relevance of using Brenier's result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps: given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$. That gap quantifies how far a map $T$ deviates from the ideal properties we expect from a $c$-OT map. In practice, we drop all architecture requirements for $T$ and simply minimize a distance (e.g., the Sinkhorn divergence) between $T\sharp\mu$ and $\nu$, regularized by $\mathcal{M}^c_\rho(T)$. We study $\mathcal{M}^c_{\rho}$, and show how our simple pipeline significantly outperforms other baselines in practice.
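
To make the regularizer concrete, here is a minimal sketch of how a Monge gap could be evaluated on a handful of samples. The abstract does not spell out the formula, so the sketch assumes the definition $\mathcal{M}^c_\rho(T) = \int c(x, T(x))\,d\rho(x) - W_c(\rho, T\sharp\rho)$, i.e. the cost actually paid by $T$ minus the optimal $c$-transport cost between $\rho$ and its image. For $n$ uniformly weighted samples, that optimal cost is attained by a permutation, so the sketch brute-forces all permutations, which is only feasible for tiny $n$. A practical pipeline would presumably use an entropic (Sinkhorn) solver instead. The names `monge_gap`, `sq_euclidean`, `shift`, and `flip` are hypothetical and chosen for illustration.

```python
import itertools
import random


def sq_euclidean(x, y):
    """Squared-Euclidean ground cost between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def monge_gap(points, T, cost=sq_euclidean):
    """Empirical Monge gap of T on uniformly weighted samples (assumed definition).

    Returns mean_i c(x_i, T(x_i)) minus the exact OT cost between the samples
    and their images, found by brute force over permutations (tiny n only).
    """
    n = len(points)
    images = [T(x) for x in points]
    # Pairwise cost matrix between samples and images.
    C = [[cost(x, y) for y in images] for x in points]
    # Cost of moving each sample to its own image under T.
    map_cost = sum(C[i][i] for i in range(n)) / n
    # Optimal assignment cost; the identity permutation is a candidate,
    # so this never exceeds map_cost and the gap is non-negative.
    ot_cost = min(
        sum(C[i][s[i]] for i in range(n))
        for s in itertools.permutations(range(n))
    ) / n
    return map_cost - ot_cost


random.seed(0)
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]


def shift(x):
    # A translation is the gradient of a convex function, hence optimal here.
    return (x[0] + 1.0, x[1] - 2.0)


def flip(x):
    # Reflection through the origin is not an OT map for this cost.
    return (-x[0], -x[1])


gap_shift = monge_gap(samples, shift)
gap_flip = monge_gap(samples, flip)
print(gap_shift, gap_flip)
```

Under these assumptions the translation yields a gap of zero up to rounding, while the reflection yields a strictly positive gap, which is the behavior one would want from a penalty that is added to a fitting loss between $T\sharp\mu$ and $\nu$.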
