We consider the problem of estimating the optimal transport map between two probability distributions $P$ and $Q$ on $\mathbb R^d$ on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz. This is a strong requirement that, in particular, excludes any example in which the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution $Q$ is a discrete measure supported on a finite number of points in $\mathbb R^d$. We study a computationally efficient estimator based on entropic optimal transport, initially proposed by Pooladian and Niles-Weed (2021), and show that in the semi-discrete setting it converges at the minimax-optimal rate $n^{-1/2}$, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments and also report experiments in settings not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems.
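The entropic estimator mentioned above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation. It assumes the quadratic cost $\frac12\|x-y\|^2$, uniform weights on both samples, a fixed regularization parameter `eps`, and a fixed number of log-domain Sinkhorn iterations. The helper names (`sq_cost`, `sinkhorn_potentials`, `entropic_map`) and the three-atom toy target are hypothetical. With dual potential $g$ on the $Q$-samples $Y_1, \dots, Y_n$, the out-of-sample map takes the form $\hat T_\varepsilon(x) = \sum_j Y_j \exp\big((g_j - \tfrac12\|x-Y_j\|^2)/\varepsilon\big) \big/ \sum_j \exp\big((g_j - \tfrac12\|x-Y_j\|^2)/\varepsilon\big)$.

```python
import numpy as np
from scipy.special import logsumexp


def sq_cost(X, Y):
    """Cost matrix C[i, j] = 0.5 * ||X[i] - Y[j]||^2 (assumed quadratic cost)."""
    return 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)


def sinkhorn_potentials(X, Y, eps, n_iter=500):
    """Log-domain Sinkhorn between the uniform empirical measures on X and Y.

    Returns dual potentials f (one per row of X) and g (one per row of Y).
    Fixed iteration count is an assumption; no convergence check is done.
    """
    n, m = len(X), len(Y)
    C = sq_cost(X, Y)
    log_a = np.full(n, -np.log(n))  # uniform weights on the P-sample
    log_b = np.full(m, -np.log(m))  # uniform weights on the Q-sample
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternate exact updates: the f-step fixes row sums, the g-step column sums.
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return f, g


def entropic_map(x_new, Y, g, eps):
    """Hypothetical helper: entropic map evaluated at new points x_new.

    Each output is a softmax-weighted average of the Y samples; the uniform
    weights on Y cancel in the normalization.
    """
    logits = (g[None, :] - sq_cost(x_new, Y)) / eps
    weights = np.exp(logits - logsumexp(logits, axis=1, keepdims=True))
    return weights @ Y


# Semi-discrete toy problem (illustrative choices only):
# P uniform on [0, 1]^2, Q uniform on three atoms.
rng = np.random.default_rng(0)
atoms = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.9]])
n, eps = 200, 0.05
X = rng.uniform(size=(n, 2))
Y = atoms[rng.integers(0, len(atoms), size=n)]
f, g = sinkhorn_potentials(X, Y, eps)
T = entropic_map(rng.uniform(size=(5, 2)), Y, g, eps)
print(T)
```

Because the weights in `entropic_map` form a softmax over the target samples, shrinking `eps` pushes each output toward a single atom. The estimate then roughly approaches a piecewise-constant assignment, which is the kind of discontinuous map that arises in the semi-discrete setting.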