We consider the problem of estimating the optimal transport map between two probability distributions $P$ and $Q$ on $\mathbb R^d$ from i.i.d. samples. All existing statistical analyses of this problem assume that the transport map is Lipschitz, a strong requirement that, in particular, excludes any example in which the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution $Q$ is a discrete measure supported on finitely many points in $\mathbb R^d$. We study a computationally efficient estimator based on entropic optimal transport, initially proposed by Pooladian and Niles-Weed (2021), and show that in this semi-discrete setting it converges at the minimax-optimal rate $n^{-1/2}$, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments and also report experiments in settings not covered by our theory, which indicate that the entropic estimator is a promising approach to other discontinuous transport map estimation problems.
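To make the entropic estimator concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the cost $\tfrac12\|x-y\|^2$, uniform weights on both point sets, and a fixed number of log-domain Sinkhorn iterations. The function names (`sq_cost`, `sinkhorn_potentials`, `entropic_map`), the regularization parameter `eps`, and the iteration count are illustrative choices, not taken from the abstract. The map estimate at a point $x$ is the barycentric projection $\sum_j y_j w_j(x)$, with weights $w_j(x) \propto \exp\big((g_j - \tfrac12\|x-y_j\|^2)/\varepsilon\big)$, where $g$ is the entropic dual potential on the support of $Q$.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sq_cost(x, y):
    # Pairwise cost 0.5 * ||x_i - y_j||^2 for x of shape (n, d), y of shape (m, d).
    diff = x[:, None, :] - y[None, :, :]
    return 0.5 * np.sum(diff ** 2, axis=-1)


def sinkhorn_potentials(x, y, eps, n_iter=500):
    # Log-domain Sinkhorn with uniform weights (an assumption of this sketch).
    # Returns the dual potentials f (on x) and g (on y).
    n, m = len(x), len(y)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    C = sq_cost(x, y)
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return f, g


def entropic_map(x_new, y, g, eps):
    # Barycentric projection: a softmax-weighted average of the points y_j.
    # With uniform weights on y, the weight term cancels in the softmax.
    logits = (g[None, :] - sq_cost(x_new, y)) / eps
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ y
```

In a semi-discrete toy problem one might take `x` to be samples from $P$ and `y` to be the atoms of $Q$, call `f, g = sinkhorn_potentials(x, y, eps)`, and then evaluate `entropic_map(x_new, y, g, eps)` at new points. Because the output is a convex combination of the atoms, it is smooth in `x_new` for any `eps > 0` and sharpens towards a piecewise-constant map as `eps` decreases.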