We develop a computationally tractable method for estimating the optimal transport map between two distributions over $\mathbb{R}^d$ with rigorous finite-sample guarantees. Leveraging an entropic version of Brenier's theorem, we show that our estimator -- the \emph{barycentric projection} of the optimal entropic plan -- is easy to compute using Sinkhorn's algorithm. As a result, unlike current approaches to map estimation, which are slow to evaluate when the dimension or number of samples is large, our approach is parallelizable and extremely efficient even for massive data sets. Under smoothness assumptions on the optimal map, we show that our estimator achieves statistical performance comparable to that of other estimators in the literature, at a much lower computational cost. We showcase the efficacy of the proposed estimator through numerical examples, including some not explicitly covered by our assumptions. Using Lepski's method, we also propose a modified version of our estimator that adapts to the smoothness of the underlying optimal transport map. Our proofs rely on a modified duality principle for entropic optimal transport and on a method due to Pal (2019) for approximating optimal entropic plans.
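To make the computation concrete, here is a minimal NumPy sketch of a barycentric projection of an entropic plan computed with log-domain Sinkhorn iterations. It is not the authors' implementation. It assumes uniform weights on the samples $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$, the cost $\frac12\|x-y\|^2$, a fixed (hypothetical) iteration count, and hypothetical helper names `sinkhorn_potentials` and `entropic_map`. Under these assumptions the projection at any point $x$ reduces to a softmax-weighted average of the target samples, $\hat T_\varepsilon(x) = \sum_j Y_j\, e^{(g_j - \frac12\|x-Y_j\|^2)/\varepsilon} \big/ \sum_j e^{(g_j - \frac12\|x-Y_j\|^2)/\varepsilon}$, where $g$ is the Sinkhorn dual potential on the $Y_j$.

```python
import numpy as np


def _logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    M = np.max(A, axis=axis, keepdims=True)
    out = M + np.log(np.sum(np.exp(A - M), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def _half_sq_cost(X, Y):
    """Cost matrix C_ij = 0.5 * ||x_i - y_j||^2 (assumed cost)."""
    return 0.5 * np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)


def sinkhorn_potentials(X, Y, eps, n_iter=1000):
    """Log-domain Sinkhorn between uniform empirical measures on X (n x d)
    and Y (m x d). Returns dual potentials f (length n) and g (length m)."""
    n, m = X.shape[0], Y.shape[0]
    C = _half_sq_cost(X, Y)
    log_a = np.full(n, -np.log(n))  # uniform source weights (assumption)
    log_b = np.full(m, -np.log(m))  # uniform target weights (assumption)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternate updates enforcing the row and column marginals.
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return f, g


def entropic_map(x_new, Y, g, eps):
    """Barycentric projection evaluated at arbitrary points x_new (k x d):
    each output row is a softmax-weighted average of the rows of Y.
    With uniform target weights, log(b_j) is constant and cancels."""
    logits = (g[None, :] - _half_sq_cost(x_new, Y)) / eps
    logits -= logits.max(axis=1, keepdims=True)
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)
    return W @ Y
```

Each evaluation of `entropic_map` is a cost computation, a row-wise softmax, and a matrix product, so many query points can be processed in one batched call.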