We study the problem of estimating a function $T$ given independent samples from a distribution $P$ and from the pushforward distribution $T_\sharp P$. This setting is motivated by applications in the sciences, where $T$ represents the evolution of a physical system over time, and in machine learning, where, for example, $T$ may represent a transformation learned by a deep neural network trained for a generative modeling task. To ensure identifiability, we assume that $T = \nabla \varphi_0$ is the gradient of a convex function, in which case $T$ is known as an \emph{optimal transport map}. Prior work has studied the estimation of $T$ under the assumption that it lies in a H\"older class, but a general theory is lacking. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfy a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function lying in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for H\"older transport maps, and we also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
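To make the estimation problem concrete, here is a toy one-dimensional sketch. It is not the estimator analyzed in this work. In one dimension, the gradient of a convex function is a nondecreasing function. The optimal transport map from a continuous $P$ to $T_\sharp P$ is therefore the monotone rearrangement, and it can be estimated from *unpaired* samples by matching sorted order statistics. The names `true_map` and `estimate_map_1d` are hypothetical, and the choice $\varphi_0(x) = x^2 + \log\cosh(x)$ is an arbitrary convex example that we assume for illustration.

```python
import numpy as np

def true_map(x):
    # Hypothetical example: T = phi_0' with phi_0(x) = x**2 + log(cosh(x)),
    # which is convex, so T(x) = 2x + tanh(x) is nondecreasing.
    return 2.0 * x + np.tanh(x)

def estimate_map_1d(source, target):
    """Toy 1-D estimator (monotone rearrangement): pair the k-th smallest
    source sample with the k-th smallest target sample, then interpolate
    linearly between the paired points. Assumes equal sample sizes."""
    assert len(source) == len(target)
    xs = np.sort(source)
    ys = np.sort(target)

    def T_hat(x):
        # np.interp needs increasing xs; ties occur with probability zero
        # for continuous P. Outside [xs[0], xs[-1]] the value is clamped.
        return np.interp(x, xs, ys)

    return T_hat

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal(n)              # samples from P = N(0, 1)
Y = true_map(rng.standard_normal(n))    # independent samples from T_# P
T_hat = estimate_map_1d(X, Y)

grid = np.linspace(-2.0, 2.0, 41)
err = np.sqrt(np.mean((T_hat(grid) - true_map(grid)) ** 2))
print(f"RMSE of T_hat on [-2, 2]: {err:.3f}")
```

The sorted pairing guarantees that `T_hat` is nondecreasing. This matches the 1-D form of the structural assumption $T = \nabla \varphi_0$ with $\varphi_0$ convex. In higher dimensions, no such closed-form rearrangement is available. That is where assumptions on the function class and on $P$, such as a Poincar\'e inequality, become necessary.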