Conditional simulation is a fundamental task in statistical modeling: generate samples from the conditionals given finitely many data points from a joint distribution. One promising approach is to construct conditional Brenier maps, where the components of the map push forward a reference distribution to conditionals of the target. While many estimators exist, few, if any, come with statistical or algorithmic guarantees. To this end, we propose a non-parametric estimator for conditional Brenier maps that leverages the computational scalability of \emph{entropic} optimal transport. Our estimator builds on a result of Carlier et al. (2010), which shows that optimal transport maps under a rescaled quadratic cost asymptotically converge to conditional Brenier maps; our estimator is precisely the entropic analogue of these converging maps. By fully characterizing the Gaussian setting, we provide heuristic justifications for choosing the scaling parameter in the cost as a function of the number of samples. We conclude by comparing the performance of the estimator to other machine learning and non-parametric approaches on benchmark datasets and Bayesian inference problems.
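The construction described above can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes three things. First, the rescaled cost gives the conditioning block weight 1 and the remaining block weight t, so small t is the regime in which Carlier et al. (2010) obtain triangular limits. Second, the entropic problem is solved with a plain log-domain Sinkhorn loop using uniform weights. Third, the source sample pairs the observed conditioning values with independent standard Gaussian reference draws. The helper names (`rescaled_cost`, `sinkhorn_potential`, `entropic_map`) and the values of t and eps are hypothetical, illustrative choices.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def rescaled_cost(p, q, d_cond, t):
    """Pairwise cost 0.5*|p_c - q_c|^2 + 0.5*t*|p_r - q_r|^2.

    Assumption (hypothetical convention): the first d_cond coordinates form the
    conditioning block (weight 1); the remaining coordinates are weighted by t.
    """
    w = np.ones(p.shape[1])
    w[d_cond:] = t
    diff2 = (p[:, None, :] - q[None, :, :]) ** 2  # shape (n, m, d)
    return 0.5 * diff2 @ w                        # shape (n, m)


def sinkhorn_potential(src, tgt, d_cond, t, eps, n_iter=300):
    """Log-domain Sinkhorn with uniform weights; returns the target-side potential g."""
    C = rescaled_cost(src, tgt, d_cond, t)
    n, m = C.shape
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternate updates enforcing the row and column marginals of the plan.
        f = -eps * _logsumexp((g[None, :] - C) / eps - np.log(m), axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps - np.log(n), axis=0)
    return g


def entropic_map(query, tgt, g, d_cond, t, eps):
    """Barycentric projection of the entropic plan, evaluated at new points."""
    logits = (g[None, :] - rescaled_cost(query, tgt, d_cond, t)) / eps
    logits -= logits.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # each row is a probability vector
    return w @ tgt


# Toy joint model (illustrative only): Y ~ N(0, 1), X | Y = y ~ N(y, 0.25).
rng = np.random.default_rng(0)
n = 300
obs = rng.normal(size=n)
par = obs + 0.5 * rng.normal(size=n)
target = np.column_stack([obs, par])
source = np.column_stack([obs, rng.normal(size=n)])  # (y_i, z_i) with z_i ~ N(0, 1)

t, eps = 0.1, 0.01  # hypothetical values; the abstract's question is how to scale t with n
g = sinkhorn_potential(source, target, d_cond=1, t=t, eps=eps)

# Approximate draws from X | Y = 1.0: push (1.0, z) through the map, keep the last block.
z = rng.normal(size=200)
query = np.column_stack([np.full(200, 1.0), z])
samples = entropic_map(query, target, g, d_cond=1, t=t, eps=eps)[:, 1]
print(samples.mean(), samples.std())
```

In this toy model the true conditional at y = 1 is N(1, 0.25), so the sample mean should land near 1. The entropic blur typically shrinks the spread. The fixed t and eps above are placeholders. Choosing the scaling parameter as a function of the sample size is the question the Gaussian analysis is meant to inform.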