We present an optimal transport framework for regression when both the covariate and the response are probability distributions on a compact subset $\Omega\subset\mathbb{R}^d$ with $d>1$. Beyond compactly supported distributions, the method also applies when both the covariate and the response are Gaussian distributions on $\mathbb{R}^d$. Our approach generalizes an existing transportation-based regression model to higher dimensions. This model postulates that the conditional Fr\'echet mean of the response distribution is linked to the covariate distribution via an optimal transport map. We establish an upper bound on the rate of convergence of a plug-in estimator and propose an iterative algorithm for computing it, based on DC (difference of convex functions) programming. In the Gaussian case, the estimator achieves a parametric rate of convergence, and its computation reduces to a finite-dimensional optimization over positive definite matrices, which can be solved efficiently. A simulation study demonstrates the performance of the estimator.
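To illustrate why the Gaussian case reduces to positive definite matrices, the sketch below computes the well-known closed-form optimal transport map between two centered Gaussians, $T = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$, together with the squared 2-Wasserstein distance. This is not the estimator or the DC algorithm of the paper; the function names and the example covariances are assumptions made purely for illustration.

```python
import numpy as np


def spd_sqrt(a):
    """Symmetric square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(a)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_ot_map(cov_src, cov_tgt):
    """Matrix T of the optimal map x -> T x between N(0, cov_src) and N(0, cov_tgt).

    Standard closed form: T = S^{-1/2} (S^{1/2} C S^{1/2})^{1/2} S^{-1/2},
    with S = cov_src and C = cov_tgt.
    """
    root = spd_sqrt(cov_src)
    inv_root = np.linalg.inv(root)
    return inv_root @ spd_sqrt(root @ cov_tgt @ root) @ inv_root


def wasserstein2_sq(mean1, cov1, mean2, cov2):
    """Squared 2-Wasserstein distance between two Gaussian distributions."""
    root = spd_sqrt(cov1)
    cross = spd_sqrt(root @ cov2 @ root)
    return float(np.sum((mean1 - mean2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross))


# Hypothetical covariance matrices, chosen only for this illustration.
cov_x = np.array([[2.0, 0.5], [0.5, 1.0]])
cov_y = np.array([[1.0, -0.3], [-0.3, 3.0]])

T = gaussian_ot_map(cov_x, cov_y)

# Pushing N(0, cov_x) forward through x -> T x gives covariance T cov_x T^T,
# which should coincide with cov_y.
pushed_cov = T @ cov_x @ T.T
```

Because $T$ is itself symmetric positive definite, a transport map between Gaussians is fully described by one such matrix (plus a mean shift), which is consistent with the finite-dimensional optimization mentioned in the abstract.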