Distribution data refers to data sets in which each sample is itself a probability distribution, and such data are attracting growing interest in statistics. Although several studies have developed distribution-to-distribution regression models for univariate distributions, the multivariate case remains under-explored because of technical difficulties. In this study, we introduce models for regression from one Gaussian distribution to another under the Wasserstein metric. The models are built on the geometry of the Wasserstein space, which allows Gaussian distributions to be mapped to elements of a linear matrix space. Because they take the form of linear regressions, our models are easy to interpret, and they are simple to implement because the optimal transport problem between Gaussian distributions has a closed-form solution. We also explore a generalization of the models to non-Gaussian settings. We establish convergence rates of the in-sample prediction errors for empirical risk minimization in our models. In simulation experiments, our models outperform a simpler alternative method that transforms Gaussian distributions into matrices. We illustrate the methodology with an application to weather data.
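The closed-form optimal transport solution between Gaussian distributions mentioned above can be illustrated with a minimal sketch. It uses the standard formulas for the squared 2-Wasserstein distance and the linear transport map between two Gaussians. It is not the regression model of this study, and the function names (psd_sqrt, gaussian_w2_squared, gaussian_ot_map) are hypothetical names chosen for illustration, assuming NumPy is available.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    sym = (mat + mat.T) / 2.0                 # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)     # clip tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2_squared(mean1, cov1, mean2, cov2):
    """Squared 2-Wasserstein distance between N(mean1, cov1) and N(mean2, cov2)."""
    root1 = psd_sqrt(cov1)
    cross = psd_sqrt(root1 @ cov2 @ root1)
    mean_term = np.sum((np.asarray(mean1) - np.asarray(mean2)) ** 2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(mean_term + cov_term)


def gaussian_ot_map(cov1, cov2):
    """Matrix T such that x -> mean2 + T (x - mean1) transports
    N(mean1, cov1) to N(mean2, cov2). Assumes cov1 is positive definite."""
    root1 = psd_sqrt(cov1)
    inv_root1 = np.linalg.inv(root1)
    return inv_root1 @ psd_sqrt(root1 @ cov2 @ root1) @ inv_root1


# Illustrative inputs (made up for this example, not data from the study).
mean_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
mean_b, cov_b = np.array([3.0, 4.0]), 4.0 * np.eye(2)

dist_sq = gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b)
transport = gaussian_ot_map(cov_a, cov_b)
```

For these inputs the mean term is 25 and the covariance term is 2, so dist_sq is 27 up to floating-point error, and transport is twice the identity matrix. The transport matrix is symmetric, so each pair of Gaussians yields a symmetric matrix; how the study itself maps Gaussian distributions into a linear matrix space is not specified in the abstract.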