Distribution data refers to data sets in which each sample is a probability distribution, a subject of growing interest in statistics. Although several studies have developed distribution-to-distribution regression models for univariate distributions, the multivariate setting remains under-explored because of its technical complexity. In this study, we introduce regression models in which both the predictor and the response are Gaussian distributions, using the Wasserstein metric. The models are built on the geometry of the Wasserstein space, which allows Gaussian distributions to be mapped to elements of a linear matrix space. Because they are linear regression frameworks, our models are easy to interpret, and they are simple to implement because the optimal transport problem between Gaussian distributions has an analytical solution. We also explore a generalization of our models to non-Gaussian settings. We establish convergence rates of the in-sample prediction errors for the empirical risk minimizers in our models. In simulation experiments, our models outperform a simpler alternative method that transforms Gaussian distributions into matrices. We illustrate our methodology with an application to weather data.
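The closed-form optimal transport between Gaussian distributions, which the abstract credits for the simple implementation, can be sketched in a few lines of NumPy. The code below is a minimal illustration and not the authors' implementation. It uses the standard formulas for the 2-Wasserstein distance between N(m1, S1) and N(m2, S2) and for the optimal map T(x) = m2 + A(x - m1), with A = S1^{-1/2}(S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}. As an assumption, it also linearizes a Gaussian at a reference Gaussian through the pair (m - m_ref, A - I), which is one common tangent-space construction and not necessarily the one used in the paper. The function names `psd_sqrt`, `gaussian_w2`, `gaussian_ot_map`, `log_map`, and `exp_map` are hypothetical.

```python
import numpy as np


def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues from round-off
    return (V * np.sqrt(w)) @ V.T


def gaussian_w2(m1, S1, m2, S2):
    """Closed-form 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r1 = psd_sqrt(S1)
    cross = psd_sqrt(r1 @ S2 @ r1)
    sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(sq, 0.0))  # clip tiny negative values caused by round-off


def gaussian_ot_map(m1, S1, m2, S2):
    """Matrix A of the optimal map T(x) = m2 + A (x - m1); S1 must be positive definite."""
    r1 = psd_sqrt(S1)
    r1_inv = np.linalg.inv(r1)
    A = r1_inv @ psd_sqrt(r1 @ S2 @ r1) @ r1_inv
    return 0.5 * (A + A.T)  # symmetric in exact arithmetic; enforce it numerically


def log_map(m_ref, S_ref, m, S):
    """Assumed linearization: tangent pair (vector, symmetric matrix) at the reference Gaussian."""
    A = gaussian_ot_map(m_ref, S_ref, m, S)
    return m - m_ref, A - np.eye(len(m_ref))


def exp_map(m_ref, S_ref, v, V):
    """Inverse of log_map: push N(m_ref, S_ref) through x -> m_ref + v + (I + V)(x - m_ref)."""
    M = np.eye(len(m_ref)) + V
    return m_ref + v, M @ S_ref @ M.T
```

For example, with N(0, I) and N((3, 0), diag(4, 9)) in two dimensions, the squared distance is 9 + (1 + 4 - 4) + (1 + 9 - 6) = 14. Because the tangent pair (v, V) lives in a linear space (a vector and a symmetric matrix), ordinary linear-regression machinery can act on it. Its weighted norm ||v||^2 + tr(V S_ref V) reproduces the squared Wasserstein distance to the reference.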