Distribution data refers to data sets in which each sample is itself a probability distribution, a topic attracting growing interest in statistics. Although several studies have developed distribution-to-distribution regression models for univariate distributions, the multivariate case remains under-explored because of its technical complexity. In this study, we introduce models for regression from one Gaussian distribution to another, using the Wasserstein metric. The models are built on the geometry of the Wasserstein space, which allows Gaussian distributions to be mapped to elements of a linear matrix space. Because they take the form of linear regression, our models are easy to interpret, and they are simple to implement because the optimal transport problem between Gaussian distributions has a closed-form solution. We also explore a generalization of our models to non-Gaussian settings. We establish convergence rates for the in-sample prediction errors of empirical risk minimization in our models. In simulation experiments, our models outperform a simpler alternative method that transforms Gaussian distributions into matrices. We illustrate our methodology with an application to weather data.
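The closed-form optimal transport solution between Gaussian distributions mentioned above can be illustrated with a minimal sketch. It uses the standard formulas for the 2-Wasserstein distance and the optimal transport map between N(m1, S1) and N(m2, S2), and represents the target Gaussian as a (vector, symmetric matrix) pair relative to a reference Gaussian. The function names (`psd_sqrt`, `gaussian_w2`, `gaussian_ot_map`, `tangent_coords`) are hypothetical, and the tangent-space representation is only an assumption about what a "linear matrix space" embedding might look like; the abstract does not specify the authors' exact construction.

```python
import numpy as np


def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


def gaussian_w2(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2).

    W2^2 = |m1 - m2|^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
    """
    r = psd_sqrt(S1)
    cross = psd_sqrt(r @ S2 @ r)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))


def gaussian_ot_map(S1, S2):
    """Matrix A of the optimal map x -> m2 + A (x - m1); S1 must be invertible."""
    r = psd_sqrt(S1)
    r_inv = np.linalg.inv(r)
    return r_inv @ psd_sqrt(r @ S2 @ r) @ r_inv


def tangent_coords(m1, S1, m2, S2):
    """Assumed linear representation of N(m2, S2) relative to N(m1, S1):
    the optimal map minus the identity, stored as (mean shift, A - I)."""
    A = gaussian_ot_map(S1, S2)
    return m2 - m1, A - np.eye(S1.shape[0])


# Illustrative inputs (made up for this sketch, not taken from the paper).
m1, S1 = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]])
m2, S2 = np.array([1.0, -1.0]), np.array([[1.0, -0.3], [-0.3, 3.0]])
distance = gaussian_w2(m1, S1, m2, S2)
shift, V = tangent_coords(m1, S1, m2, S2)
```

Because the pair `(shift, V)` lives in a vector space (a vector and a symmetric matrix), ordinary linear regression tools can in principle be applied to such coordinates, which is the kind of simplification the abstract attributes to the Gaussian setting.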