Regression analysis with probability measures as both input predictors and output response has recently drawn considerable attention. However, handling multiple input probability measures is challenging: the non-flat Riemannian geometry of the Wasserstein space hinders the definition of arithmetic operations, so an additive linear structure is not well defined. In this work, we propose a distribution-in-distribution-out (DIDO) regression model that uses parallel transport to achieve provable commutativity and additivity of newly defined arithmetic operations in Wasserstein space. These properties of the DIDO regression model serve as a foundation for model estimation, prediction, and inference. Specifically, the Fr\'echet least squares estimator is employed to obtain the best linear unbiased estimate, supported by the newly established Fr\'echet Gauss-Markov Theorem. Furthermore, we investigate the special case in which the predictors and the response are all univariate Gaussian measures, which leads to a simple closed-form solution for the linear model coefficients and the $R^2$ metric. A simulation study and a real case study on intraoperative cardiac output prediction are performed to evaluate the performance of the proposed method.
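As background for the univariate Gaussian special case mentioned above, the following minimal sketch illustrates why that setting is tractable: for two univariate Gaussians, the 2-Wasserstein distance and the optimal transport map have well-known closed forms. This is a standard fact about Gaussian measures and is not the DIDO estimator or its closed-form coefficients, which the abstract does not state. The function names `w2_gaussian` and `transport_map` are hypothetical and chosen here for illustration only.

```python
import math


def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).

    For univariate Gaussians this equals the Euclidean distance between
    the (mean, standard deviation) pairs.
    """
    return math.hypot(m1 - m2, s1 - s2)


def transport_map(m1, s1, m2, s2):
    """Optimal transport map pushing N(m1, s1^2) forward to N(m2, s2^2).

    Assumes s1 > 0. The map is affine: it rescales deviations from the
    source mean by s2 / s1 and recentres them at the target mean.
    """
    if s1 <= 0:
        raise ValueError("source standard deviation must be positive")
    return lambda x: m2 + (s2 / s1) * (x - m1)


if __name__ == "__main__":
    # Distance between N(0, 1^2) and N(3, 5^2).
    print(w2_gaussian(0.0, 1.0, 3.0, 5.0))
    # Image of the source mean under the transport map.
    print(transport_map(0.0, 1.0, 3.0, 5.0)(0.0))
```

Because each univariate Gaussian is identified with a point (mean, standard deviation) in a half-plane and the distance is Euclidean in those coordinates, this subfamily behaves like a flat space, which is consistent with the abstract's remark that the Gaussian case admits simple closed-form expressions.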