Regression with distribution-valued responses and Euclidean predictors has gained increasing scientific relevance. While methodology for univariate distributional data has advanced rapidly in recent years, multivariate distributions, which additionally encode dependence across univariate marginals, have received less attention and pose computational and statistical challenges. In this work, we address these challenges with a new regression approach for multivariate distributional responses, in which distributions are modeled within the semiparametric nonparanormal family. By incorporating the nonparanormal transport (NPT) metric -- an efficient closed-form surrogate for the Wasserstein distance -- into the Fréchet regression framework, our approach decomposes the problem into separate regressions of marginal distributions and their dependence structure, facilitating both efficient estimation and granular interpretation of predictor effects. We provide theoretical justification for NPT, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality. We further prove uniform convergence guarantees for regression estimators, both when distributional responses are fully observed and when they are estimated from empirical samples, attaining fast convergence rates comparable to the univariate case. The utility of our method is demonstrated via simulations and an application to continuous glucose monitoring data.
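To make the decomposition described in the abstract concrete, the following is a minimal sketch of a distance between two multivariate samples that splits into a marginal part and a dependence part. It is not the NPT metric itself, whose exact form is not given in the abstract. The marginal term uses the standard closed form of the univariate 2-Wasserstein distance via sorted samples. The dependence term (squared Frobenius distance between correlation matrices of rank-based normal scores) and all function names are assumptions chosen for illustration only.

```python
import numpy as np
from statistics import NormalDist


def marginal_w2_squared(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.

    For empirical distributions with the same number of atoms, the optimal
    coupling matches order statistics, so sorting gives the closed form.
    """
    x, y = np.sort(x), np.sort(y)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean((x - y) ** 2))


def latent_correlation(sample):
    """Correlation matrix of normal scores, column by column.

    Each column is replaced by Gaussian quantiles of its scaled ranks,
    a common rank-based estimate of Gaussian-copula dependence
    (ties are broken arbitrarily in this sketch).
    """
    n, _ = sample.shape
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    scores = inv_cdf(ranks / (n + 1))
    return np.corrcoef(scores, rowvar=False)


def surrogate_distance_squared(a, b):
    """Hypothetical surrogate: marginal W2^2 terms plus a dependence term.

    ASSUMPTION: the Frobenius term is an illustrative stand-in, not the
    dependence component used by the paper.
    """
    marginal = sum(
        marginal_w2_squared(a[:, j], b[:, j]) for j in range(a.shape[1])
    )
    dependence = np.sum((latent_correlation(a) - latent_correlation(b)) ** 2)
    return float(marginal + dependence)


rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
b = a + np.array([1.0, 0.0])  # shift the first marginal only
print(surrogate_distance_squared(a, b))
```

Shifting one coordinate changes only the corresponding marginal term, since ranks and therefore the normal-score correlations are unchanged. This mirrors, in a simplified way, how a decomposed metric can attribute an effect to the marginals or to the dependence structure separately.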