We introduce a new approach to nonlinear sufficient dimension reduction for the case where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal (cc-universal) kernels on these metric spaces. These kernels yield reproducing kernel Hilbert spaces for the predictor and the response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance; for multivariate distributions, we use the sliced Wasserstein distance. The sliced Wasserstein distance gives the metric space topological properties similar to those of the Wasserstein space while offering significant computational benefits. Numerical results on synthetic data show that our method outperforms competing methods. We also apply the method to several data sets, including fertility and mortality data and Calgary temperature data.
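To make the kernel construction concrete, here is a minimal sketch. It assumes a Gaussian-type kernel k(μ, ν) = exp(-γ d(μ, ν)²), where d is either the 2-Wasserstein distance (univariate) or a Monte Carlo sliced 2-Wasserstein distance (multivariate). Both distances are computed from empirical samples via quantile functions. The kernel form, the function names (`wasserstein2_1d`, `sliced_wasserstein2`, `gaussian_kernel_matrix`), and the tuning parameters `gamma`, `n_grid`, and `n_proj` are illustrative assumptions, not taken from the paper. The code does not verify any universality property; it only shows how such a kernel matrix could be assembled.

```python
import numpy as np

def wasserstein2_1d(x, y, n_grid=200):
    """Approximate W_2 between two univariate empirical distributions.

    In one dimension, W_2 equals the L2 distance between quantile
    functions, so we compare quantiles on a common grid of levels.
    """
    levels = (np.arange(n_grid) + 0.5) / n_grid
    qx = np.quantile(x, levels)
    qy = np.quantile(y, levels)
    return np.sqrt(np.mean((qx - qy) ** 2))

def sliced_wasserstein2(X, Y, n_proj=100, n_grid=200, seed=0):
    """Monte Carlo sliced W_2 between samples X (n, d) and Y (m, d).

    Projects both samples onto random unit directions and averages the
    squared 1D W_2 distances. A fixed seed reuses the same directions
    for every pair, keeping the approximation consistent across pairs.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    sq = [wasserstein2_1d(X @ t, Y @ t, n_grid) ** 2 for t in theta]
    return np.sqrt(np.mean(sq))

def gaussian_kernel_matrix(samples, dist, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * dist(samples[i], samples[j])**2)."""
    n = len(samples)
    D2 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D2[i, j] = D2[j, i] = dist(samples[i], samples[j]) ** 2
    return np.exp(-gamma * D2)

# Usage: three univariate samples with different locations.
rng = np.random.default_rng(1)
samples = [rng.normal(loc=m, size=200) for m in (0.0, 0.5, 3.0)]
K = gaussian_kernel_matrix(samples, wasserstein2_1d, gamma=0.5)
```

With the quantile discretization above, both distances are Euclidean distances between finite vectors of (projected) quantiles. The resulting Gaussian Gram matrix is therefore positive semidefinite. For multivariate samples, pass `sliced_wasserstein2` as `dist` instead of `wasserstein2_1d`.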
翻译:我们提出了一种新的非线性充分降维方法,适用于预测变量和响应变量均为分布数据(建模为度量空间中的元素)的情形。关键步骤是在度量空间上构建通用核(cc-universal),由此得到预测变量和响应变量的再生核希尔伯特空间,该空间足够丰富,能够刻画决定充分降维的条件独立性。对于单变量分布,我们利用Wasserstein距离构建通用核;对于多变量分布,则采用切片Wasserstein距离。切片Wasserstein距离确保度量空间具有与Wasserstein空间相似的拓扑性质,同时显著提升计算效率。基于合成数据的数值结果表明,我们的方法优于潜在竞争方法,并进一步应用于多个数据集,包括生育率与死亡率数据以及卡尔加里温度数据。