Prediction, in both regression and classification, is one of the main aims of modern data science. When the number of predictors is large, a common first step is to reduce the dimension of the data. Sufficient dimension reduction (SDR) is a well-established paradigm that reduces the covariates X while retaining all the information they carry that is relevant for predicting Y. In practice, SDR has been used successfully as an exploratory tool: the sufficient reduction is estimated first, and a model is then fitted to the reduced predictors. Nevertheless, even when the estimated reduction is a consistent estimator of the population reduction, no theory has supported this second step when non-parametric regression is applied to the estimated reduction in place of the true one. In this paper, we show that the asymptotic distribution of the non-parametric regression estimator is the same whether the true SDR or its estimator is used. This result makes inference possible, for example computing confidence intervals for the regression function, while avoiding the curse of dimensionality.
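The two-step procedure described above (estimate the reduction, then regress non-parametrically on the reduced predictors) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: the abstract does not name a particular SDR estimator or smoother, so sliced inverse regression (SIR) and a Nadaraya-Watson estimator with a Gaussian kernel are used here as stand-ins, and the single-index model, sample size, bandwidth `h`, and the helper names `sir_directions` and `nw_regression` are all hypothetical choices made for illustration.

```python
import numpy as np

def sir_directions(X, y, d=1, n_slices=10):
    """Estimate d reduction directions by sliced inverse regression (assumed SDR method)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the sample covariance, used to standardize X.
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ cov_inv_sqrt
    # Slice the sample on the sorted response and average Z within each slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the slice-mean covariance, mapped back to the X scale.
    _, v = np.linalg.eigh(M)
    B = cov_inv_sqrt @ v[:, ::-1][:, :d]
    return B / np.linalg.norm(B, axis=0)

def nw_regression(T_train, y_train, T_eval, h):
    """Nadaraya-Watson estimate with a Gaussian kernel and bandwidth h."""
    diff = T_eval[:, None, :] - T_train[None, :, :]
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=2) / h ** 2)
    return (w @ y_train) / w.sum(axis=1)

# Hypothetical single-index model: Y = 2 * tanh(beta' X) + noise.
rng = np.random.default_rng(0)
n, p = 1000, 10
beta = np.zeros(p)
beta[0] = 1.0
X = rng.standard_normal((n, p))
m_truth = 2.0 * np.tanh(X @ beta)
y = m_truth + 0.3 * rng.standard_normal(n)

B_hat = sir_directions(X, y, d=1)   # estimated reduction
T_true = X @ beta[:, None]          # true reduction (known only in simulation)
T_hat = X @ B_hat                   # plug-in reduction

h = 0.3                             # bandwidth, chosen by hand for this sketch
m_true_fit = nw_regression(T_true, y, T_true, h)
m_hat = nw_regression(T_hat, y, T_hat, h)

print("abs. cosine between estimated and true direction:", abs(B_hat[:, 0] @ beta))
print("MSE using true reduction:     ", np.mean((m_true_fit - m_truth) ** 2))
print("MSE using estimated reduction:", np.mean((m_hat - m_truth) ** 2))
```

In a simulation of this kind, the fit built on the estimated reduction would be expected to track the fit built on the true reduction closely. That is in line with the abstract's claim, although a single run is an illustration and not evidence for the asymptotic result.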