In a high-dimensional regression framework, we study the consequences of the naive two-step procedure in which the dimension of the input variables is first reduced and the reduced input variables are then used to predict the output variable with kernel regression. To analyze the resulting regression errors, we derive a novel stability result for kernel regression with respect to the Wasserstein distance. This allows us to bound the errors that occur when perturbed input data are used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both PCA and kernel regression, we deduce convergence rates for the two-step procedure. This procedure turns out to be particularly useful in a semi-supervised setting.
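The two-step procedure can be illustrated with a minimal NumPy sketch. It assumes that kernel regression means kernel ridge regression with a Gaussian kernel, and it fits the PCA on both labeled and unlabeled inputs, which is one natural reading of the semi-supervised setting. All function names (`fit_pca`, `gaussian_kernel`, `two_step`, `w2_coupling_bound`) and parameter defaults are hypothetical. `w2_coupling_bound` computes a standard upper bound on the 2-Wasserstein distance between the empirical measures of the original inputs and their PCA reconstructions. It is shown only to make the notion of perturbed input data concrete and is not necessarily the quantity used in the stability result.

```python
import numpy as np


def fit_pca(X, k):
    """Return the sample mean and the top-k principal directions (d x k) of X."""
    mu = X.mean(axis=0)
    # Rows of Vt are the right singular vectors of the centered data,
    # i.e. the principal directions sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix exp(-|a - b|^2 / (2 h^2)) between rows of A and B."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def two_step(X_lab, y, X_unlab, k, lam=1e-3, bandwidth=1.0):
    """Step 1: PCA on all inputs. Step 2: kernel ridge regression on k PCA scores."""
    X_all = np.vstack([X_lab, X_unlab])  # unlabeled inputs only enter the PCA step
    mu, V = fit_pca(X_all, k)

    def project(X):
        return (X - mu) @ V  # k-dimensional PCA scores

    Z = project(X_lab)
    n = Z.shape[0]
    K = gaussian_kernel(Z, Z, bandwidth)
    # Kernel ridge regression: alpha = (K + n * lam * I)^{-1} y
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

    def predict(X_new):
        return gaussian_kernel(project(X_new), Z, bandwidth) @ alpha

    return predict


def w2_coupling_bound(X, mu, V):
    """Upper bound on W2 between the empirical measures of X and its PCA reconstruction.

    Pairing each x_i with its reconstruction mu + V V^T (x_i - mu) is one coupling,
    so the root mean squared reconstruction error bounds W2 from above.
    """
    X_rec = mu + (X - mu) @ V @ V.T
    return float(np.sqrt(np.mean(np.sum((X - X_rec) ** 2, axis=1))))
```

Because `V` has orthonormal columns, distances between PCA scores equal distances between the corresponding reconstructions `mu + V V^T (x - mu)` in the original space. With a distance-based kernel such as the Gaussian one, regressing on the k scores is therefore equivalent to regressing on these reconstructions, which is one way to view the reduced inputs as perturbed input data.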