Envelope methods perform dimension reduction of the predictors or responses in multivariate regression, exploiting the relationship between them to improve estimation efficiency. While most research on envelopes has focused on their estimation properties, certain envelope estimators have been shown to excel at prediction in both low and high dimensions. In this paper, we propose to further improve prediction through envelope-guided regularization (EgReg), a novel method that uses envelope-derived information to guide shrinkage along the principal components (PCs) of the predictor matrix. To motivate its development, we situate EgReg among other PC-based regression methods and envelope methods. We show that EgReg delivers lower prediction risk than a closely related non-shrinkage envelope estimator when the number of predictors $p$ and the number of observations $n$ are fixed and in any alignment. In an asymptotic regime where the true intrinsic dimension of the predictors and $n$ diverge proportionally, we find that the limiting prediction risk of the non-shrinkage envelope estimator exhibits a double descent phenomenon and is consistently larger than the limiting risk of EgReg. We compare the prediction performance of EgReg with that of envelope methods and other PC-based prediction methods in simulations and a real data example, and we observe that EgReg generally predicts better than these alternatives.
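As background for the phrase "shrinkage along the principal components," the following is a minimal sketch of the generic PC-based regression family in which ridge regression and principal component regression (PCR) sit. It is not an implementation of EgReg: the abstract does not specify how the envelope-derived shrinkage factors are computed, so the sketch simply accepts any rule that maps singular values to factors in $[0, 1]$. The names `pc_shrinkage_fit`, `ridge_factors`, and `pcr_factors` are hypothetical and chosen for illustration only.

```python
import numpy as np


def pc_shrinkage_fit(X, y, shrink):
    """Fit a linear model by shrinking least squares along the PCs of X.

    `shrink` maps the singular values of the centered X to per-PC
    factors in [0, 1]; a factor of 1 leaves that PC direction unshrunk.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Thin SVD of the centered predictors: Xc = U diag(d) Vt.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Drop numerically null directions so the division by d is safe.
    keep = d > 1e-10 * d.max()
    U, d, Vt = U[:, keep], d[keep], Vt[keep]

    factors = np.asarray(shrink(d), dtype=float)

    # Per-PC least squares coefficient (U^T yc / d), scaled by its factor,
    # then rotated back to the original predictor coordinates.
    beta = Vt.T @ (factors * (U.T @ yc) / d)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def ridge_factors(lam):
    """Ridge regression: smooth shrinkage d^2 / (d^2 + lam) on every PC."""
    return lambda d: d**2 / (d**2 + lam)


def pcr_factors(k):
    """PCR: keep the first k PCs unshrunk and discard the rest."""
    return lambda d: (np.arange(d.size) < k).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5, 0.0]) + rng.normal(size=40)

    for name, rule in [("ridge", ridge_factors(5.0)), ("pcr", pcr_factors(3))]:
        b0, b = pc_shrinkage_fit(X, y, rule)
        print(name, np.round(b, 3))
```

In this framing, a method of the kind the abstract describes would supply its own `shrink` rule, informed by the response through the envelope, rather than by the singular values alone as ridge and PCR do here.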