The response envelope model provides substantial efficiency gains over standard multivariate linear regression by identifying the material part of the response and excluding the immaterial part. In this paper, we propose the enhanced response envelope, which incorporates a novel envelope regularization term based on a nonconvex manifold formulation. We show that the enhanced response envelope can yield lower prediction risk than the original envelope estimator. The enhanced response envelope naturally handles high-dimensional data, for which the original response envelope cannot be used without additional remedies. In an asymptotic high-dimensional regime where the ratio of the number of predictors to the number of samples converges to a non-zero constant, we characterize the risk function and reveal an interesting double descent phenomenon for the envelope model. A simulation study confirms our main theoretical findings. Simulations and real data applications demonstrate that the enhanced response envelope has significantly better prediction performance than the original envelope method, especially when the number of predictors is close to or moderately larger than the number of samples. Proofs and additional simulation results are provided in the supplementary file to this paper.