This paper investigates the predictive performance of model averaging in high-dimensional linear regression, where the number of regressors is comparable to the sample size. Using tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and characterize the resulting risk landscape. This limiting risk reveals two phenomena. First, simple weighting inherits the double descent trajectory and its associated variance explosion near the interpolation boundary. Second, strategic weighting triggers an ensemble emergence that suppresses the localized risk surge and yields a globally flat risk surface. Building on this limiting risk, we propose the Large Model Averaging (LaMA) method, which accounts for the discrepancy between in-sample and out-of-sample risks in the high-dimensional regime. Numerical studies and real data applications confirm that LaMA achieves superior predictive accuracy in high-dimensional environments.
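The risk spike near the interpolation boundary can be illustrated with a small simulation. The sketch below is not the LaMA method and does not reproduce the paper's limiting-risk formula; it assumes an isotropic Gaussian design, a dense coefficient vector, minimum-norm least squares for each nested candidate, and equal weights as a stand-in for simple weighting. The function name `nested_risks` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def nested_risks(n=100, p=200, sizes=(20, 60, 100, 140, 200),
                 n_test=2000, sigma=1.0, seed=0):
    """Empirical test risk of nested candidates and of their equal-weight average.

    Assumptions (illustrative only): i.i.d. standard Gaussian regressors,
    a dense true coefficient vector, and minimum-norm least squares fits.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p) / np.sqrt(p)           # dense signal, norm about 1
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    X_test = rng.normal(size=(n_test, p))
    y_test = X_test @ beta + sigma * rng.normal(size=n_test)

    preds = []
    for k in sizes:
        # Candidate k uses the first k regressors (nested structure).
        # pinv gives OLS when k < n and the minimum-norm interpolator when k >= n.
        b_k = np.linalg.pinv(X[:, :k]) @ y
        preds.append(X_test[:, :k] @ b_k)
    preds = np.array(preds)                          # shape (len(sizes), n_test)

    single = np.mean((preds - y_test) ** 2, axis=1)  # risk of each candidate
    equal = np.mean((preds.mean(axis=0) - y_test) ** 2)  # equal-weight average
    return single, equal


single, equal = nested_risks()
for k, r in zip((20, 60, 100, 140, 200), single):
    print(f"k={k:3d}  test risk={r:.2f}")
print(f"equal-weight average  test risk={equal:.2f}")
```

In this setup the candidate with as many regressors as observations (k = n) sits at the interpolation boundary, and the equal-weight average still includes it, so the average carries part of that variance. Choosing the weights to account for this is the problem the abstract attributes to strategic weighting.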