Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the constituent estimators, and least squares with nonnegativity constraints to learn the combination weights. In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint. When the constituent estimators are linear least-squares projections onto nested subspaces separated by at least three dimensions, we show that, thanks to a shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them. Here "best" refers to an estimator that minimizes a model selection criterion such as AIC or BIC. In other words, in this setting, the best single estimator is inadmissible. Because the optimization problem can be reformulated as isotonic regression, the stacked estimator requires the same order of computation as the best single estimator, making it an attractive alternative in terms of both performance and implementation.
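The computational claim above rests on the reformulation as isotonic regression. The abstract does not spell out that reformulation (which quantities play the role of the sequence and of the weights), so the following is only a minimal sketch of the generic pool-adjacent-violators algorithm for weighted isotonic regression, not the paper's stacking procedure. The function name `isotonic_regression` and its arguments are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def isotonic_regression(
    y: Sequence[float], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted least-squares fit of a nondecreasing sequence to y.

    Generic pool-adjacent-violators sketch; the mapping from the stacking
    problem to (y, weights) is NOT given in the abstract and is assumed
    to be supplied by the caller.
    """
    n = len(y)
    w = [1.0] * n if weights is None else [float(v) for v in weights]
    if len(w) != n:
        raise ValueError("y and weights must have the same length")
    if any(v <= 0 for v in w):
        raise ValueError("weights must be positive")

    # Each block is (weighted mean, total weight, number of points pooled).
    blocks: List[Tuple[float, float, int]] = []
    for value, weight in zip(y, w):
        blocks.append((float(value), weight, 1))
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            pooled_mean = (w1 * mean1 + w2 * mean2) / total
            blocks.append((pooled_mean, total, n1 + n2))

    # Expand each block back to one fitted value per original point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For instance, `isotonic_regression([1, 3, 2, 4])` pools the violating pair `3, 2` into its mean and leaves the other values unchanged. Each point is appended once and every merge removes a block, so the pass runs in time linear in the sequence length, which is in keeping with the abstract's remark about computational cost.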