Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the constituent estimators, and least squares with nonnegativity constraints to learn the combination weights. In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint. When the constituent estimators are linear least-squares projections onto nested subspaces separated by at least three dimensions, we show that, thanks to a shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them. Here "best" refers to a model that minimizes a selection criterion such as AIC or BIC. In other words, in this setting, the best single estimator is inadmissible. Because the optimization problem can be reformulated as isotonic regression, the stacked estimator requires the same order of computation as the best single estimator, making it an attractive alternative in terms of both performance and implementation.
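For concreteness, the conventional approach described above can be written as a nonnegative least-squares problem. The notation here is an assumption introduced for illustration and does not appear in the abstract: $\hat{\mu}_1, \dots, \hat{\mu}_M$ denote the constituent estimators fit on all $n$ observations $(x_i, y_i)$, and $\hat{\mu}_k^{(-i)}$ denotes the $k$-th estimator fit with observation $i$ held out. Leave-one-out is assumed as one possible cross-validation scheme.

```latex
% Conventional stacking weights: nonnegative least squares
% on cross-validated (here, leave-one-out) predictions.
\hat{\alpha} \in \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^M,\ \alpha_k \ge 0}
  \sum_{i=1}^{n} \Bigl( y_i - \sum_{k=1}^{M} \alpha_k \, \hat{\mu}_k^{(-i)}(x_i) \Bigr)^{2},
\qquad
% Stacked estimator: the learned linear combination of the full-data fits.
\hat{f}_{\mathrm{stack}}(x) = \sum_{k=1}^{M} \hat{\alpha}_k \, \hat{\mu}_k(x).
```

The approach studied in this paper keeps the nonnegativity constraint but replaces the cross-validated squared error in the objective with an estimate of the population risk. The exact form of that estimate is not given in the abstract and is therefore not reproduced here.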