Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in reducing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the predictions of the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results of van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each with a different sensitivity to how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
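The selection step described in the abstract, picking one stacker from a finite family by cross-validated performance, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the family is a set of linear stackers whose weights are shrunk toward a simple average with strength `lam`, the loss is squared error rather than a probabilistic forecasting loss, and the data are synthetic. The names `fit_stacker` and `cv_select` are hypothetical.

```python
import numpy as np


def fit_stacker(P, y, lam):
    """Fit ensemble weights w minimizing ||P w - y||^2 + lam * ||w - u||^2,
    where u is the uniform weight vector. A large lam keeps the stacker
    close to a simple average; lam = 0 is unregularized least squares."""
    m = P.shape[1]
    u = np.full(m, 1.0 / m)
    A = P.T @ P + lam * np.eye(m)
    b = P.T @ y + lam * u
    return np.linalg.solve(A, b)


def cv_select(P, y, lams, n_folds=5, seed=0):
    """Return the lam with the lowest K-fold cross-validated squared error,
    together with the cross-validated loss of every candidate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_losses = []
    for lam in lams:
        fold_losses = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            w = fit_stacker(P[train], y[train], lam)
            fold_losses.append(np.mean((P[val] @ w - y[val]) ** 2))
        cv_losses.append(np.mean(fold_losses))
    cv_losses = np.array(cv_losses)
    return lams[int(np.argmin(cv_losses))], cv_losses


# Synthetic setup (assumption): three base learners that predict the same
# signal with different amounts of noise.
rng = np.random.default_rng(42)
n = 200
signal = rng.normal(size=n)
y = signal + 0.1 * rng.normal(size=n)
P = np.column_stack([signal + s * rng.normal(size=n) for s in (0.3, 0.6, 1.2)])

lams = [0.0, 1.0, 10.0, 100.0, 1e4]  # the finite family of stackers
best_lam, cv_losses = cv_select(P, y, lams)
weights = fit_stacker(P, y, best_lam)  # refit the chosen stacker on all data
print("selected lam:", best_lam)
print("ensemble weights:", weights)
```

The oracle best in the abstract would be the member of the family with the lowest true risk. The sketch can only pick the member with the lowest cross-validated loss, which is the quantity the paper's result compares against that oracle.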