Ensemble learning is a popular technique for improving the accuracy of machine learning models. It traditionally hinges on the rationale that aggregating multiple weak models yields better models with lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on ensembling. By selecting the best model trained on subsamples via majority voting, we can attain exponentially decaying tails for the excess risk, even if the base learner suffers from slow (i.e., polynomial) decay rates. This tail enhancement power of ensembling is agnostic to the underlying base learner and is stronger than variance reduction in the sense of exhibiting rate improvement. We demonstrate how our ensemble methods can substantially improve out-of-sample performance in a range of numerical examples involving heavy-tailed data or intrinsically slow rates. Code for the proposed methods is available at https://github.com/mickeyhqian/VoteEnsemble.
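The voting idea described above can be sketched as follows. This is a minimal illustrative sketch only, not the authors' actual algorithm: it assumes a generic base learner `fit` and loss function `loss` (both hypothetical names), trains candidate models on random subsamples, and lets fresh subsamples vote for the lowest-loss candidate; the majority winner is returned.

```python
import numpy as np

def vote_select(fit, loss, X, y, n_models=10, n_votes=50, frac=0.5, seed=None):
    """Illustrative majority-vote model selection (a sketch, not the paper's
    exact procedure). Trains `n_models` base models on random subsamples,
    then lets `n_votes` fresh subsamples each vote for the model with the
    lowest empirical loss; the model collecting the most votes is returned."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = max(1, int(frac * n))  # subsample size
    # Train one candidate model per random subsample.
    models = []
    for _ in range(n_models):
        idx = rng.choice(n, size=m, replace=False)
        models.append(fit(X[idx], y[idx]))
    # Each evaluation subsample votes for its lowest-loss candidate.
    votes = np.zeros(n_models, dtype=int)
    for _ in range(n_votes):
        idx = rng.choice(n, size=m, replace=False)
        losses = [loss(model, X[idx], y[idx]) for model in models]
        votes[int(np.argmin(losses))] += 1
    return models[int(np.argmax(votes))]

# Toy usage: the "base learner" is a sample-mean estimator under
# heavy-tailed t-distributed noise (a setting where slow rates arise).
rng = np.random.default_rng(0)
X = np.zeros((500, 1))                      # unused covariates in this toy
y = 1.0 + rng.standard_t(df=2.1, size=500)  # heavy-tailed data, mean 1
fit = lambda X, y: float(np.mean(y))        # "model" is a scalar estimate
loss = lambda m, X, y: float(np.mean((y - m) ** 2))
best = vote_select(fit, loss, X, y, seed=1)
```

The majority vote discards unlucky candidates trained on outlier-heavy subsamples, which is the intuition behind the exponential tail improvement the abstract claims.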