Bagging is a popular ensemble technique for improving the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on bagging: by suitably aggregating the base learners at the parametrization level instead of the output level, bagging improves generalization performance exponentially, a strength that goes significantly beyond variance reduction. More precisely, we show that for general stochastic optimization problems suffering from slowly (i.e., polynomially) decaying generalization errors, bagging can effectively reduce these errors to an exponential decay. Moreover, this power of bagging is agnostic to the solution scheme, covering common empirical risk minimization, distributionally robust optimization, and various regularizations. We demonstrate how bagging can substantially improve generalization performance in a range of examples involving heavy-tailed data that otherwise suffer from intrinsically slow rates.
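One way to read the parametrization-level aggregation described above, for problems with a discrete solution set, is as a majority vote over solutions obtained by re-solving the problem on bootstrap resamples, rather than averaging the base learners' outputs. The following is a minimal sketch under that assumption; the function names, the finite candidate set, and the vote-based aggregation are illustrative choices, not the paper's exact procedure.

```python
import random
from collections import Counter

def erm_solution(data, candidates, loss):
    """Empirical risk minimization over a finite candidate set:
    return the candidate with the smallest average loss on the data."""
    return min(candidates,
               key=lambda x: sum(loss(x, z) for z in data) / len(data))

def bagged_solution(data, candidates, loss, n_bags=50, seed=0):
    """Bagging at the parametrization level (illustrative sketch):
    re-solve ERM on bootstrap resamples of the data, then aggregate
    the *solutions themselves* by majority vote, instead of averaging
    the models' outputs."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_bags):
        resample = [rng.choice(data) for _ in data]
        votes[erm_solution(resample, candidates, loss)] += 1
    # The most frequently selected parametrization wins the vote.
    return votes.most_common(1)[0][0]
```

For instance, with `data = [0, 0, 0, 1]`, `candidates = [0, 1]`, and squared loss `loss = lambda x, z: (x - z) ** 2`, both `erm_solution` and `bagged_solution` select `0`; the point of the vote is that, under heavy-tailed data, the bagged choice is far less sensitive to a few extreme samples flipping the empirical minimizer.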