Feature bagging is a well-established ensembling method that aims to reduce prediction variance by combining the predictions of many estimators trained on subsets or projections of the features. Here, we develop a theory of feature bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, in which estimators are built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double descent. We then compare the performance of a feature-subsampling ensemble to that of a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative insights carry over to linear classifiers applied to image classification tasks on realistic datasets constructed using a state-of-the-art deep learning feature map.
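As a rough illustration of the setup described in the abstract, the following minimal sketch trains ridge estimators on random feature subsets and averages their predictions. It is not the authors' code. The function names (`fit_ridge`, `fit_feature_bagged_ridge`, `predict_ensemble`, `demo`), the isotropic Gaussian data model, the noise level, the ridge parameter, the subset sizes, and the choice of plain averaging as the combination rule are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge weights: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def fit_feature_bagged_ridge(X, y, subset_sizes, lam, rng):
    """Fit one ridge estimator per random feature subset.

    Passing equal sizes gives a homogeneous ensemble; passing varying
    sizes gives a heterogeneous one. Returns a list of (indices, weights).
    """
    n_features = X.shape[1]
    members = []
    for size in subset_sizes:
        # Each member sees a random subset of features, drawn without replacement.
        idx = rng.choice(n_features, size=size, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members


def predict_ensemble(members, X):
    """Average the member predictions (assumed combination rule)."""
    preds = [X[:, idx] @ w for idx, w in members]
    return np.mean(preds, axis=0)


def demo(seed=0):
    """Compare test error of a single predictor and two ensembles.

    Assumed toy data: isotropic Gaussian features with a noisy linear
    teacher, not the equicorrelated model analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    n_train, n_test, n_features = 60, 500, 100
    w_true = rng.normal(size=n_features) / np.sqrt(n_features)
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)  # noisy labels
    y_test = X_test @ w_true  # noiseless targets for evaluation

    lam = 1e-3
    configs = {
        "single": [n_features],               # one estimator on all features
        "homogeneous": [25, 25, 25, 25],      # equal subset sizes
        "heterogeneous": [10, 20, 30, 40],    # varying subset sizes
    }
    errors = {}
    for name, sizes in configs.items():
        members = fit_feature_bagged_ridge(X_train, y_train, sizes, lam, rng)
        residual = predict_ensemble(members, X_test) - y_test
        errors[name] = float(np.mean(residual ** 2))
    return errors


if __name__ == "__main__":
    for name, mse in demo().items():
        print(f"{name}: test MSE = {mse:.4f}")
```

Sweeping `n_train` or the subset sizes in `demo` is one way to trace empirical learning curves for each configuration. The numbers depend on the assumed data model and should not be read as reproducing the paper's results.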