Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with an isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.