We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such as Random Forests, Bagged SVMs, and Bagged Neural Networks) give every constituent estimator the same voting weight. This uniform prior ignores the varying local competence of the base estimators and contributes to overconfident predictions. We formulate ensemble pruning and calibration as a joint optimization problem over the probability simplex that minimizes the out-of-bag (OOB) loss. Inducing sparsity on the simplex runs into what we term the "L1-simplex paradox": every weight vector on the simplex has an L1 norm of exactly 1, so an L1 penalty is constant there and cannot prune any estimator. We resolve this by introducing a concave quadratic penalty instead. SCSB is model-agnostic and achieves up to 96% ensemble compression. Because inference cost grows linearly with the number of retained estimators, this compression gives a proportional inference speedup. SCSB also improves probability calibration (lower Expected Calibration Error, ECE) while preserving or improving generalization accuracy.
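The optimization described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the SCSB implementation. The sketch assumes that the OOB loss is the negative log-likelihood of the weighted ensemble, renormalized over each sample's OOB estimators. It further assumes that the concave quadratic penalty takes the form -lam * ||w||_2^2. The solver is plain projected gradient descent with a sort-based Euclidean projection onto the simplex. The function names `project_to_simplex` and `scsb_weights`, and all parameter values, are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                       # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    support = u - (css - 1.0) / k > 0          # True exactly for the first rho entries
    rho = k[support][-1]
    theta = (css[support][-1] - 1.0) / rho     # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def scsb_weights(oob_proba, oob_mask, y, lam=0.1, lr=0.5, n_iter=500, w0=None, eps=1e-12):
    """Hypothetical sketch: minimize OOB NLL - lam * ||w||_2^2 over the simplex.

    oob_proba: (n, M, K) class probabilities from each of the M estimators.
    oob_mask:  (n, M) bool, True where sample i was out-of-bag for estimator m.
    y:         (n,) integer class labels.
    The penalty form -lam * ||w||^2 is an assumption, not taken from the source.
    """
    M = oob_proba.shape[1]
    keep = oob_mask.any(axis=1)                # drop samples that are in-bag for every estimator
    P, yk = oob_proba[keep], y[keep]
    b = oob_mask[keep].astype(float)           # b[i, m] = 1 if sample i is OOB for estimator m
    p_true = P[np.arange(len(yk)), :, yk]      # (n', M): probability each estimator gives the true class
    a = p_true * b                             # only OOB estimators may vote on sample i
    w = np.full(M, 1.0 / M) if w0 is None else project_to_simplex(np.asarray(w0, dtype=float))
    for _ in range(n_iter):
        num = a @ w + eps                      # OOB mixture mass on the true class (unnormalized)
        den = b @ w + eps                      # total weight of the sample's OOB estimators
        # Gradient of mean_i[-log(num_i) + log(den_i)], the renormalized OOB NLL.
        # If all OOB estimators of a sample have zero weight, den is ~eps; this sketch
        # does not treat that case specially (it only pushes those zero weights down).
        grad = (-a / num[:, None] + b / den[:, None]).mean(axis=0)
        grad -= 2.0 * lam * w                  # gradient of the concave penalty -lam * ||w||^2
        w = project_to_simplex(w - lr * grad)
    return w


# Toy problem: two estimators, each confident on a different sample (true class 0 for both).
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.5, 0.5], [0.9, 0.1]]])
mask = np.ones((2, 2), dtype=bool)
y = np.array([0, 0])
w_dense = scsb_weights(P, mask, y, lam=0.0, w0=[0.6, 0.4])
w_sparse = scsb_weights(P, mask, y, lam=5.0, w0=[0.6, 0.4])
print(np.round(w_dense, 3), np.round(w_sparse, 3))
print(np.abs(w_dense).sum(), np.abs(w_sparse).sum())  # L1 norm is 1 for both: the L1-simplex paradox
```

Without the penalty, the OOB loss in this toy case prefers an even split between the two estimators. With a large enough `lam`, the concave term dominates. Because a concave function over a polytope attains its minimum at a vertex, the solution then moves to a vertex of the simplex and prunes one estimator. The final print shows why an L1 penalty could not have done this: both weight vectors have the same L1 norm.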