Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.

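To make two of the quantities named in the abstract concrete, the following is a minimal sketch of how the margin distribution and sparsity of a weighted voting ensemble could be computed. It is not taken from the paper: the function names `normalized_margins` and `ensemble_size`, the tolerance `tol`, and the toy votes are all illustrative assumptions, and base learners are assumed to output labels in {-1, +1}.

```python
from typing import List, Sequence


def normalized_margins(
    predictions: Sequence[Sequence[int]],
    weights: Sequence[float],
    labels: Sequence[int],
) -> List[float]:
    """Return y_i * sum_j w_j h_j(x_i) / sum_j |w_j| for every example i.

    predictions[j][i] is the {-1, +1} vote of base learner j on example i
    (an assumed layout, chosen only for this illustration).
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("at least one base learner needs a nonzero weight")
    margins = []
    for i, y in enumerate(labels):
        score = sum(w * votes[i] for w, votes in zip(weights, predictions))
        margins.append(y * score / total)
    return margins


def ensemble_size(weights: Sequence[float], tol: float = 1e-9) -> int:
    """Count base learners whose weight is nonzero up to a tolerance."""
    return sum(1 for w in weights if abs(w) > tol)


# Toy problem: four examples, four base learners, each wrong on one example.
labels = [1, 1, -1, -1]
predictions = [
    [1, 1, -1, 1],    # learner 0 errs on example 3
    [1, -1, -1, -1],  # learner 1 errs on example 1
    [-1, 1, -1, -1],  # learner 2 errs on example 0
    [1, 1, 1, -1],    # learner 3 errs on example 2
]

dense_weights = [0.25, 0.25, 0.25, 0.25]  # every learner kept
sparse_weights = [0.5, 0.5, 0.0, 0.0]     # two learners dropped

for name, w in [("dense", dense_weights), ("sparse", sparse_weights)]:
    m = normalized_margins(predictions, w, labels)
    print(name, "size:", ensemble_size(w), "min margin:", min(m), "margins:", m)
```

In this toy case the two-learner ensemble is smaller but has a lower minimum margin than the uniform four-learner ensemble. A margin-maximizing LP formulation would choose the weights rather than take them as given, so the sketch only shows how the reported quantities are measured, not how the methods in the study optimize them.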