Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection performance even further, several modifications of the original algorithm have been developed; these mainly focus on different stopping criteria and leave the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule based on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them with respect to their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, while in a real-world application with age-standardized COVID-19 incidence rates the prediction error could be lowered.
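The classical variable selection step picks, in each iteration, the base-learner that best fits the current negative gradient in-sample. The sketch below swaps that step for the two prediction-based rules described above. It makes several assumptions that are not taken from the source. It uses the Gaussian family (identity link, L2 loss) as the simplest GLM case and simple linear base-learners on centered covariates. For the AIC it uses a crude degrees-of-freedom proxy: the intercept plus the number of distinct covariates selected so far. The CV rule uses a fixed K-fold split of the current residuals. The names `select_aic`, `select_cv`, `boost` and `simulate`, as well as the toy data, are hypothetical. This is an illustration of the idea, not the authors' implementation, which covers GLMs more generally and may define the degrees of freedom and the CV error differently.

```python
import math
import random


def slope(x, u, rows):
    """Least-squares slope of u on x (no intercept), using only the given rows."""
    sxx = sum(x[i] ** 2 for i in rows)
    return sum(x[i] * u[i] for i in rows) / sxx if sxx > 0 else 0.0


def select_aic(X, u, nu, selected):
    """Pick the component whose nu-scaled update yields the smallest Gaussian AIC.

    Assumption: df = 1 (intercept) + number of distinct selected covariates,
    so a covariate entering the model for the first time pays a penalty of 2.
    """
    n = len(u)
    best_j, best_aic = None, math.inf
    for j, x in enumerate(X):
        b = slope(x, u, range(n))
        rss = sum((u[i] - nu * b * x[i]) ** 2 for i in range(n))
        df = 1 + len(selected | {j})
        aic = n * math.log(max(rss, 1e-12) / n) + 2 * df
        if aic < best_aic:
            best_j, best_aic = j, aic
    return best_j


def select_cv(X, u, k=5, seed=0):
    """Pick the component with the smallest k-fold CV error on the current residuals."""
    n = len(u)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    folds = [perm[i::k] for i in range(k)]
    best_j, best_err = None, math.inf
    for j, x in enumerate(X):
        err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            b = slope(x, u, train)  # fit the base-learner on the training folds
            err += sum((u[i] - b * x[i]) ** 2 for i in test)  # held-out squared error
        if err < best_err:
            best_j, best_err = j, err
    return best_j


def boost(X, y, rule="aic", mstop=100, nu=0.1):
    """Component-wise L2 boosting with linear base-learners (Gaussian family).

    X is a list of centered covariate columns; returns (intercept, coefficients).
    """
    n, p = len(y), len(X)
    offset = sum(y) / n
    f = [offset] * n
    coef = [0.0] * p
    selected = set()
    for _ in range(mstop):
        u = [y[i] - f[i] for i in range(n)]  # negative gradient of the L2 loss
        j = select_aic(X, u, nu, selected) if rule == "aic" else select_cv(X, u)
        b = slope(X[j], u, range(n))
        coef[j] += nu * b
        f = [f[i] + nu * b * X[j][i] for i in range(n)]
        selected.add(j)
    return offset, coef


def simulate(n=100, p=5, seed=1):
    """Hypothetical toy data: y = 3 + 2*x0 - 1.5*x2 + N(0, 0.5^2); other columns are noise."""
    rng = random.Random(seed)
    X = []
    for _ in range(p):
        col = [rng.gauss(0, 1) for _ in range(n)]
        mean = sum(col) / n
        X.append([v - mean for v in col])  # center so no-intercept slopes are valid
    y = [3.0 + 2.0 * X[0][i] - 1.5 * X[2][i] + rng.gauss(0, 0.5) for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    for rule in ("aic", "cv"):
        offset, coef = boost(X, y, rule=rule)
        print(rule, round(offset, 2), [round(c, 2) for c in coef])
```

In this sketch the AIC rule discourages adding a new covariate unless its fit outweighs the extra penalty, which is one way such a criterion can sparsify the selected set. The CV rule instead judges each component by out-of-sample error, at the cost of `k` extra fits per component and iteration.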