Gradient boosting decision tree algorithms are increasingly used in actuarial applications because they show superior predictive performance over traditional generalized linear models. Many refinements of the original gradient boosting machine algorithm exist. We present in a unified notation, and contrast, all the existing point and probabilistic gradient boosting decision tree algorithms: GBM, XGBoost, DART, LightGBM, CatBoost, EGBM, PGBM, XGBoostLSS, cyclic GBM, and NGBoost. In a comprehensive numerical study, we compare their performance on five publicly available claim frequency and severity datasets of various sizes, comprising different numbers of (high-cardinality) categorical variables. We explain how varying exposure-to-risk can be handled with boosting in frequency models. We compare the algorithms on the basis of computational efficiency, predictive performance, and model adequacy. LightGBM and XGBoostLSS win in terms of computational efficiency. The fully interpretable EGBM achieves predictive performance competitive with the black-box algorithms considered. We find that there is no trade-off between model adequacy and predictive accuracy: both are achievable simultaneously.
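The standard way to handle varying exposure-to-risk in a boosted frequency model is to treat log(exposure) as a fixed offset added to the model's log-scale score, so each boosting round fits the negative gradient of the Poisson loss with the offset held constant. The following is a minimal sketch of this idea, not the paper's implementation: the simulated portfolio, variable names, and hyperparameters are all illustrative, and scikit-learn regression trees stand in for the tuned base learners of the algorithms compared in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def poisson_nll(y, mu):
    # Poisson negative log-likelihood, up to a constant in y
    return np.mean(mu - y * np.log(mu + 1e-12))

# Simulated portfolio (hypothetical): 3 rating factors, partial-year exposures
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
exposure = rng.uniform(0.1, 1.0, size=n)          # fraction of year insured
true_rate = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1])  # claims per unit exposure
y = rng.poisson(true_rate * exposure)              # observed claim counts

offset = np.log(exposure)                          # fixed offset, never learned
F = np.full(n, np.log(y.sum() / exposure.sum()))   # log-scale intercept
nll_start = poisson_nll(y, np.exp(offset + F))

learning_rate, n_trees = 0.1, 50
trees = []
for _ in range(n_trees):
    mu = np.exp(offset + F)          # expected counts given current score
    residual = y - mu                # negative gradient of the Poisson loss
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
    trees.append(tree)
    F += learning_rate * tree.predict(X)

nll_end = poisson_nll(y, np.exp(offset + F))
pred_rate = np.exp(F)                # predicted frequency per unit of exposure
```

Because the offset enters the score additively and is excluded from the update, the fitted trees model the per-unit-exposure claim frequency; production libraries expose the same mechanism directly (e.g. an initial score or base margin set to log exposure with a Poisson objective).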