This work explores the use of gradient boosting in the context of classification. Four popular implementations, including the original GBM algorithm and selected state-of-the-art gradient boosting frameworks (i.e. XGBoost, LightGBM and CatBoost), have been thoroughly compared on several publicly available real-world datasets of sufficient diversity. The study places special emphasis on hyperparameter optimization, specifically comparing two tuning strategies: randomized search and Bayesian optimization using the Tree-structured Parzen Estimator (TPE). The performance of the considered methods was investigated in terms of common classification accuracy metrics, as well as runtime and tuning time. Additionally, the obtained results have been validated using appropriate statistical testing. Finally, an attempt was made to identify a gradient boosting variant that strikes the right balance between effectiveness, reliability and ease of use.