Better Trees: An empirical study on hyperparameter tuning of classification decision tree induction algorithms

Rafael Gomes Mantovani, Tomáš Horváth, André L. D. Rossi, Ricardo Cerri, Sylvio Barbon Junior, Joaquin Vanschoren, André Carlos Ponce de Leon Ferreira de Carvalho

From arXiv; 60 pages, 16 figures.

Machine learning algorithms often contain many hyperparameters (HPs) whose values affect the predictive performance of the induced models in intricate ways. Because of the large number of possible HP configurations and their complex interactions, optimization techniques are commonly used to find settings that lead to high predictive performance. However, efficiently exploring this vast configuration space and managing the trade-off between predictive and runtime performance remain challenging. Moreover, in some cases the default HP values already provide a suitable configuration. Additionally, for many reasons, including model validation and compliance with new legislation, there is growing interest in interpretable models, such as those created by Decision Tree (DT) induction algorithms. This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on the two most frequently used DT induction algorithms, CART and C4.5. DT induction algorithms produce interpretable classification models with high predictive performance, although many of their HPs need to be adjusted. Experiments were carried out with different tuning strategies to induce models and to evaluate HP relevance using 94 classification datasets from OpenML. The experimental results show that different HP profiles for tuning each algorithm provide statistically significant improvements on most datasets for CART, but on only one-third of them for C4.5. Although different algorithms may present different tuning scenarios, the tuning techniques generally required few evaluations to find accurate solutions. Furthermore, the best technique for all the algorithms was IRACE. Finally, we found that tuning a specific small subset of HPs is a good alternative for achieving optimal predictive performance.
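The basic tuning loop the abstract refers to (sample a configuration, evaluate it, keep the best, compare against the defaults) can be illustrated with a minimal sketch. It uses plain random search, a simpler stand-in and not the IRACE procedure the study found best, over a hypothetical CART-style search space. The parameter names mirror the arguments of R's `rpart.control` (`minsplit`, `minbucket`, `maxdepth`, `cp`), but the ranges and the `toy_objective` scoring function are illustrative assumptions, not the paper's setup or data. A real study would replace `toy_objective` with a cross-validated performance estimate of the induced tree.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple, Union

Value = Union[int, float]
Config = Dict[str, Value]

# Hypothetical search space. Names mirror R's rpart.control() arguments,
# but the ranges are illustrative assumptions, not the paper's exact space.
SEARCH_SPACE: Dict[str, Tuple[Value, Value]] = {
    "minsplit": (2, 50),   # integer range, inclusive
    "minbucket": (1, 50),  # integer range, inclusive
    "maxdepth": (1, 30),   # integer range, inclusive
    "cp": (0.0, 0.1),      # float range (complexity parameter)
}

# rpart's default values (minbucket defaults to round(minsplit / 3) = 7).
RPART_DEFAULTS: Config = {"minsplit": 20, "minbucket": 7, "maxdepth": 30, "cp": 0.01}


def sample_config(space: Dict[str, Tuple[Value, Value]], rng: random.Random) -> Config:
    """Draw one configuration uniformly at random from the search space."""
    cfg: Config = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            cfg[name] = rng.randint(low, high)  # inclusive integer draw
        else:
            cfg[name] = rng.uniform(low, high)  # continuous draw
    return cfg


def random_search(
    objective: Callable[[Config], float],
    space: Dict[str, Tuple[Value, Value]],
    budget: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float, List[float]]:
    """Evaluate `budget` random configurations and keep the best one.

    Returns (best_config, best_score, trace), where trace[i] is the best
    score seen after i + 1 evaluations (higher is better).
    """
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    trace: List[float] = []
    for _ in range(budget):
        cfg = sample_config(space, rng)
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
        trace.append(best_score)
    return best_cfg, best_score, trace


def toy_objective(cfg: Config) -> float:
    """Hypothetical stand-in for a cross-validated accuracy estimate.

    Purely illustrative: peaks near maxdepth=8, cp=0.005, minbucket=5.
    """
    return (
        0.90
        - 0.0005 * (cfg["maxdepth"] - 8) ** 2
        - 1.5 * abs(cfg["cp"] - 0.005)
        - 0.0002 * abs(cfg["minbucket"] - 5)
    )


if __name__ == "__main__":
    _, best, trace = random_search(toy_objective, SEARCH_SPACE, budget=50, seed=1)
    default_score = toy_objective(RPART_DEFAULTS)
    print(f"default: {default_score:.4f}  tuned: {best:.4f}  gain: {best - default_score:+.4f}")
    # First evaluation at which the final best score was reached.
    print("best found at evaluation", trace.index(best) + 1)
```

Comparing the tuned score with `RPART_DEFAULTS` mirrors the question of whether tuning beats the defaults. Because this random search never evaluates the defaults themselves, the reported gain can be negative when the defaults are already good. The best-so-far `trace` shows how many evaluations were needed to reach the final result.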
