In this paper, we introduce Optimal Classification Forests, a new family of classifiers that uses an optimal ensemble of decision trees to derive accurate and interpretable classification rules. We propose a novel methodology based on mathematical optimization in which a given number of trees are constructed simultaneously, each providing a predicted class for the observations in the feature space. The classification rule assigns to each observation the class predicted most frequently among the trees in the forest. We provide a mixed integer linear programming formulation for the problem. Our computational experiments show that the proposed method performs as well as or better than state-of-the-art tree-based classification methods. More importantly, it achieves high prediction accuracy with far fewer trees, for example, orders of magnitude fewer than random forests. We also present three real-world case studies showing that the methodology has interesting implications for interpretability.
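The voting rule described above can be illustrated with a minimal sketch. It covers only the aggregation step (the most frequently predicted class among the trees), not the mixed integer linear programming model that builds the trees. The names `forest_predict` and `classify`, the three toy stump trees, and the tie-breaking rule (smallest label wins) are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def forest_predict(tree_predictions):
    """Return the class predicted most often for a single observation.

    Assumption: ties are broken by taking the smallest label; the
    abstract does not say how ties are handled.
    """
    counts = Counter(tree_predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def classify(trees, observations):
    """Apply every tree to every observation and aggregate by majority vote.

    `trees` is a list of callables mapping an observation to a class label.
    """
    return [forest_predict([tree(x) for tree in trees]) for x in observations]


# Hypothetical forest of three depth-one trees (stumps) on two features.
toy_trees = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.5 else 0,
    lambda x: 1 if x[0] + x[1] > 1.0 else 0,
]

toy_observations = [(0.9, 0.2), (0.9, 0.8), (0.1, 0.3)]
toy_labels = classify(toy_trees, toy_observations)
print(toy_labels)
```

In this toy forest the first observation receives votes 1, 0, and 1, so the majority class is 1 even though one tree disagrees. In the method described by the abstract, the trees are obtained jointly by solving an optimization problem rather than being written by hand as they are here.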