Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary classification trees with univariate splits. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and by the tremendous improvements in mixed-integer optimization (MIO) technology. Yet existing MIO-based approaches do not leverage the power of MIO to its full extent: they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose an intuitive flow-based MIO formulation for learning optimal binary classification trees. Our formulation can accommodate side constraints to enable the design of interpretable and fair decision trees. Moreover, we show that our formulation has a stronger linear optimization relaxation than existing methods in the case of binary data. We exploit the decomposable structure of our formulation and max-flow/min-cut duality to derive a Benders' decomposition method to speed up computation. We propose a tailored procedure for solving each decomposed subproblem that provably generates facets of the feasible set of the MIO as constraints to add to the main problem. We conduct extensive computational experiments on standard benchmark datasets and show that our proposed approaches are 29 times faster than state-of-the-art MIO-based techniques and improve out-of-sample performance by up to 8%.