Decision Tree (DT) Learning is a fundamental problem in Interpretable Machine Learning, yet it poses a formidable optimisation challenge. Despite numerous efforts dating back to the early 1990s, practical algorithms have only recently emerged, primarily leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques. These methods fall into two categories: algorithms like DL8.5, MurTree and STreeD utilise an efficient DP strategy but lack effective bounds for pruning the search space, while algorithms like OSDT and GOSDT employ stronger pruning bounds but at the expense of a less refined DP strategy. We introduce Branches, a new algorithm that combines the strengths of both approaches. Using DP and B&B with a novel analytical bound for efficient pruning, Branches offers both speed and sparsity optimisation. Unlike other methods, it also handles non-binary features. Theoretical analysis shows that Branches has lower complexity than existing methods, and empirical results confirm that it outperforms the state of the art in speed, number of iterations, and optimality.
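To make the DP + B&B paradigm described above concrete, the following is a minimal generic sketch, not the Branches algorithm or its analytical bound. It searches depth-limited trees over binary features, memoises subproblems on their row subsets (the DP part), and prunes a split whenever one child's cost alone already matches the incumbent (the B&B part). The toy dataset, the per-leaf penalty `LAMBDA`, and the depth budget are all illustrative assumptions.

```python
from functools import lru_cache

# Toy binary dataset: each row is ((x0, x1, x2), label). Illustrative only.
DATA = [
    ((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 1, 0), 1), ((0, 1, 1), 1),
    ((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
]
N_FEATURES = 3
LAMBDA = 0.1  # sparsity penalty per leaf (assumed regulariser form)


def leaf_cost(rows):
    """Cost of stopping here: best single-label misclassifications + penalty."""
    ones = sum(label for _, label in rows)
    return min(ones, len(rows) - ones) + LAMBDA


@lru_cache(maxsize=None)  # DP: identical (rows, depth) subproblems solved once
def _search(rows, depth):
    """Optimal cost of a tree for `rows` using at most `depth` further splits."""
    best = leaf_cost(rows)            # incumbent: make this node a leaf
    if depth == 0 or best <= LAMBDA:  # depth budget spent, or node is pure
        return best
    for f in range(N_FEATURES):
        left = tuple(r for r in rows if r[0][f] == 0)
        right = tuple(r for r in rows if r[0][f] == 1)
        if not left or not right:     # degenerate split, skip
            continue
        left_cost = _search(left, depth - 1)
        # B&B: any subtree costs at least one leaf penalty LAMBDA, so if the
        # left child alone already matches the incumbent, prune this split.
        if left_cost + LAMBDA >= best:
            continue
        best = min(best, left_cost + _search(right, depth - 1))
    return best


def best_cost(data, depth):
    # Sort once so identical subproblems share a canonical memoisation key;
    # children built by filtering preserve this order.
    return _search(tuple(sorted(data)), depth)


print(best_cost(DATA, 2))  # depth-2 budget
print(best_cost(DATA, 3))  # deeper budget reaches the zero-error tree
```

The real algorithms cited in the abstract differ mainly in how subproblems are cached and how sharp the pruning bound is; the crude one-leaf bound here merely marks where a stronger analytical bound would plug in.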