Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980's. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality. The code is available at https://github.com/xiyanghu/OSDT.
翻译:自20世纪80年代初以来,决策树算法一直是可解释(透明)机器学习中最流行的算法之一。然而,自决策树算法问世以来,始终困扰其发展的核心问题是缺乏最优性,即无法保证接近最优:决策树算法通常采用贪婪或短视策略,有时会产生明显非最优的模型。决策树优化的困难既是理论障碍也是实践障碍,即使采用精细的数学规划方法也无法高效解决这些问题。本文首次提出针对二元变量的最优决策树的实用算法。该算法协同设计了缩减搜索空间的分析边界与现代系统技术,包括数据结构及自定义位向量库。我们的实验凸显了其在可扩展性、运行速度和最优性证明方面的优势。代码地址为:https://github.com/xiyanghu/OSDT。