Foundational theory for optimal decision tree problems. I. Algorithmic and geometric foundations

In Part I of this two-part series, we introduce four novel definitions of optimal decision tree (ODT) problems: three for size-constrained trees and one for depth-constrained trees. These definitions are stated unambiguously as executable recursive programs and satisfy all the criteria we propose for a formal specification. In this sense, they resemble the "standard form" used in the study of general-purpose solvers. Building on algebraic programming theory (a relational formalism for deriving correct-by-construction algorithms from specifications), we can not only establish whether dynamic programming solutions exist but also derive them constructively whenever they do. Consequently, the four generic problem definitions yield four novel optimal algorithms for ODT problems with arbitrary splitting rules that satisfy the axioms and with objective functions of a given form. These algorithms encompass the known depth-constrained, axis-parallel ODT algorithm as a special case, while providing a unified, efficient, and elegant solution for the general ODT problem. In Part II, we present the first optimal hypersurface decision tree algorithm and report comprehensive experiments against axis-parallel decision tree algorithms, including the heuristic CART and state-of-the-art optimal methods. The results demonstrate the significant potential of decision trees with flexible splitting rules. Moreover, our framework extends readily to algorithms for constructing even more flexible decision trees, including those with mixed splitting rules.
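To make the idea of stating an ODT problem as an executable recursive program concrete, the following is a minimal sketch of the depth-constrained, axis-parallel special case mentioned above. It is not one of the four definitions or algorithms of this paper: it is a plain exhaustive recursion, it assumes 0-1 loss with majority-vote leaves, and all names (`leaf_errors`, `candidate_splits`, `min_errors`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A labelled data point: (feature vector, class label).
Point = Tuple[Sequence[float], int]


def leaf_errors(data: List[Point]) -> int:
    """Misclassifications when a leaf predicts its majority label (assumed 0-1 loss)."""
    if not data:
        return 0
    counts: Dict[int, int] = {}
    for _, label in data:
        counts[label] = counts.get(label, 0) + 1
    return len(data) - max(counts.values())


def candidate_splits(data: List[Point]) -> List[Tuple[int, float]]:
    """Axis-parallel splits (feature index, threshold) at midpoints of distinct values."""
    if not data:
        return []
    splits = []
    for j in range(len(data[0][0])):
        values = sorted({x[j] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            splits.append((j, (lo + hi) / 2))
    return splits


def min_errors(data: List[Point], depth: int) -> int:
    """Minimum 0-1 loss over all axis-parallel trees of depth at most `depth`.

    Exhaustive recursion: either stop at a leaf, or try every candidate
    split and solve both sides with one less level of depth.
    """
    best = leaf_errors(data)
    if depth == 0 or best == 0:
        return best
    for j, t in candidate_splits(data):
        left = [p for p in data if p[0][j] <= t]
        right = [p for p in data if p[0][j] > t]
        best = min(best, min_errors(left, depth - 1) + min_errors(right, depth - 1))
    return best


# XOR-style data: no single axis-parallel split separates the classes.
xor = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]
print(min_errors(xor, 1), min_errors(xor, 2))
```

The recursion is exponential in the depth and serves only as a specification-style baseline; replacing `candidate_splits` with another rule generator is where more flexible splitting rules would enter under these assumptions.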
