Classification trees remain widely used in machine learning applications because they are inherently interpretable and scalable. We propose a rolling subtree lookahead algorithm that combines the relative scalability of myopic approaches with the foresight of optimal approaches when constructing trees. The limited foresight embedded in our algorithm mitigates the learning pathology observed in optimal approaches. At the heart of our algorithm lies a novel two-depth optimal binary classification tree formulation that is flexible enough to handle any loss function. We show that the feasible region of this formulation is an integral polyhedron, so its LP relaxation yields an optimal solution. Through extensive computational analyses, we demonstrate that our approach outperforms optimal and myopic approaches in 808 out of 1330 problem instances, improving out-of-sample accuracy by up to 23.6% and 14.4%, respectively.
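To make the idea of a rolling two-depth lookahead concrete, the following is a minimal sketch and not the formulation from the paper. It replaces the LP-based two-depth subproblem with brute-force enumeration over threshold splits, assumes misclassification count as the loss, and interprets "rolling" as re-solving the lookahead at every node while committing only to the root split. All function names (`leaf_loss`, `lookahead_root_split`, `build_tree`, and so on) are hypothetical.

```python
from collections import Counter


def leaf_loss(labels):
    """Misclassification count when a leaf predicts its majority label (assumed loss)."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def candidate_splits(X, idx):
    """All (feature, threshold) pairs that separate idx into two non-empty parts."""
    splits = []
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            splits.append((j, (lo + hi) / 2))
    return splits


def partition(X, idx, split):
    """Send samples with feature value <= threshold left, the rest right."""
    j, t = split
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    return left, right


def best_stump_loss(X, y, idx):
    """Lowest loss reachable on idx with at most one further split."""
    best = leaf_loss([y[i] for i in idx])
    for s in candidate_splits(X, idx):
        l, r = partition(X, idx, s)
        best = min(best, leaf_loss([y[i] for i in l]) + leaf_loss([y[i] for i in r]))
    return best


def lookahead_root_split(X, y, idx, depth):
    """Choose the split whose best subtree of depth min(2, depth) has the lowest loss.

    Returns None when no split strictly improves on a single leaf.
    """
    def child_loss(part):
        if depth >= 2:
            return best_stump_loss(X, y, part)  # look one level further down
        return leaf_loss([y[i] for i in part])  # last level: children are leaves

    best_split, best_loss = None, leaf_loss([y[i] for i in idx])
    for s in candidate_splits(X, idx):
        l, r = partition(X, idx, s)
        loss = child_loss(l) + child_loss(r)
        if loss < best_loss:
            best_split, best_loss = s, loss
    return best_split


def build_tree(X, y, idx, depth):
    """Roll the lookahead down the tree: fix the root split, then recurse on children."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or leaf_loss(labels) == 0:
        return {"label": majority}
    split = lookahead_root_split(X, y, idx, depth)
    if split is None:
        return {"label": majority}
    l, r = partition(X, idx, split)
    return {
        "split": split,
        "left": build_tree(X, y, l, depth - 1),
        "right": build_tree(X, y, r, depth - 1),
    }


def predict(tree, x):
    """Follow splits from the root until a leaf is reached."""
    while "split" in tree:
        j, t = tree["split"]
        tree = tree["left"] if x[j] <= t else tree["right"]
    return tree["label"]


# XOR-style data: no single split reduces the loss, but a two-depth subtree does.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
tree = build_tree(X, y, list(range(len(X))), depth=2)
predictions = [predict(tree, x) for x in X]
```

On this XOR-style toy data, a one-step greedy criterion sees no gain from any split and stops at a single leaf, whereas the two-depth lookahead finds a root split that lets the next level separate the classes. The enumeration here is only practical for small data; the abstract's point is that the two-depth subproblem can instead be solved through its LP relaxation.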