Out-of-distribution (OOD) generalization is an essential topic in machine learning. However, recent research has focused almost exclusively on methods for neural networks. This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT). During tree growth, IDT adds a penalty term that discourages splits whose behavior is unstable or varies across different environments. We also construct its ensemble version, the Invariant Random Forest (IRF). The proposed method is motivated by a theoretical result that holds under mild conditions, and it is validated by numerical tests on both synthetic and real datasets. Its superior performance over non-OOD tree models indicates that accounting for OOD generalization in tree models is necessary and deserves more attention.
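To make the idea of penalizing unstable splits concrete, the following is a minimal sketch, not the paper's actual criterion. The abstract does not state the form of the penalty, so this example assumes one: the variance of the per-environment Gini impurity reductions, weighted by a hypothetical coefficient `lam`. All function names here are illustrative.

```python
from collections import Counter
from statistics import pvariance


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(xs, ys, threshold):
    """Gini impurity reduction obtained by splitting on x <= threshold."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(ys) - weighted


def penalized_score(xs, ys, envs, threshold, lam=1.0):
    """Pooled gain minus lam * variance of per-environment gains.

    The variance penalty is an assumption made for illustration; it is
    not taken from the paper.
    """
    pooled = impurity_reduction(xs, ys, threshold)
    per_env = []
    for e in sorted(set(envs)):
        xe = [x for x, g in zip(xs, envs) if g == e]
        ye = [y for y, g in zip(ys, envs) if g == e]
        per_env.append(impurity_reduction(xe, ye, threshold))
    return pooled - lam * pvariance(per_env)


# Toy data: two environments with four samples each.
ys = [0, 0, 1, 1, 0, 0, 1, 1]
envs = [0, 0, 0, 0, 1, 1, 1, 1]
stable = [0, 0, 1, 1, 0, 0, 1, 1]    # predicts y in both environments
unstable = [0, 0, 1, 1, 0, 1, 0, 1]  # predicts y only in environment 0

stable_score = penalized_score(stable, ys, envs, threshold=0.5)
unstable_score = penalized_score(unstable, ys, envs, threshold=0.5)
print(stable_score, unstable_score)
```

In this toy setup the feature whose relationship with the label holds in both environments scores higher than the feature that is informative in only one of them, which is the qualitative behavior the abstract describes.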