A decision tree is one of the most popular approaches in machine learning. However, it suffers from overfitting when trees grow overly deep. To address this problem, the meta-tree was recently proposed. The meta-tree prevents the overfitting caused by overly deep trees and, moreover, guarantees statistical optimality based on Bayes decision theory. The meta-tree is therefore expected to outperform the decision tree. Beyond a single decision tree, it is known that ensembles of decision trees, which are typically constructed by boosting algorithms, are more effective at improving predictive performance. Ensembles of meta-trees are thus expected to improve predictive performance over a single meta-tree, yet no previous study has constructed multiple meta-trees by boosting. In this study, we therefore propose a method to construct multiple meta-trees using a boosting approach. Through experiments on synthetic and benchmark datasets, we compare the performance of the proposed method with that of conventional methods using ensembles of decision trees. Furthermore, while ensembles of decision trees, like a single decision tree, can overfit as tree depth increases, the experiments confirm that ensembles of meta-trees prevent overfitting due to tree depth.
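The conventional baseline the abstract refers to, an ensemble of decision trees built by boosting, can be illustrated with a minimal sketch. It assumes squared-error loss, one-dimensional inputs, and depth-1 regression trees (stumps) as base learners; it is plain gradient boosting of decision trees, not the authors' meta-tree method, whose construction is not described here. The names `fit_stump` and `gradient_boost` and the parameters `n_rounds` and `learning_rate` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Predictor = Callable[[float], float]


def fit_stump(xs: List[float], rs: List[float]) -> Predictor:
    """Hypothetical helper: fit a depth-1 regression tree to targets rs by least squares."""
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    # Excluding the largest unique value keeps both sides of every split non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All inputs are identical, so no split exists: predict the mean target.
        mean = sum(rs) / len(rs)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs: List[float], ys: List[float],
                   n_rounds: int = 50, learning_rate: float = 0.1) -> Predictor:
    """Sketch of gradient boosting with squared loss (assumption: stumps as base learners)."""
    base = sum(ys) / len(ys)  # initial constant model F_0 = mean(y)
    learners: List[Predictor] = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For loss (y - F)^2 / 2, the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        # Add the new tree, shrunk by the learning rate.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]

    def predict(x: float) -> float:
        return base + learning_rate * sum(h(x) for h in learners)

    return predict


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    model = gradient_boost(xs, ys)
    print([round(model(x), 3) for x in xs])
```

Each round fits a new shallow tree to the current residuals and adds it with shrinkage, so the ensemble gradually approximates the step in the example data. In the proposed approach, the role played here by `fit_stump` would instead be filled by a meta-tree.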