In the field of decision trees, most previous studies have difficulty guaranteeing the statistical optimality of predictions for new data and suffer from overfitting, because trees are usually used only to represent prediction functions constructed from the given data. In contrast, some studies, including this paper, use trees to represent the stochastic data observation process behind the given data. Moreover, by assuming a prior distribution over the trees, they derive the statistically optimal prediction, which is robust against overfitting, based on Bayesian decision theory. However, these studies still face a problem in computing this Bayes optimal prediction, because it involves an infeasible summation over all division patterns of the feature space, which are represented by the trees and some parameters. In particular, an open problem is the summation over combinations of division axes, i.e., the assignment of features to the inner nodes of a tree. We solve this with a Markov chain Monte Carlo method whose step size is adaptively tuned according to the posterior distribution over trees.
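The abstract does not specify the sampler, so the following is only a minimal Python sketch under our own assumptions, not the paper's method. It first counts the axis assignments for a full binary tree, assuming any of d features may sit at each of the 2^D - 1 inner nodes of a depth-D tree, so the exact sum has d^(2^D - 1) terms. It then runs a generic adaptive Metropolis-Hastings chain over those assignments. The names `num_axis_assignments`, `toy_log_posterior`, and `adaptive_mh` are hypothetical; `toy_log_posterior` is a placeholder for the true posterior over division axes, and "step size" is interpreted here as the number of inner nodes whose feature is redrawn per proposal, tuned by a Robbins-Monro rule toward a target acceptance rate.

```python
import math
import random


def num_axis_assignments(num_features: int, depth: int) -> int:
    """Count ways to give each inner node of a full binary tree a split feature.

    Assumption: each of the 2**depth - 1 inner nodes may use any of the
    `num_features` features independently (repeats allowed).
    """
    return num_features ** (2 ** depth - 1)


def toy_log_posterior(assignment, relevant):
    """Hypothetical stand-in for log p(axes | data), up to a constant.

    Every inner node that splits on a feature in `relevant` adds 2.0.
    """
    return sum(2.0 if k in relevant else 0.0 for k in assignment)


def adaptive_mh(log_post, num_features, num_inner, n_iter=2000,
                target_accept=0.3, seed=0):
    """Metropolis-Hastings over axis assignments with an adapted step size.

    Assumption: the "step size" is how many inner nodes receive a freshly
    drawn feature per proposal. It is tuned by a Robbins-Monro update toward
    `target_accept`, with a gain of 1/sqrt(t) so the adaptation diminishes.
    """
    rng = random.Random(seed)
    state = [rng.randrange(num_features) for _ in range(num_inner)]
    current = log_post(state)
    log_step = 0.0  # step = exp(log_step); starts at one node per proposal
    samples = []
    for t in range(1, n_iter + 1):
        k = min(num_inner, max(1, round(math.exp(log_step))))
        proposal = list(state)
        # Redraw the feature at k distinct, uniformly chosen inner nodes.
        # For a fixed k this proposal is symmetric, so plain Metropolis applies.
        for node in rng.sample(range(num_inner), k):
            proposal[node] = rng.randrange(num_features)
        candidate = log_post(proposal)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            state, current = proposal, candidate
        # Enlarge the step when accepting more often than the target, else shrink it.
        log_step += (accept_prob - target_accept) / math.sqrt(t)
        log_step = min(max(log_step, 0.0), math.log(num_inner))
        samples.append(list(state))
    return samples, math.exp(log_step)


if __name__ == "__main__":
    # 10 features, depth 4 -> 15 inner nodes -> 10**15 axis assignments.
    print(num_axis_assignments(10, 4))  # 1000000000000000
    samples, step = adaptive_mh(lambda a: toy_log_posterior(a, {0}),
                                num_features=5, num_inner=7)
    kept = samples[500:]  # discard burn-in
    freq = sum(s.count(0) for s in kept) / (7 * len(kept))
    print(f"final step ~ {step:.2f}, frequency of feature 0 = {freq:.3f}")
```

Averaging a per-tree predictive quantity over `samples` would then give a Monte Carlo approximation of the intractable sum over axis assignments. Under this toy target each node independently takes feature 0 with probability e^2/(e^2 + 4), about 0.65, which gives a quick sanity check on the chain.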