Decision tree learning is increasingly used for pointwise inference. Important applications include heterogeneous causal treatment effects and dynamic policy decisions, as well as conditional quantile regression and the design of experiments, where tree estimation and inference are conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm, even with pruning. Instead, the convergence may be poly-logarithmic or, in some important special cases such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are each seen to contribute distinctively to achieving nearly optimal performance for the model class considered.