Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically produce inaccurate predictions and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts Gaussian processes onto the leaf nodes of BART to predict at points outside the range of the observed data. The new method is compared to standard BART implementations and to recent frequentist resampling-based methods for predictive inference. We apply the new approach to a challenging problem from causal inference in which, for some regions of predictor space, only treated or only untreated units are observed (but not both). In simulation studies, the new approach outperforms popular alternatives such as Jackknife+.
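The contrast between a leaf's constant prediction and a Gaussian process attached to that leaf can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm. The squared-exponential kernel, the hyperparameter values, the toy data, and the names `sq_exp_kernel` and `gp_leaf_predict` are all assumptions made for illustration. The sketch only shows the general behaviour the abstract relies on: a GP centered on the leaf mean yields predictive variance that grows as test points move away from the training data, whereas a constant leaf value carries no such information.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_leaf_predict(x_leaf, y_leaf, x_new, noise_var=0.05, **kern):
    """GP posterior mean and predictive variance at x_new, centered on the leaf mean."""
    mu = y_leaf.mean()  # the constant a plain tree leaf would predict everywhere
    K = sq_exp_kernel(x_leaf, x_leaf, **kern) + noise_var * np.eye(len(x_leaf))
    Ks = sq_exp_kernel(x_new, x_leaf, **kern)
    Kss_diag = np.diag(sq_exp_kernel(x_new, x_new, **kern))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - mu), computed via two solves against the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_leaf - mu))
    mean = mu + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = Kss_diag - np.sum(v ** 2, axis=0) + noise_var  # includes observation noise
    return mean, var

# Toy data falling in one hypothetical leaf covering x in [0.5, 1.0]
rng = np.random.default_rng(0)
x_leaf = np.sort(rng.uniform(0.5, 1.0, size=30))
y_leaf = 2.0 * x_leaf + rng.normal(scale=0.2, size=30)

# One interior point, one boundary point, and two points beyond the training range
x_new = np.array([0.75, 1.0, 1.5, 3.0])
mean, var = gp_leaf_predict(x_leaf, y_leaf, x_new)

for x, m, s in zip(x_new, mean, np.sqrt(var)):
    print(f"x={x:4.2f}  mean={m:6.3f}  approx. 95% interval=({m - 1.96 * s:6.3f}, {m + 1.96 * s:6.3f})")
```

Far from the data, the predictive variance approaches the prior variance plus noise (`signal_var + noise_var` under these assumptions), so the intervals widen rather than staying as narrow as they are inside the training range.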