We show how to construct the implied copula process of response values from a Bayesian additive regression tree (BART) model with a prior on the leaf node variances. This copula process, defined on the covariate space, can be paired with any marginal distribution for the dependent variable to construct a flexible distributional BART model. Bayesian inference is performed via Markov chain Monte Carlo on an augmented posterior, where we show that the key sampling steps can be carried out as in Chipman et al. (2010), preserving scalability and computational efficiency even though the copula process is high dimensional. The posterior predictive distribution of the copula process model is derived in closed form as the push-forward of the posterior predictive distribution of the underlying BART model under an optimal transport map. Under suitable conditions, we establish posterior consistency for the regression function and posterior means, and prove convergence in distribution of the predictive process and conditional expectation. Simulation studies demonstrate improved accuracy of distributional predictions compared to the original BART model and leading benchmarks. Applications to five real datasets with 506 to 515,345 observations and 8 to 90 covariates further highlight the efficacy and scalability of our proposed BART copula process model.
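The idea of pairing a copula with an arbitrary marginal and reading the prediction as a push-forward can be illustrated with a minimal one-dimensional sketch. It is not the paper's construction: it assumes a Gaussian latent marginal and an exponential target marginal purely for illustration, and the names `push_forward` and `exponential_quantile` are hypothetical. In one dimension, the monotone map y = F_Y^{-1}(F_Z(z)) is the optimal transport map between two continuous distributions under squared-distance cost, which is what the sketch applies to latent draws.

```python
import math
import random
from statistics import NormalDist


def exponential_quantile(u, rate=1.0):
    """Inverse CDF of an Exponential(rate) distribution (illustrative target marginal)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) for extreme latent draws
    return -math.log1p(-u) / rate


def push_forward(latent_draws, latent_mean, latent_sd, target_quantile):
    """Map latent draws to the response scale via y = F_Y^{-1}(F_Z(z)).

    Assumption: the latent marginal F_Z is Gaussian. In a real model the
    latent draws would come from the fitted regression model's posterior
    predictive distribution; here they are simulated.
    """
    latent = NormalDist(latent_mean, latent_sd)
    return [target_quantile(latent.cdf(z)) for z in latent_draws]


# Stand-in for posterior predictive draws on the latent scale (simulated).
random.seed(0)
z_draws = [random.gauss(0.0, 1.0) for _ in range(5000)]

# Draws on the response scale, now following the chosen marginal.
y_draws = push_forward(z_draws, 0.0, 1.0, exponential_quantile)
```

Because the map is monotone, it preserves the ranks of the latent draws, so the dependence structure (the copula) is untouched while the marginal is replaced by the chosen one.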