Bayesian Additive Regression Trees (BART) are a powerful semiparametric ensemble learning technique for modeling nonlinear regression functions. Although BART was initially proposed only for continuous and binary response variables, multiple extensions have since emerged that handle a wider class of response variables (e.g., categorical and count data) across many application areas. In this paper we describe a generalized framework for Bayesian trees and their additive ensembles in which the response variable follows an exponential family distribution; the framework therefore encompasses most of these BART variants. We derive sufficient conditions on the response distribution under which the posterior concentrates at the minimax rate, up to a logarithmic factor. Our results thus provide theoretical justification for the empirical success of BART and its variants.