In this paper, we develop the randomized Sharded Bayesian Additive Regression Trees (SBT) model. We introduce a randomization auxiliary variable and a sharding tree that together determine how the data are partitioned, and we fit a Bayesian Additive Regression Trees (BART) sub-model to each partition component. Observing that the optimal design of a sharding tree determines the optimal sharding for sub-models on a product space, we introduce an intersection tree structure that completely specifies both the sharding and the modeling using only tree structures. In addition to experiments, we derive the theoretically optimal weights for minimizing posterior contraction and prove the worst-case complexity of SBT.
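The sharding idea described above can be illustrated with a minimal sketch. It is not the SBT algorithm itself. It assumes a uniform auxiliary variable, a depth-one sharding tree given by fixed cut points, a shard mean standing in for each BART sub-model, and weights proportional to shard size rather than the optimal weights derived in the paper. All function names (`assign_shards`, `fit_shard_means`, `aggregate`) are hypothetical.

```python
import random
from statistics import mean


def assign_shards(n, cuts, rng):
    """Draw u_i ~ Uniform(0, 1) for each observation and route it through a
    depth-one sharding tree whose leaves are the intervals between cuts."""
    shards = []
    for _ in range(n):
        u = rng.random()
        # The leaf index is the number of cut points at or below u.
        shards.append(sum(u >= c for c in cuts))
    return shards


def fit_shard_means(y, shards, n_shards):
    """Fit a placeholder sub-model (the shard mean) on each non-empty shard.
    In SBT each sub-model would be a BART fit; the mean is an assumption
    made here only to keep the sketch dependency-free."""
    fits = {}
    for k in range(n_shards):
        ys = [yi for yi, s in zip(y, shards) if s == k]
        if ys:
            fits[k] = (mean(ys), len(ys))
    return fits


def aggregate(fits):
    """Combine sub-model estimates with weights proportional to shard size
    (an illustrative choice, not the paper's optimal weights)."""
    total = sum(n_k for _, n_k in fits.values())
    return sum(m_k * n_k / total for m_k, n_k in fits.values())


rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(200)]
shards = assign_shards(len(y), cuts=[0.25, 0.5, 0.75], rng=rng)
fits = fit_shard_means(y, shards, n_shards=4)
estimate = aggregate(fits)
```

Because the sharding depends only on the auxiliary variable, each shard is a random subsample of the data. With size-proportional weights and mean sub-models, the aggregate reduces to the full-sample mean, which makes the sketch easy to check.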