Many asymptotically minimax procedures for function estimation rely on somewhat arbitrary and restrictive assumptions such as isotropy or spatial homogeneity. This work enhances the theoretical understanding of Bayesian additive regression trees under substantially relaxed smoothness assumptions. We provide a comprehensive study of asymptotic optimality and posterior contraction of Bayesian forests when the regression function has anisotropic smoothness that may vary over the function domain. The regression function may also be discontinuous. We introduce a new class of sparse {\em piecewise heterogeneous anisotropic} H\"{o}lder functions and derive the minimax lower bound for estimating them in high-dimensional scenarios under the $L_2$-loss. We then find that Bayesian tree priors, coupled with a Dirichlet subset selection prior for sparse estimation in high-dimensional scenarios, adapt to unknown heterogeneous smoothness, discontinuity, and sparsity. These results show that Bayesian forests are uniquely suited for more general estimation problems that would render other default machine learning tools, such as Gaussian processes, suboptimal. Our numerical study shows that Bayesian forests often outperform competitors such as random forests and deep neural networks, which are believed to work well for discontinuous or complicated smooth functions. Beyond nonparametric regression, we also examine posterior contraction of Bayesian forests for density estimation and binary classification using the technique developed in this study.