Many asymptotically minimax procedures for function estimation rely on somewhat arbitrary and restrictive assumptions, such as isotropy or spatial homogeneity. This work enhances the theoretical understanding of Bayesian additive regression trees under substantially relaxed smoothness assumptions. We provide a comprehensive study of asymptotic optimality and posterior contraction of Bayesian forests when the regression function has anisotropic smoothness that may vary over the function domain. The regression function may also be discontinuous. We introduce a new class of sparse {\em piecewise heterogeneous anisotropic} H\"{o}lder functions and derive the minimax lower bound for their estimation in high-dimensional scenarios under the $L_2$-loss. We then find that Bayesian tree priors, coupled with a Dirichlet subset selection prior for sparse estimation in high-dimensional scenarios, adapt to unknown heterogeneous smoothness, discontinuity, and sparsity. These results show that Bayesian forests are uniquely suited to more general estimation problems that would render other default machine learning tools, such as Gaussian processes, suboptimal. Our numerical study shows that Bayesian forests often outperform competitors such as random forests and deep neural networks, which are believed to work well for discontinuous or complicated smooth functions. Beyond nonparametric regression, we also examine posterior contraction of Bayesian forests for density estimation and binary classification using the technique developed in this study.