Some applied researchers hesitate to use nonparametric methods, worrying that these methods lose power in small samples or overfit the data when a simpler model would suffice. We argue that at least some of these concerns are unfounded when nonparametric models are strongly shrunk toward parametric submodels. We consider expanding a parametric model with a nonparametric component that is heavily shrunk toward zero. This construction allows the model to adapt automatically: if the parametric model is correct, the nonparametric component vanishes and parametric efficiency is recovered, while if the parametric model is misspecified, the flexible component activates to capture the missing signal. We show that this adaptive behavior follows from simple and general conditions. Specifically, we prove that Bayesian nonparametric models anchored to linear regression, including variants of Gaussian process regression and Bayesian additive regression trees (BART), consistently identify the correct parametric submodel when it holds and give asymptotically efficient inference for the regression coefficients. In simulations, we find that the "general BART" model performs identically to correctly specified linear regression when the parametric model holds, and substantially outperforms it when nonlinear effects are present. This suggests a practical paradigm: "defensive model expansion" as a safeguard against model misspecification.
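To make the construction concrete, the following is a minimal sketch of a linear regression expanded with a Gaussian process term whose prior variance controls the shrinkage toward zero. It is an illustration under stated assumptions, not the models studied here: the names `rbf_kernel`, `fit_expanded`, `tau2`, and `sigma2` are hypothetical, the hyperparameters are held fixed rather than learned through a prior, and a flat prior on the coefficients is assumed, so that the coefficient estimate is the generalized least squares solution.

```python
import numpy as np

def rbf_kernel(x, length_scale=0.5):
    """Squared-exponential kernel matrix for a 1-D input vector."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def fit_expanded(X, K, y, tau2, sigma2):
    """Fit y = X beta + f + noise with f ~ GP(0, tau2 * K), flat prior on beta.

    Hyperparameters are fixed (an assumption of this sketch). Returns the
    generalized least squares estimate of beta and the posterior mean of f
    at the training points.
    """
    V = tau2 * K + sigma2 * np.eye(len(y))   # covariance of y given beta
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)
    f_hat = tau2 * K @ np.linalg.solve(V, y - X @ beta)
    return beta, f_hat

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])         # intercept and slope
K = rbf_kernel(x)
noise = rng.normal(scale=0.5, size=n)

# Case 1: the linear model is correct and the GP term is heavily shrunk.
y_lin = 1.0 + 2.0 * x + noise
beta_ols = np.linalg.lstsq(X, y_lin, rcond=None)[0]
beta_shrunk, f_shrunk = fit_expanded(X, K, y_lin, tau2=1e-8, sigma2=0.25)

# Case 2: a nonlinear effect is present and the GP term is allowed to act.
truth = 1.0 + 2.0 * x + 2.0 * np.sin(3.0 * x)
y_nl = truth + noise
beta_nl, f_nl = fit_expanded(X, K, y_nl, tau2=1.0, sigma2=0.25)
ols_fit = X @ np.linalg.lstsq(X, y_nl, rcond=None)[0]
rmse_ols = np.sqrt(np.mean((ols_fit - truth) ** 2))
rmse_expanded = np.sqrt(np.mean((X @ beta_nl + f_nl - truth) ** 2))

print("max |beta_shrunk - beta_ols|:", np.max(np.abs(beta_shrunk - beta_ols)))
print("max |f_shrunk|:", np.max(np.abs(f_shrunk)))
print("RMSE to truth, OLS vs expanded:", rmse_ols, rmse_expanded)
```

In this sketch the amount of shrinkage is set by hand through `tau2`. The adaptive behavior described in the abstract would instead require the shrinkage level to be inferred from the data, for example by placing a prior on `tau2`, which the sketch does not attempt.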