Covariate measurement error in nonparametric regression is a common problem in nutritional epidemiology, geostatistics, and other fields. Over the last two decades, this problem has received substantial attention in the frequentist literature. Bayesian approaches for handling measurement error have been explored only recently and are surprisingly successful, despite the lack of a proper theoretical justification for the asymptotic performance of the estimators. By specifying a Gaussian process prior on the regression function and a Dirichlet process Gaussian mixture prior on the unknown distribution of the unobserved covariates, we show that the posterior distributions of the regression function and of the unknown covariate density attain optimal rates of contraction adaptively over a range of H\"{o}lder classes, up to logarithmic terms. This improves upon existing classical frequentist results, which require knowledge of the smoothness of the underlying function to deliver optimal risk bounds. We also develop a novel surrogate prior that approximates the Gaussian process prior, leads to efficient computation, and preserves the covariance structure, thereby facilitating easy prior elicitation. We demonstrate the empirical performance of our approach and compare it with competitors in a wide range of simulation experiments and a real data example.
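For orientation, the hierarchical model described above can be sketched as follows. This is only an illustrative formulation: the abstract does not specify the error distributions, so the Gaussian regression and measurement errors, the symbols $\sigma^2$, $\delta^2$, $\alpha$, $G_0$, and the covariance kernel $c$ are all assumptions introduced here, not the authors' stated notation.

```latex
% Illustrative sketch under assumed notation (not taken from the paper).
\begin{align*}
  Y_i &= f(X_i) + \epsilon_i, & \epsilon_i &\sim \mathrm{N}(0, \sigma^2)
      && \text{(response; Gaussian noise assumed)} \\
  W_i &= X_i + U_i,           & U_i &\sim \mathrm{N}(0, \delta^2)
      && \text{(observed surrogate of the unobserved covariate } X_i\text{)} \\
  X_i &\sim p_X,              & p_X(x) &= \int \mathrm{N}(x;\, \mu, \tau^2)\, dG(\mu, \tau^2)
      && \text{(Gaussian mixture for the covariate density)} \\
  G   &\sim \mathrm{DP}(\alpha, G_0), & f &\sim \mathrm{GP}(0, c)
      && \text{(Dirichlet process and Gaussian process priors)}
\end{align*}
```

Only the pairs $(Y_i, W_i)$ are observed; the posterior is taken jointly over $f$, $p_X$, and the latent $X_i$.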