We study empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we adopt a variational empirical Bayes approach, originally introduced by Carbonetto and Stephens (2012) and Kim et al. (2022). We establish asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we develop a computationally efficient approximation to the oracle posterior distribution and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference, including construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation of the regression coefficients, and estimation of the proportion of non-nulls. Our analysis covers both deterministic and random designs and accommodates correlations among the features. To the best of our knowledge, this provides the first rigorous nonparametric empirical Bayes method in a high-dimensional regression setting without sparsity.
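To make the naive mean field variational surrogate concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a Gaussian linear model with known noise variance `sigma2`, and it replaces the nonparametric prior with a discrete prior supported on a fixed grid; both are simplifying assumptions introduced here. The function name `vem_grid_prior` and all parameter names are hypothetical. The sketch alternates coordinate-wise updates of a product-form posterior approximation with a re-estimation of the prior weights.

```python
import numpy as np


def vem_grid_prior(X, y, grid, sigma2=1.0, n_iter=50):
    """Variational EM sketch for y = X @ beta + noise, beta_i iid from a grid prior.

    Assumptions (not from the source): noise variance sigma2 is known, and the
    prior is a discrete distribution with weights w on the fixed points `grid`.
    Returns the estimated prior weights w, the mean field factors q (p x K),
    and the variational posterior means m (length p).
    """
    n, p = X.shape
    K = grid.size
    w = np.full(K, 1.0 / K)          # prior weights, initialised uniform
    q = np.tile(w, (p, 1))           # product-form posterior factors q_i
    d = np.sum(X ** 2, axis=0)       # squared column norms ||x_i||^2
    m = q @ grid                     # current variational means E_q[beta_i]
    resid = y - X @ m                # residual at the current means

    for _ in range(n_iter):
        # Coordinate ascent over the mean field factors.
        for i in range(p):
            resid += X[:, i] * m[i]  # residual with coordinate i left out
            r = X[:, i] @ resid
            # log q_i(a_k) = log w_k - (d_i a_k^2 - 2 a_k r) / (2 sigma2) + const
            log_q = np.log(w) - (d[i] * grid ** 2 - 2.0 * grid * r) / (2.0 * sigma2)
            log_q -= log_q.max()     # stabilise before exponentiating
            q[i] = np.exp(log_q)
            q[i] /= q[i].sum()
            m[i] = q[i] @ grid
            resid -= X[:, i] * m[i]  # put the updated coordinate back
        # Prior update: average the factors, floored to keep log(w) finite.
        w = np.maximum(q.mean(axis=0), 1e-12)
        w /= w.sum()
    return w, q, m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 20
    X = rng.standard_normal((n, p))
    beta = rng.choice([0.0, 2.0], size=p)      # simulated two-point prior
    y = X @ beta + rng.standard_normal(n)
    grid = np.linspace(-3.0, 3.0, 13)
    w_hat, q_hat, m_hat = vem_grid_prior(X, y, grid)
    print("estimated prior weights:", np.round(w_hat, 3))
    print("mean squared error of posterior means:", np.mean((m_hat - beta) ** 2))
```

In this sketch the weight update is the exact maximizer of the variational objective in `w` for fixed factors, since that objective depends on `w` only through the sum of the terms `q[i, k] * log(w[k])`. The grid, the known noise level, and the fixed iteration count are conveniences of the illustration and are not claims about the estimator analysed in the paper.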