We study empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we adopt a variational empirical Bayes approach, originally introduced by Carbonetto and Stephens (2012) and Kim et al. (2022). We establish asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we develop a computationally efficient approximation to the oracle posterior distribution and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference, for example the construction of posterior credible intervals with an average coverage guarantee, Bayes-optimal estimation of the regression coefficients, and estimation of the proportion of non-nulls. Our analysis covers both deterministic and random designs, and accommodates correlations among the features. To the best of our knowledge, this is the first rigorous nonparametric empirical Bayes method for high-dimensional regression without sparsity.
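For orientation, the objects named above can be written out in a standard form. The abstract does not state the model, so this is a sketch under assumed notation rather than the paper's own: the symbols $y$, $X$, $\beta$, $\sigma^2$, $\mu$, $m_\mu$, and $Q_i$, and the Gaussian noise model with i.i.d. coefficients, are all assumptions made for illustration.

```latex
% Assumed model: Gaussian linear regression with i.i.d. coefficients from an unknown prior \mu.
\[
  y = X\beta + \varepsilon, \qquad
  \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n), \qquad
  \beta_1, \dots, \beta_p \overset{\text{i.i.d.}}{\sim} \mu .
\]

% Marginal likelihood of y under \mu, and the NPMLE as its maximizer over priors.
\[
  m_\mu(y) = \int (2\pi\sigma^2)^{-n/2}
             \exp\!\Big( -\tfrac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 \Big)
             \prod_{i=1}^{p} \mu(d\beta_i),
  \qquad
  \hat{\mu}_{\mathrm{NPMLE}} \in \arg\max_{\mu} \, \log m_\mu(y).
\]

% Naive mean field surrogate: restrict the Gibbs variational formula to product measures.
\[
  \log m_\mu(y) \;\ge\;
  \sup_{Q = \otimes_{i=1}^{p} Q_i}
  \Big\{ \mathbb{E}_{Q}\big[ \log p(y \mid \beta) \big]
         - \sum_{i=1}^{p} \mathrm{KL}\big( Q_i \,\Vert\, \mu \big) \Big\},
  \qquad
  \log p(y \mid \beta) = -\tfrac{n}{2}\log(2\pi\sigma^2)
                         - \tfrac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 .
\]
```

In this notation, the variational empirical Bayes estimate of the prior maximizes the right-hand side of the last display jointly over $\mu$ and the product measure $Q$. The inequality holds because the unrestricted supremum over all $Q$ equals $\log m_\mu(y)$, and restricting to product measures can only lower it.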