High-dimensional linear models have been widely studied, but progress on high-dimensional generalized linear models (GLMs) has been slower. In this paper, we propose an empirical, or data-driven, prior that leads to an empirical Bayes posterior distribution, which can be used for estimation of and inference on the coefficient vector in a high-dimensional GLM, as well as for variable selection. We prove that the proposed posterior concentrates around the true sparse coefficient vector at the optimal rate, provide conditions under which the posterior achieves variable selection consistency, and prove a Bernstein--von Mises theorem that implies asymptotically valid uncertainty quantification. Computation of the proposed empirical Bayes posterior is simple and efficient, and in simulations the method performs well relative to existing Bayesian and non-Bayesian methods in terms of estimation and variable selection.