High-dimensional linear models have been studied extensively in the recent literature, but progress on high-dimensional generalized linear models (GLMs) has been much slower. In this paper, we propose the use of an empirical, or data-driven, prior specification, which yields an empirical Bayes posterior distribution that can be used for estimation of, and inference on, the coefficient vector in a high-dimensional GLM, as well as for variable selection. For the proposed method, we prove that the posterior distribution concentrates around the true sparse coefficient vector at the optimal rate and, furthermore, we provide conditions under which the posterior achieves variable selection consistency. Computation of the proposed empirical Bayes posterior is simple and efficient, and in simulations of variable selection in logistic and Poisson regression, the method is shown to perform well compared with existing Bayesian and non-Bayesian methods.