Bayesian inference offers a powerful framework for variable selection: it incorporates sparsity through prior beliefs and quantifies uncertainty about parameters, leading to consistent procedures with good finite-sample performance. However, accurate uncertainty quantification requires a correctly specified model, and there is increasing awareness of the problems that model misspecification causes for variable selection. Current solutions either require a more complex model, which detracts from the interpretability of the original variable selection task, or gain robustness by moving outside rigorous Bayesian uncertainty quantification. This paper establishes the model quasi-posterior as a principled tool for variable selection. We prove that the model quasi-posterior shares many of the desirable properties of full Bayesian variable selection without requiring a full likelihood specification. Instead, the quasi-posterior requires only the specification of mean and variance functions and, as a result, is robust to misspecification of other aspects of the data distribution. For computational tractability, Laplace approximations are used when the quasi-marginal likelihood is not available in closed form. We demonstrate through extensive simulation studies that the quasi-posterior improves variable selection accuracy across a range of data-generating scenarios, including linear models with heavy-tailed errors and overdispersed count data. We further illustrate the practical relevance of the proposed approach through applications to real datasets from social science and genomics.
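As a rough illustration of the objects named above, the following sketch writes out a standard (Wedderburn-type) quasi-likelihood and its Laplace approximation. The notation is an assumption made for this sketch and is not taken from the paper: $\gamma$ indexes a candidate model with $p_\gamma$ selected covariates $x_{i,\gamma}$, $g$ is a link function, $V$ is the variance function, $\phi$ is a dispersion parameter treated as fixed, and $\pi(\cdot)$ denotes prior densities. The paper's exact definitions, including its treatment of $\phi$, may differ.

```latex
% Quasi-log-likelihood built from the mean and variance functions only
Q_\gamma(\beta_\gamma)
  = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi\, V(t)} \, dt,
  \qquad \mu_i = g^{-1}\!\left(x_{i,\gamma}^{\top} \beta_\gamma\right)

% Quasi-marginal likelihood and its Laplace approximation
\tilde m(y \mid \gamma)
  = \int \exp\{Q_\gamma(\beta_\gamma)\}\, \pi(\beta_\gamma \mid \gamma)\, d\beta_\gamma
  \approx (2\pi)^{p_\gamma/2}\,
    \bigl| H_\gamma(\tilde\beta_\gamma) \bigr|^{-1/2}
    \exp\{Q_\gamma(\tilde\beta_\gamma)\}\,
    \pi(\tilde\beta_\gamma \mid \gamma)

% Mode and curvature used in the approximation
\tilde\beta_\gamma
  = \arg\max_{\beta_\gamma} \bigl\{ Q_\gamma(\beta_\gamma) + \log \pi(\beta_\gamma \mid \gamma) \bigr\},
  \qquad
  H_\gamma(\beta_\gamma)
  = -\nabla^2_{\beta_\gamma} \bigl\{ Q_\gamma(\beta_\gamma) + \log \pi(\beta_\gamma \mid \gamma) \bigr\}

% Model quasi-posterior over candidate models
\tilde\pi(\gamma \mid y) \;\propto\; \tilde m(y \mid \gamma)\, \pi(\gamma)
```

Under these assumptions, taking $V(t) = 1$ with an identity link gives a Gaussian-type quasi-likelihood, for which the integral is typically available in closed form under a Gaussian prior, while $V(t) = t$ with $\phi > 1$ and a log link corresponds to overdispersed counts, where the Laplace approximation would be used.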