Adaptive MCMC for Bayesian variable selection in generalised linear models and survival models

Developing an efficient computational scheme for high-dimensional Bayesian variable selection in generalised linear models and survival models has long been a challenging problem because the marginal likelihood has no closed-form solution. The RJMCMC approach can be employed to sample models and coefficients jointly, but designing effective transdimensional jumps is challenging, which makes RJMCMC hard to implement. Alternatively, the marginal likelihood can be derived using a data-augmentation scheme (e.g. Pólya-gamma data augmentation for logistic regression) or through other estimation methods. However, suitable data-augmentation schemes are not available for all generalised linear and survival models, and using estimation methods such as the Laplace approximation or the correlated pseudo-marginal method to derive the marginal likelihood within a locally informed proposal can be computationally expensive in "large n, large p" settings. This paper makes three main contributions. Firstly, we present an extended Point-wise implementation of Adaptive Random Neighbourhood Informed proposal (PARNI) to efficiently sample models directly from the marginal posterior distribution in both generalised linear models and survival models. Secondly, in the light of the approximate Laplace approximation, we describe an efficient and accurate estimation method for the marginal likelihood which involves adaptive parameters. Thirdly, we describe a new method to adapt the algorithmic tuning parameters of the PARNI proposal by replacing the Rao-Blackwellised estimates with a combination of a warm-start estimate and an ergodic average. We present numerical results from simulated data and eight high-dimensional gene fine-mapping data sets to showcase the efficiency of the novel PARNI proposal compared to the baseline add-delete-swap proposal.
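To make the role of the marginal likelihood concrete, the following is a minimal sketch of the standard Laplace approximation to log p(y | γ) for logistic regression, where γ is the binary vector of included covariates. It is not the adaptive estimator or the PARNI proposal described in the paper. The function name `laplace_log_marginal`, the independent N(0, `prior_var`) prior on the included coefficients, and the plain Newton iteration without a line search are all assumptions made for illustration.

```python
import numpy as np


def laplace_log_marginal(X, y, gamma, prior_var=1.0, max_iter=100, tol=1e-10):
    """Laplace approximation to log p(y | gamma) for logistic regression.

    Assumes (for illustration only) a N(0, prior_var * I) prior on the
    coefficients of the covariates selected by the 0/1 vector `gamma`.
    """
    gamma = np.asarray(gamma, dtype=bool)
    Xg = X[:, gamma]                      # design matrix of the selected model
    n, d = Xg.shape
    if d == 0:
        # Empty model: every success probability is 1/2, nothing to integrate.
        return -n * np.log(2.0)

    def prob_and_hessian(beta):
        eta = Xg @ beta
        prob = np.exp(-np.logaddexp(0.0, -eta))   # numerically stable sigmoid
        w = prob * (1.0 - prob)
        # Negative Hessian of the log joint density (likelihood times prior).
        H = Xg.T @ (Xg * w[:, None]) + np.eye(d) / prior_var
        return eta, prob, H

    # Newton iterations to locate the posterior mode.
    beta = np.zeros(d)
    for _ in range(max_iter):
        _, prob, H = prob_and_hessian(beta)
        grad = Xg.T @ (y - prob) - beta / prior_var
        step = np.linalg.solve(H, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    eta, _, H = prob_and_hessian(beta)
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = (-0.5 * d * np.log(2.0 * np.pi * prior_var)
                 - 0.5 * (beta @ beta) / prior_var)
    _, logdet = np.linalg.slogdet(H)
    # log p(y | gamma) ~ log joint at the mode + (d/2) log(2 pi) - (1/2) log|H|
    return log_lik + log_prior + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 5))
    p_true = 1.0 / (1.0 + np.exp(-(1.0 * X[:, 0] - 1.0 * X[:, 2])))
    y = (rng.random(300) < p_true).astype(float)
    for gamma in ([0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 1, 0, 0]):
        print(gamma, laplace_log_marginal(X, y, gamma))
```

Each evaluation needs an inner optimisation over the included coefficients, and a locally informed proposal has to evaluate many neighbouring models per iteration. This is the cost that becomes prohibitive when both n and p are large.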
