Adaptive MCMC for Bayesian variable selection in generalised linear models and survival models

Developing an efficient computational scheme for high-dimensional Bayesian variable selection in generalised linear models and survival models has long been a challenging problem because the marginal likelihood has no closed-form solution. The reversible jump MCMC (RJMCMC) approach can be employed to sample models and coefficients jointly, but designing effective transdimensional jumps is challenging, which makes RJMCMC hard to implement. Alternatively, the marginal likelihood can be derived using a data-augmentation scheme (e.g. Polya-gamma data augmentation for logistic regression) or through other estimation methods. However, suitable data-augmentation schemes are not available for every generalised linear model and survival model, and using estimates such as the Laplace approximation or the correlated pseudo-marginal method to derive the marginal likelihood within a locally informed proposal can be computationally expensive in "large n, large p" settings. In this paper, three main contributions are presented. Firstly, we present an extended Point-wise implementation of the Adaptive Random Neighbourhood Informed proposal (PARNI) to efficiently sample models directly from the marginal posterior distribution in both generalised linear models and survival models. Secondly, in light of the approximate Laplace approximation, we describe an efficient and accurate estimation method for the marginal likelihood which involves adaptive parameters. Thirdly, we describe a new method to adapt the algorithmic tuning parameters of the PARNI proposal by replacing the Rao-Blackwellised estimates with the combination of a warm-start estimate and an ergodic average. We present numerous numerical results from simulated data and 8 high-dimensional gene fine-mapping data sets to showcase the efficiency of the novel PARNI proposal compared to the baseline add-delete-swap proposal.
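To make the baseline concrete, the following is a minimal sketch of an add-delete-swap Metropolis-Hastings sampler over models, written in one common formulation; it is not the PARNI proposal and is not taken from the paper. The function names (`propose`, `run_chain`, `toy_log_posterior`) and the toy target are assumptions made for illustration only. In a real generalised linear or survival model, the toy target would be replaced by an estimate of the log marginal posterior of the model, for example one based on a Laplace-type approximation.

```python
import math
import random


def propose(gamma, p, rng):
    """Draw an add-delete-swap neighbour of the model `gamma`.

    `gamma` is a frozenset of included variable indices out of 0..p-1.
    Returns (candidate, log q(candidate -> gamma) - log q(gamma -> candidate)).
    The three move types are equally likely, so their probabilities cancel
    in the ratio. An impossible move returns the current model unchanged.
    """
    included = sorted(gamma)
    excluded = [j for j in range(p) if j not in gamma]
    k = len(included)
    move = rng.choice(("add", "delete", "swap"))
    if move == "add" and excluded:
        j = rng.choice(excluded)
        # Forward: 1/(p-k). Reverse (delete from a model of size k+1): 1/(k+1).
        return gamma | {j}, math.log(p - k) - math.log(k + 1)
    if move == "delete" and included:
        j = rng.choice(included)
        # Forward: 1/k. Reverse (add to a model of size k-1): 1/(p-k+1).
        return gamma - {j}, math.log(k) - math.log(p - k + 1)
    if move == "swap" and included and excluded:
        i = rng.choice(included)
        j = rng.choice(excluded)
        # Swaps are symmetric: forward and reverse are both 1/(k(p-k)).
        return (gamma - {i}) | {j}, 0.0
    return gamma, 0.0


def toy_log_posterior(gamma, log_odds):
    """Hypothetical stand-in target: independent inclusion with given log-odds."""
    return sum(log_odds[j] for j in gamma)


def run_chain(log_post, p, n_iter, seed=0):
    """Run Metropolis-Hastings and return estimated posterior inclusion probabilities."""
    rng = random.Random(seed)
    gamma = frozenset()
    current = log_post(gamma)
    counts = [0] * p
    for _ in range(n_iter):
        cand, log_q_ratio = propose(gamma, p, rng)
        cand_lp = log_post(cand)
        log_alpha = cand_lp - current + log_q_ratio
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            gamma, current = cand, cand_lp
        for j in gamma:
            counts[j] += 1
    return [c / n_iter for c in counts]


# Toy usage: 5 candidate variables with assumed (made-up) log-odds of inclusion.
log_odds = [3.0, -3.0, 0.0, 2.0, -2.0]
pips = run_chain(lambda g: toy_log_posterior(g, log_odds), p=5, n_iter=40000)
```

Under this toy target the inclusion indicators are independent, so each entry of `pips` should approach 1 / (1 + exp(-log_odds[j])) as the chain grows. Because each step changes at most two coordinates chosen uniformly at random, a sampler of this kind can mix slowly when the number of variables is large, which is the inefficiency that informed proposals such as PARNI aim to address.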
