Motivated by a real-world application in cardiology, we develop an algorithm to perform Bayesian bi-level variable selection in a generalized linear model, for datasets that may be large in both the number of individuals and the number of predictors. Our algorithm relies on three ingredients: the waste-free Sequential Monte Carlo (SMC) methodology of Dau and Chopin (2022); a new proposal mechanism that handles the constraints specific to bi-level selection (an individual predictor cannot be selected unless its group is selected); and the ALA (approximate Laplace approximation) approach of Rossell et al. (2021). Our numerical study shows that the algorithm can offer reliable performance on large datasets within a few minutes, on both simulated data and real data from the aforementioned cardiology application.
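To make the bi-level constraint concrete, the following is a minimal sketch that checks whether a candidate model is admissible and enumerates the admissible models for a toy grouping. It is not the proposal mechanism of the paper. The names `is_valid`, `enumerate_valid`, and `group_of` are hypothetical, and the sketch assumes that a selected group may contain zero selected predictors, which the text above does not specify.

```python
from itertools import product


def is_valid(group_on, pred_on, group_of):
    """Return True if every selected predictor belongs to a selected group.

    group_on: sequence of booleans, one per group (hypothetical encoding).
    pred_on:  sequence of booleans, one per predictor.
    group_of: group_of[j] is the index of the group containing predictor j.
    """
    return all(group_on[group_of[j]] for j, on in enumerate(pred_on) if on)


def enumerate_valid(group_of, n_groups):
    """Brute-force list of all (group_on, pred_on) pairs meeting the constraint.

    Only feasible for tiny examples; the model space grows exponentially.
    """
    p = len(group_of)
    valid = []
    for group_on in product([False, True], repeat=n_groups):
        for pred_on in product([False, True], repeat=p):
            if is_valid(group_on, pred_on, group_of):
                valid.append((group_on, pred_on))
    return valid


# Toy example: predictors 0 and 1 are in group 0, predictor 2 is in group 1.
group_of = [0, 0, 1]
models = enumerate_valid(group_of, n_groups=2)
print(len(models), "admissible models out of", 2 ** 2 * 2 ** 3)
```

In this toy case the constraint rules out roughly half of the unconstrained configurations, which suggests why a proposal that ignores the constraint would waste many moves on inadmissible models.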