We introduce a symmetric random-scan Gibbs sampler for scalable Bayesian variable selection that avoids storing the full cross-product matrix by computing the required quantities on the fly. Data-informed proposal weights, constructed from marginal correlations, concentrate sampling effort on promising candidates, while a uniform mixing component ensures theoretical validity. We provide explicit guidance for selecting tuning parameters based on the ratio of signal to null correlations, which ensures adequate posterior exploration. The posterior-mean-size selection rule offers an adaptive alternative to the median probability model: it calibrates automatically to the effective signal density and requires no arbitrary threshold. In simulations with 100,000 predictors, the method achieves a sensitivity of 1.000 and a precision above 0.76. Applied to a genomic dataset on riboflavin production in Bacillus subtilis, the method identifies six genes, all validated by previous studies using alternative methods. The underlying model combines a Dirac spike-and-slab prior with Laplace-type shrinkage: the Dirac spike enforces exact sparsity by setting inactive coefficients to precisely zero, while the Laplace-type slab provides adaptive regularization for active coefficients through a local-global scale mixture.
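The three computational ingredients named above (correlation-based proposal weights with a uniform mixing component, on-the-fly cross-products, and a selection rule based on the posterior mean model size) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the mixing parameter `eta`, the normalization of the weights, and the reading of the posterior-mean-size rule as "keep the k variables with the highest posterior inclusion probabilities, where k is the rounded sum of those probabilities".

```python
import numpy as np


def proposal_weights(X, y, eta=0.1):
    """Mix normalized |marginal correlation| weights with a uniform component.

    Assumed form: w = (1 - eta) * c / sum(c) + eta / p, where c_j is the
    absolute marginal correlation between predictor j and the response.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Guard against constant columns, which have zero norm after centering.
    denom = np.maximum(np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc), 1e-12)
    corr = np.abs(Xc.T @ yc) / denom
    informed = corr / corr.sum()
    p = X.shape[1]
    # The uniform part keeps every weight at least eta / p.
    return (1.0 - eta) * informed + eta / p


def cross_product_column(X, j):
    """Return column j of X^T X without forming the p x p matrix."""
    return X.T @ X[:, j]


def select_by_posterior_mean_size(pip):
    """Keep the k highest inclusion probabilities, k = rounded sum of them.

    By linearity of expectation, the sum of the posterior inclusion
    probabilities equals the posterior mean model size.
    """
    pip = np.asarray(pip, dtype=float)
    k = int(np.rint(pip.sum()))  # np.rint rounds exact halves to even
    top = np.argsort(-pip, kind="stable")[:k]
    return np.sort(top)


# Small usage example on synthetic data with one active predictor.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 3.0 * X[:, 2] + rng.standard_normal(50)

w = proposal_weights(X, y, eta=0.2)
j = rng.choice(X.shape[1], p=w)       # random-scan draw of a coordinate
xtx_j = cross_product_column(X, j)    # only the column that is needed

# Hypothetical inclusion probabilities, for illustration only.
selected = select_by_posterior_mean_size([0.9, 0.8, 0.05, 0.6, 0.1])
```

In this toy example the inclusion probabilities sum to 2.45, so the rule as sketched keeps two variables, whereas thresholding at 0.5 (the median probability model) would keep three. Computing one column of the cross-product at a time costs O(np) per request but needs only O(p) extra memory, compared with O(p^2) for storing the full matrix.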