We consider the problem of designing scalable sampling algorithms for fitting Bayesian generalized linear mixed models to large datasets. Stochastic gradient Langevin dynamics, even when coupled with smooth re-parameterizations of the variance parameters, produces divergent Markov chains and cannot be used reliably to sample the covariance parameters of random effects. We advocate the use of a mirror Langevin dynamics algorithm, propose a novel stochastic mirror Langevin dynamics algorithm based on data subsampling, and provide concrete guidelines for its use in a Bayesian inference framework. Based on an explicit Wasserstein distance error bound between the posterior and its algorithmic approximation, we propose a post-processing step that yields an asymptotically order-wise correct estimate of the posterior variance, removing the otherwise irreducible bias in posterior variance estimation caused by subsampling. The empirical performance of the method is evaluated through simulated experiments and a longitudinal study of pain trajectories among breast cancer survivors.
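To make the idea of a subsampled mirror Langevin update concrete, the following is a minimal sketch and not the algorithm of the paper. Everything specific in it is an assumption chosen for illustration: the toy model (zero-mean Gaussian data with unknown variance `x` and an inverse-gamma prior), the entropic mirror map `phi(x) = x*log(x) - x`, the function name `stochastic_mirror_langevin`, and all step-size and batch-size values. The update is the standard Euler discretization of mirror Langevin dynamics in the dual space, with the full-data gradient replaced by an unbiased minibatch estimate.

```python
import math
import random


def stochastic_mirror_langevin(data, n_steps=2000, batch_size=100, step=5e-4,
                               a=2.0, b=2.0, x0=1.0, seed=0):
    """Sample a variance parameter x > 0 with a subsampled mirror Langevin step.

    Assumed toy model (not from the paper): data_i ~ N(0, x), x ~ Inv-Gamma(a, b).
    Assumed mirror map: phi(x) = x*log(x) - x, so grad phi is log, its inverse
    is exp, and the Hessian of phi is 1/x.
    """
    rng = random.Random(seed)
    n_total = len(data)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Draw a minibatch without replacement.
        batch = rng.sample(data, batch_size)
        # Gradient of the negative log prior with respect to x.
        grad = (a + 1.0) / x - b / x**2
        # Unbiased minibatch estimate of the negative log-likelihood gradient.
        scale = n_total / batch_size
        grad += scale * sum(0.5 / x - y * y / (2.0 * x**2) for y in batch)
        # Euler step in the dual space; noise is scaled by sqrt(Hessian of phi).
        noise = math.sqrt(2.0 * step / x) * rng.gauss(0.0, 1.0)
        dual = math.log(x) - step * grad + noise
        # Map back to the primal space; exp keeps x strictly positive.
        x = math.exp(dual)
        samples.append(x)
    return samples


# Simulated data with a known variance, used only to exercise the sampler.
data_rng = random.Random(1)
true_variance = 2.0
data = [data_rng.gauss(0.0, math.sqrt(true_variance)) for _ in range(1000)]

draws = stochastic_mirror_langevin(data)
kept = draws[500:]  # discard an assumed burn-in period
print(sum(kept) / len(kept))
```

Because the iterate is mapped back through `exp`, every draw stays in the positive half-line without any explicit re-parameterization or projection. The minibatch gradient adds noise on top of the injected Gaussian noise, so the spread of `kept` should be expected to overstate the posterior variance; this sketch does not include the post-processing correction described in the abstract.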