Models with random effects, such as generalised linear mixed models (GLMMs), are often used for analysing clustered data. Parameter inference with these models is difficult because the cluster-specific random effects must be integrated out when evaluating the likelihood function. Here, we propose a sequential variational Bayes algorithm, called Recursive Variational Gaussian Approximation for Latent variable models (R-VGAL), for estimating parameters in GLMMs. The R-VGAL algorithm processes the data sequentially, requires only a single pass through the data, and can update the parameter estimates as new data are collected without reprocessing earlier data. At each update, the R-VGAL algorithm requires the gradient and Hessian of a "partial" log-likelihood function evaluated at the new observation, which are generally not available in closed form for GLMMs. To circumvent this issue, we propose an importance-sampling-based approach for estimating the gradient and Hessian via Fisher's and Louis' identities. We find that R-VGAL can be unstable when processing the first few observations, but that this issue can be mitigated by using a variant of variational tempering in the initial steps of the algorithm. Through illustrations on both simulated and real datasets, we show that R-VGAL provides good approximations to the exact posterior distributions, that it can be made robust through tempering, and that it is computationally efficient.
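The sequential update described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the abstract. The parameter is a scalar. The model is a Gaussian random-intercept toy model, chosen because its exact posterior is available for checking; for a GLMM, the complete-data derivatives and the importance weights would change. The importance proposal is the random-effect prior. The update is R-VGA style: precision first, then mean, with expectations taken by Monte Carlo under the current Gaussian approximation. Tempering is read as splitting each of the first few updates into several steps, each using the likelihood raised to a fractional power. All function names (`is_grad_hess`, `rvgal_update`, `run`, etc.) are hypothetical, and this is not the authors' implementation.

```python
import math
import random

# Toy random-intercept model (an illustrative assumption, not the paper's GLMM):
#   y_ij = theta + alpha_i + eps_ij, alpha_i ~ N(0, TAU2), eps_ij ~ N(0, SIG2),
# with TAU2 and SIG2 known, so theta is a scalar and the exact posterior is
# available for checking.
TAU2, SIG2 = 0.5, 1.0


def complete_grad_hess(y, theta, alpha):
    """First and second theta-derivatives of log p(y_i, alpha_i | theta).
    Here p(alpha_i) does not involve theta, so only p(y_i | alpha_i, theta) counts."""
    return sum(yj - theta - alpha for yj in y) / SIG2, -len(y) / SIG2


def is_grad_hess(y, theta, rng, n_is=200):
    """Self-normalised importance-sampling estimates of the gradient and Hessian
    of the partial log-likelihood log p(y_i | theta) via Fisher's and Louis'
    identities. Proposal: the random-effect prior N(0, TAU2), so each weight is
    proportional to p(y_i | alpha_i, theta)."""
    alphas = [rng.gauss(0.0, math.sqrt(TAU2)) for _ in range(n_is)]
    logw = [-sum((yj - theta - a) ** 2 for yj in y) / (2 * SIG2) for a in alphas]
    top = max(logw)  # subtract the max log-weight before exponentiating
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    gh = [complete_grad_hess(y, theta, a) for a in alphas]
    # Fisher: grad log p(y_i|theta) = E[grad log p(y_i, alpha_i|theta) | y_i]
    grad = sum(wi * g for wi, (g, _) in zip(w, gh)) / total
    # Louis (scalar form): hess = E[h] + E[g^2] - grad^2
    hess = sum(wi * (h + g * g) for wi, (g, h) in zip(w, gh)) / total - grad * grad
    return grad, hess


def exact_grad_hess(y, theta, rng=None):
    """Closed form for the toy model, where y_i ~ N(theta, SIG2*I + TAU2*11^T).
    rng is unused; it only keeps the signature interchangeable with is_grad_hess."""
    c = 1.0 / (SIG2 + len(y) * TAU2)
    return c * sum(yj - theta for yj in y), -c * len(y)


def rvgal_update(mu, prec, y, grad_hess, rng, n_theta=50, n_temper=1):
    """One sequential update of q(theta) = N(mu, 1/prec) using cluster y only.
    With n_temper > 1, the likelihood of y is applied in n_temper pieces of
    power 1/n_temper, re-drawing theta from the current q before each piece."""
    for _ in range(n_temper):
        draws = [rng.gauss(mu, 1.0 / math.sqrt(prec)) for _ in range(n_theta)]
        gh = [grad_hess(y, th, rng) for th in draws]
        e_grad = sum(g for g, _ in gh) / n_theta  # Monte Carlo E_q[gradient]
        e_hess = sum(h for _, h in gh) / n_theta  # Monte Carlo E_q[Hessian]
        prec = prec - e_hess / n_temper  # precision update first
        mu = mu + e_grad / (n_temper * prec)  # then the mean, using the new precision
    return mu, prec


def run(clusters, grad_hess, rng, mu0=0.0, prec0=1.0, n_warm=3, n_temper=4):
    """Single pass over the clusters from the prior N(mu0, 1/prec0),
    tempering only the first n_warm updates."""
    mu, prec = mu0, prec0
    for i, y in enumerate(clusters):
        k = n_temper if i < n_warm else 1
        mu, prec = rvgal_update(mu, prec, y, grad_hess, rng, n_temper=k)
    return mu, prec


if __name__ == "__main__":
    rng = random.Random(0)
    clusters = []
    for _ in range(15):  # 15 clusters of 5 observations, true theta = 1
        a = rng.gauss(0.0, math.sqrt(TAU2))
        clusters.append([1.0 + a + rng.gauss(0.0, math.sqrt(SIG2)) for _ in range(5)])
    print("exact grad/Hessian:", run(clusters, exact_grad_hess, rng))
    print("IS grad/Hessian:   ", run(clusters, is_grad_hess, rng))
```

Because the log-likelihood of this toy model is quadratic in theta, running `run` with `exact_grad_hess` reproduces the conjugate Gaussian posterior. The precision matches exactly, and the mean matches up to Monte Carlo error from the theta draws. Swapping in `is_grad_hess` shows the effect of estimating the gradient and Hessian by importance sampling. When theta draws fall far from where a cluster's data put them, as can happen in the early steps under a diffuse approximation, the prior proposal produces degenerate weights. This gives one intuition for why the first few updates benefit from being split into smaller, tempered steps.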