Models with random effects, such as generalised linear mixed models (GLMMs), are often used for analysing clustered data. Parameter inference with these models is difficult because of the presence of cluster-specific random effects, which must be integrated out when evaluating the likelihood function. Here, we propose a sequential variational Bayes algorithm, called Recursive Variational Gaussian Approximation for Latent variable models (R-VGAL), for estimating parameters in GLMMs. The R-VGAL algorithm operates on the data sequentially, requires only a single pass through the data, and can provide parameter updates as new data are collected without the need to reprocess previous data. At each update, the R-VGAL algorithm requires the gradient and Hessian of a "partial" log-likelihood function evaluated at the new observation, which are generally not available in closed form for GLMMs. To circumvent this issue, we propose an importance-sampling-based approach for estimating the gradient and Hessian via Fisher's and Louis' identities. We find that R-VGAL can be unstable when traversing the first few data points, but that this issue can be mitigated by using a variant of variational tempering in the initial steps of the algorithm. Through illustrations on both simulated and real datasets, we show that R-VGAL provides good approximations to the exact posterior distributions, that it can be made robust through tempering, and that it is computationally efficient.
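The importance-sampling step mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a linear Gaussian random-intercept model, y_j | alpha ~ N(theta + alpha, 1) with alpha ~ N(0, tau^2), chosen only because its marginal log-likelihood has a closed-form gradient and Hessian to compare against. It also assumes the prior of the random effect as the proposal distribution and self-normalised weights. The function names `is_grad_hess` and `exact_grad_hess` are hypothetical. Fisher's identity writes the gradient of the marginal log-likelihood as the posterior expectation (over the random effect) of the complete-data gradient, and Louis' identity does the same for the Hessian with an added variance correction.

```python
import numpy as np

def is_grad_hess(y, theta, tau, n_samples=200_000, seed=0):
    """Importance-sampling estimates of d/dtheta and d2/dtheta2 of log p(y | theta).

    Toy model (an assumption for illustration): y_j | alpha ~ N(theta + alpha, 1),
    alpha ~ N(0, tau^2). The proposal is the prior of alpha.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, tau, size=n_samples)       # draws from the proposal
    resid = y[None, :] - theta - alpha[:, None]        # shape (n_samples, n_obs)

    # With the prior as proposal, the unnormalised weight is p(y | alpha, theta).
    log_w = -0.5 * np.sum(resid**2, axis=1)
    w = np.exp(log_w - log_w.max())                    # subtract max for stability
    w /= w.sum()                                       # self-normalised weights

    g = resid.sum(axis=1)                              # complete-data gradient per draw
    h = -float(y.size)                                 # complete-data Hessian (constant here)

    grad = np.sum(w * g)                               # Fisher's identity
    hess = h + np.sum(w * g**2) - grad**2              # Louis' identity
    return grad, hess

def exact_grad_hess(y, theta, tau):
    """Closed-form gradient and Hessian of the marginal log-likelihood for the toy model."""
    n = y.size
    denom = 1.0 + n * tau**2
    return np.sum(y - theta) / denom, -n / denom

y = np.array([0.3, 1.1, -0.4, 0.8, 0.5])
est = is_grad_hess(y, theta=0.2, tau=1.0)
exact = exact_grad_hess(y, theta=0.2, tau=1.0)
print("importance sampling:", est)
print("closed form:        ", exact)
```

For a genuine GLMM (for example, a logistic or Poisson mixed model) no closed form exists, so only the estimator would be available; the closed-form check here serves purely to show that the two identities recover the right quantities in a case where they can be verified.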