Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, correctly specifying the fixed and random effects in a GLMM becomes harder as the number of candidate predictors grows, and the computational cost increases with the dimension of the random effects. We present a novel reformulation of the GLMM that uses a factor model decomposition of the random effects. By reducing the latent space from a large number of random effects to a smaller set of latent factors, this reformulation enables scalable computation of GLMMs in high dimensions. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, which allows us to perform variable selection on the fixed and random effects simultaneously. We show through simulation that, with this factor model decomposition, our method fits high dimensional penalized GLMMs faster than comparable methods and scales more easily to dimensions beyond those handled by existing approaches.
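To make the dimension reduction concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes that the factor model decomposition writes each group's q random effects as a loading matrix times r latent factors (r much smaller than q), and that the outcome is binary with a logistic link. The function name `simulate_factor_glmm`, the symbols `B` and `alpha`, and all parameter values are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np


def simulate_factor_glmm(n_groups=5, n_per_group=40, q=10, r=2, seed=0):
    """Simulate binary outcomes from a logistic GLMM whose random effects
    follow an assumed factor structure: gamma_k = B @ alpha_k."""
    rng = np.random.default_rng(seed)
    beta = 0.5 * rng.normal(size=q)   # fixed effects (illustrative values)
    B = rng.normal(size=(q, r))       # hypothetical q x r loading matrix

    gamma = np.empty((n_groups, q))   # implied random effects, one row per group
    X_blocks, y_blocks = [], []
    for k in range(n_groups):
        X = rng.normal(size=(n_per_group, q))
        Z = X                         # assumption: random slopes on the same predictors
        alpha = rng.normal(size=r)    # r latent factors, alpha_k ~ N(0, I_r)
        gamma[k] = B @ alpha          # q random effects driven by only r factors
        eta = X @ beta + Z @ gamma[k]           # linear predictor
        prob = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
        y_blocks.append(rng.binomial(1, prob))
        X_blocks.append(X)

    return B, gamma, np.vstack(X_blocks), np.concatenate(y_blocks)


B, gamma, X, y = simulate_factor_glmm()
# Under this assumed structure the random effect covariance is B @ B.T,
# a q x q matrix whose rank is at most r.
cov_rank = np.linalg.matrix_rank(B @ B.T)
```

Under these assumptions, any Monte Carlo step that samples the latent variables works with r values per group instead of q, which is the kind of saving the abstract describes.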