Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. Proper selection of fixed and random effects is a critical part of the modeling process, since model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to lower-dimensional GLMMs, largely due to the use of criterion-based model selection strategies. Here we present the R package glmmPen, one of the first to select fixed and random effects in higher dimensions using a penalized GLMM modeling framework. Model parameters are estimated using a Monte Carlo expectation conditional minimization (MCECM) algorithm, which leverages Stan and RcppArmadillo for increased computational efficiency. Our package supports multiple distributional families and penalty functions. In this manuscript we discuss the modeling procedure, estimation scheme, and software implementation through application to a pancreatic cancer subtyping study.