High-dimensional longitudinal data are increasingly used in a wide range of scientific studies. However, there are few statistical methods for high-dimensional linear mixed models (LMMs), as most Bayesian variable selection or penalization methods are designed for independent observations. Additionally, the few available software packages for high-dimensional LMMs suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Maximization (ECM) algorithm for computationally efficient maximum a posteriori (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. In a real-world example using data from a study of lupus in children, we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time.