High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. To properly account for dependence between longitudinal observations, statistical methods for high-dimensional linear mixed models (LMMs) have been developed. However, few packages implementing these high-dimensional LMMs are available in the statistical software R, and some of those that are available suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Maximization (ECM) algorithm for computationally efficient maximum a posteriori (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time. Supplementary materials are available online.
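For reference, the class of model the abstract refers to can be written in standard LMM notation. This is a generic sketch and not necessarily the parameterization used by LMM-PROBE; the symbols $y_i$, $X_i$, $Z_i$, $\beta$, $b_i$, $G$, and $\sigma^2$ are assumed here for illustration and do not appear in the abstract.

```latex
% Generic linear mixed model for subject i = 1, ..., N with n_i repeated measurements.
% All notation is assumed for illustration; b_i and \varepsilon_i are taken to be independent.
\begin{align*}
y_i &= X_i \beta + Z_i b_i + \varepsilon_i, \\
b_i &\sim \mathcal{N}(0, G), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}), \\
\operatorname{Var}(y_i) &= Z_i G Z_i^{\top} + \sigma^2 I_{n_i}.
\end{align*}
```

Under these assumptions, the off-diagonal entries of $Z_i G Z_i^{\top}$ carry the dependence between observations from the same subject, and "high-dimensional" typically means that the number of columns of $X_i$ is large relative to the number of subjects $N$.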