Identifying and characterizing relationships between treatments, exposures, or other covariates and time-to-event outcomes is important in a wide range of biomedical settings. In research areas such as multi-center clinical trials, recurrent events, and genetic studies, proportional hazards mixed effects models (PHMMs) are used to account for correlations among observations within clusters of the data. In high dimensions, proper specification of the fixed and random effects within PHMMs is difficult and computationally demanding. In this paper, we approximate the PHMM with a piecewise constant hazard mixed effects survival model. We estimate the model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, which allows us to perform variable selection on the fixed and random effects simultaneously. We also incorporate a factor model decomposition of the random effects so that the variable selection method scales more easily to larger dimensions. We demonstrate the utility of our method using simulations, and we apply it to a multi-study pancreatic ductal adenocarcinoma gene expression dataset to select features important for survival.
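To illustrate the piecewise constant hazard approximation mentioned above, the following is a minimal sketch under stated assumptions and not the paper's implementation. It assumes the standard construction in which follow-up time is split at fixed cut points and the hazard in interval j for subject i is exp(log_baseline[j] + eta[i]), where eta[i] would collect the fixed and random effect contributions. The function names, the cut points, and the argument layout are hypothetical. The estimation algorithm, the variable selection penalties, and the factor decomposition are not shown.

```python
import numpy as np

def expand_piecewise(time, event, cuts):
    """Split each subject's follow-up into the intervals defined by `cuts`.

    cuts: increasing interval boundaries starting at 0 (hypothetical choice);
    the last interval is open-ended, (cuts[-1], inf).
    Returns a list of (subject, interval, exposure, died) rows.
    """
    cuts = np.asarray(cuts, dtype=float)
    upper = np.append(cuts[1:], np.inf)
    rows = []
    for i, (t, d) in enumerate(zip(time, event)):
        for j, (lo, hi) in enumerate(zip(cuts, upper)):
            if t <= lo:
                break  # subject left the risk set before this interval
            exposure = float(min(t, hi) - lo)  # time at risk in interval j
            died = int(d == 1 and t <= hi)     # event falls in interval j
            rows.append((i, j, exposure, died))
    return rows

def piecewise_loglik(rows, log_baseline, eta):
    """Log-likelihood of a piecewise constant hazard model.

    log_baseline[j]: log baseline hazard in interval j.
    eta[i]: linear predictor for subject i (assumed to hold the fixed
    effects plus the cluster-level random effects).
    """
    ll = 0.0
    for i, j, exposure, died in rows:
        log_h = log_baseline[j] + eta[i]
        # Event contributes the log hazard; exposure contributes the
        # negative cumulative hazard accrued in the interval.
        ll += died * log_h - exposure * np.exp(log_h)
    return ll

# Two subjects: an event at t=2.5 and a censoring at t=0.5.
rows = expand_piecewise([2.5, 0.5], [1, 0], cuts=[0.0, 1.0, 2.0])
value = piecewise_loglik(rows, np.zeros(3), np.zeros(2))
```

With a unit hazard in every interval the model reduces to an exponential model with rate 1, so the log-likelihood here is simply minus the total follow-up time. Each row's contribution has the form of a Poisson log-likelihood with the log exposure as an offset, which is one reason this approximation is convenient computationally.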