An important task in health research is to characterize time-to-event outcomes such as disease onset or mortality in terms of a potentially high-dimensional set of risk factors. For example, prospective cohort studies of Alzheimer's disease (AD) typically enroll older adults for observation over several decades to assess the long-term impact of genetic and other factors on cognitive decline and mortality. The accelerated failure time model is particularly well-suited to such studies, structuring covariate effects as "horizontal" changes to the survival quantiles that conceptually reflect shifts in the outcome distribution due to lifelong exposures. However, this modeling task is complicated by the enrollment of adults at differing ages, which left-truncates the data, and by intermittent follow-up visits, which yield interval-censored outcome information. Moreover, genetic and clinical risk factors are not only high-dimensional, but also characterized by an underlying grouping structure, such as by function or gene location. Such grouped high-dimensional covariates require shrinkage methods that directly acknowledge this structure to facilitate variable selection and estimation. In this paper, we address these considerations directly by proposing a Bayesian accelerated failure time model with a group-structured lasso penalty, designed for left-truncated and interval-censored time-to-event data. We develop a custom Markov chain Monte Carlo sampler for efficient estimation, and investigate the impact of various methods of penalty tuning and thresholding for variable selection. We present a simulation study examining the performance of this method relative to models with an ordinary lasso penalty, and apply the proposed method to identify groups of predictive genetic and clinical risk factors for AD in the Religious Orders Study and Memory and Aging Project (ROSMAP) prospective cohort studies of AD and dementia.
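
As a rough illustration of the ingredients named above, the following sketch computes the log-likelihood contribution of one left-truncated, interval-censored observation under an accelerated failure time model, together with a group lasso penalty. It is a minimal sketch under stated assumptions, not the model or sampler proposed in the paper: the log-normal baseline, the unweighted penalty, and every function and parameter name (`aft_survival`, `loglik_contribution`, `group_lasso_penalty`, `entry`, `left`, `right`, `lam`) are hypothetical choices made for illustration only.

```python
import math


def norm_sf(z):
    """Standard normal survival function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def aft_survival(t, x, beta, sigma):
    """S(t | x) for an AFT model log T = x'beta + sigma * eps.

    Assumption: eps is standard normal (log-normal baseline).
    """
    if t <= 0.0:
        return 1.0
    if math.isinf(t):
        return 0.0
    lin = sum(xj * bj for xj, bj in zip(x, beta))
    return norm_sf((math.log(t) - lin) / sigma)


def loglik_contribution(entry, left, right, x, beta, sigma):
    """Log-likelihood of an event in (left, right], given survival past `entry`.

    Left truncation at the entry age divides by S(entry | x);
    right = math.inf encodes right censoring at `left`.
    """
    num = aft_survival(left, x, beta, sigma) - aft_survival(right, x, beta, sigma)
    den = aft_survival(entry, x, beta, sigma)
    return math.log(num) - math.log(den)


def group_lasso_penalty(beta, groups, lam):
    """lam times the sum of Euclidean norms of each coefficient group.

    Assumption: groups are lists of indices into beta, with no
    group-size weights.
    """
    return lam * sum(
        math.sqrt(sum(beta[j] ** 2 for j in group)) for group in groups
    )
```

The "horizontal" interpretation appears in `aft_survival`: a covariate effect enters only through the shift of `log(t)`, so the survival curve for covariates `x` equals the baseline curve evaluated at the rescaled time `t * exp(-x'beta)`. In a Bayesian formulation, a group lasso penalty of this kind would typically correspond to the negative log of a prior on each coefficient group.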