In epidemiological and clinical studies, identifying patients' phenotypes from their longitudinal profiles is critical to understanding disease developmental patterns. The current study was motivated by data from a Canadian birth cohort study, the CHILD Cohort Study. Our goal was to use multiple longitudinal respiratory traits to cluster participants into subgroups with similar longitudinal respiratory profiles, in order to identify clinically relevant disease phenotypes. To account appropriately for the distinct structures and data types of these longitudinal markers, we proposed a novel joint model for clustering mixed-type (continuous, discrete and categorical) multivariate longitudinal data. We also developed a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior distribution of the model parameters. Analyses of the CHILD Cohort data and of simulated data were presented and discussed. Our study demonstrated that the proposed model is a useful analytical tool for clustering multivariate mixed-type longitudinal data. We also developed the R package BCClong to implement the proposed model efficiently.
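The abstract does not spell out the model, so the following is only a minimal Python sketch of the general idea of clustering subjects on mixed-type longitudinal markers. It is not the proposed model and not the BCClong implementation. The simplifying assumptions are ours: markers are conditionally independent given the cluster, each cluster has a linear-in-time mean trajectory per marker with no subject-level random effects, and the continuous, count and binary markers are modeled as Gaussian, Poisson and Bernoulli (logit link), respectively. It shows the cluster-allocation step that MCMC samplers for mixture models commonly use: each subject's posterior membership probabilities are computed from the joint likelihood across all markers, and a label is drawn from them. All function names and parameter values are hypothetical.

```python
import math
import random

# Assumed per-marker likelihoods (illustrative choices, not the paper's).
def normal_logpdf(y, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)

def poisson_logpmf(k, rate):
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def bernoulli_logpmf(y, p):
    return math.log(p) if y == 1 else math.log(1.0 - p)

def subject_loglik(times, cont, counts, binary, params):
    """Joint log-likelihood of one subject's three marker series under one cluster.

    Hypothetical cluster-specific trajectory parameters:
      "cont":  (a, b, sd) -> Gaussian mean      = a + b * t
      "count": (a, b)     -> Poisson log-rate   = a + b * t
      "bin":   (a, b)     -> Bernoulli logit(p) = a + b * t
    Markers are assumed conditionally independent given the cluster.
    """
    ll = 0.0
    a, b, sd = params["cont"]
    for t, y in zip(times, cont):
        ll += normal_logpdf(y, a + b * t, sd)
    a, b = params["count"]
    for t, k in zip(times, counts):
        ll += poisson_logpmf(k, math.exp(a + b * t))
    a, b = params["bin"]
    for t, y in zip(times, binary):
        ll += bernoulli_logpmf(y, 1.0 / (1.0 + math.exp(-(a + b * t))))
    return ll

def membership_probs(times, cont, counts, binary, clusters, weights):
    """P(z = k | data) proportional to weight_k * L_k(data), normalized via log-sum-exp."""
    logs = [math.log(w) + subject_loglik(times, cont, counts, binary, c)
            for c, w in zip(clusters, weights)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sample_label(probs, rng):
    # One Gibbs-style draw of the subject's cluster label from its full conditional.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical two-cluster setup: a "stable" and a "rising" profile.
clusters = [
    {"cont": (0.0, 0.0, 1.0), "count": (0.0, 0.0), "bin": (-2.0, 0.0)},
    {"cont": (0.0, 1.0, 1.0), "count": (0.0, 0.5), "bin": (-2.0, 1.0)},
]
weights = [0.5, 0.5]
times = [0, 1, 2, 3]

# A subject whose markers all rise over time should fall in the "rising" cluster.
probs = membership_probs(times, [0.1, 1.2, 1.9, 3.1], [1, 2, 3, 4], [0, 0, 1, 1],
                         clusters, weights)
label = sample_label(probs, random.Random(1))
```

Because every marker contributes to the same subject-level likelihood, evidence from the continuous, count and binary series adds up on the log scale. In a full sampler of this simplified kind, this allocation step would alternate with updates of the cluster weights and trajectory parameters.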