Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework (ProJIVE). The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach simultaneously estimates joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful sources of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
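The abstract does not spell out the model equations, so the following is only an illustrative sketch of what a probabilistic joint-plus-individual factor model could look like, not the authors' implementation. It assumes, by analogy with probabilistic PCA, that each data block is a linear function of joint scores shared across blocks plus block-specific individual scores, with standard normal scores and isotropic Gaussian noise. The function name `simulate_projive_like`, its parameters, and the randomly drawn loadings are all hypothetical, and each individual rank is assumed to be at least 1.

```python
import random


def simulate_projive_like(n, dims, r_joint, r_indiv, noise_sd=0.5, seed=0):
    """Draw data blocks from a PPCA-style joint + individual factor model.

    Assumed model (an illustration, not taken from the paper): for block k,
        x_k = z @ W_joint_k + b_k @ W_indiv_k + noise,
    where z (joint scores) is shared by all blocks, b_k (individual scores)
    is specific to block k, and the noise is isotropic Gaussian.
    """
    rng = random.Random(seed)

    def gauss_matrix(rows, cols, sd=1.0):
        # rows x cols matrix of independent N(0, sd^2) draws
        return [[rng.gauss(0.0, sd) for _ in range(cols)] for _ in range(rows)]

    def matmul(a, b):
        # a is (m x r), b is (r x p); returns the (m x p) product
        inner, cols = len(b), len(b[0])
        return [
            [sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a
        ]

    joint_scores = gauss_matrix(n, r_joint)  # one row per subject, shared
    blocks = []
    for p_k, r_k in zip(dims, r_indiv):
        w_joint = gauss_matrix(r_joint, p_k)   # hypothetical joint loadings
        w_indiv = gauss_matrix(r_k, p_k)       # hypothetical individual loadings
        indiv_scores = gauss_matrix(n, r_k)    # block-specific scores
        joint_part = matmul(joint_scores, w_joint)
        indiv_part = matmul(indiv_scores, w_indiv)
        noise = gauss_matrix(n, p_k, noise_sd)
        blocks.append([
            [joint_part[i][j] + indiv_part[i][j] + noise[i][j] for j in range(p_k)]
            for i in range(n)
        ])
    return joint_scores, blocks


# Two blocks measured on the same 100 subjects, e.g. 20 and 5 features.
joint_scores, blocks = simulate_projive_like(
    n=100, dims=[20, 5], r_joint=2, r_indiv=[3, 1]
)
```

Under these assumptions, an estimation method in the spirit of the abstract would try to recover the joint and individual scores and loadings from `blocks` alone; the sketch covers only the data-generating side and does not implement the EM algorithm.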