OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values

Alzheimer's disease (AD), a neurodegenerative disorder, is associated with neural, genetic, and proteomic factors and affects multiple cognitive and behavioral faculties. Traditional AD prediction largely focuses on univariate disease outcomes, such as disease stage and severity. Multimodal data encode broader disease information than a single modality and may therefore improve disease prediction, but they often contain missing values. Recent "deeper" machine learning approaches show promise in improving prediction accuracy, yet the biological relevance of these models needs to be further charted. Integrating missing data analysis, predictive modeling, multimodal data analysis, and explainable AI (XAI), we propose OPTIMUS, a predictive, modular, and explainable machine learning framework that unveils the many-to-many predictive pathways between multimodal input data and multivariate disease outcomes amidst missing values. OPTIMUS first applies modality-specific imputation to recover missing data within each modality while optimizing overall prediction accuracy. It then maps multimodal biomarkers to multivariate outcomes using machine learning and extracts the biomarkers predictive of each outcome. Finally, OPTIMUS incorporates XAI to explain the identified multimodal biomarkers. Using data from 346 cognitively normal subjects, 608 persons with mild cognitive impairment, and 251 AD patients, OPTIMUS identifies neural and transcriptomic signatures that jointly but differentially predict multivariate outcomes related to executive function, language, memory, and visuospatial function. Our work demonstrates the potential of building a predictive and biologically explainable machine learning framework to uncover multimodal biomarkers that capture disease profiles across varying cognitive landscapes. The results improve our understanding of the complex many-to-many pathways in AD.
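The three stages described above (modality-specific imputation chosen for prediction accuracy, a many-to-many mapping from biomarkers to outcomes, and per-outcome biomarker extraction) can be illustrated with a minimal sketch. The abstract does not name the imputers, the predictive model, or the XAI technique, so everything here is an assumption made for illustration: mean versus median imputation, a closed-form multi-output ridge regression, coefficient magnitude as a crude stand-in for an explanation method, and synthetic data with hypothetical modality names `imaging` and `expression`.

```python
from itertools import product

import numpy as np


def impute(X, strategy):
    """Fill NaNs column-wise with the column mean or median."""
    fill = np.nanmean(X, axis=0) if strategy == "mean" else np.nanmedian(X, axis=0)
    return np.where(np.isnan(X), fill, X)


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form multi-output ridge: one weight column per outcome."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ Y)


def ridge_predict(W, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W


def cv_error(X, Y, k=5):
    """Mean squared error averaged over k contiguous folds."""
    idx = np.arange(X.shape[0])
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        W = ridge_fit(X[train], Y[train])
        errors.append(np.mean((ridge_predict(W, X[fold]) - Y[fold]) ** 2))
    return float(np.mean(errors))


def select_imputers(modalities, Y, strategies=("mean", "median")):
    """Try every per-modality strategy combination; keep the lowest CV error.

    Simplification: fill values are computed on all rows before the folds
    are split, which a careful pipeline would avoid.
    """
    names = list(modalities)
    best = None
    for combo in product(strategies, repeat=len(names)):
        X = np.hstack([impute(modalities[n], s) for n, s in zip(names, combo)])
        err = cv_error(X, Y)
        if best is None or err < best[0]:
            best = (err, dict(zip(names, combo)), X)
    return best


# Synthetic stand-in data (not from the study): two modalities, two outcomes.
rng = np.random.default_rng(0)
n = 200
imaging = rng.normal(size=(n, 4))
expression = rng.normal(size=(n, 3))
Y = np.column_stack([
    2.0 * imaging[:, 0] + 0.1 * rng.normal(size=n),      # outcome 0
    -1.5 * expression[:, 2] + 0.1 * rng.normal(size=n),  # outcome 1
])
for M in (imaging, expression):
    M[rng.random(M.shape) < 0.1] = np.nan  # remove about 10% of entries

err, chosen, X = select_imputers({"imaging": imaging, "expression": expression}, Y)
W = ridge_fit(X, Y)
# Strongest feature per outcome, indexed over the concatenated feature matrix.
top = np.abs(W[:-1]).argmax(axis=0)
print(chosen, err, top)
```

In this toy setup each outcome is driven by a feature from a different modality, so the per-outcome ranking gives a small-scale picture of what "jointly but differentially predictive" means. A real analysis would likely use richer imputers, nonlinear models, and a dedicated attribution method in place of raw coefficients.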
