Recent advances in the acquisition of brain data from multiple sources have created new opportunities to integrate multimodal brain data for the early detection of complex brain disorders. However, current data integration approaches typically require a complete set of biomedical data modalities. This is not always feasible, because some modalities are available only in large-scale research cohorts and are impractical to collect in routine clinical practice. In studies of brain diseases in particular, research cohorts may include both neuroimaging and genetic data, whereas practical clinical diagnosis often has to rely on neuroimages alone. It is therefore desirable to design machine learning models that use all available data during training, since different modalities can provide complementary information, but perform inference using only the most commonly available modality. We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to exploit the auxiliary modalities available during training and thereby improve the performance of a unimodal model at inference. We apply the method to predict cognitive degeneration and disease outcomes using multimodal imaging-genetic data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Experimental results demonstrate that our approach outperforms related machine learning and deep learning methods by a significant margin.