Multiple Sclerosis (MS) is a chronic disease of the human brain and spinal cord that can cause permanent damage to or deterioration of the nerves. The severity of MS is monitored with the Expanded Disability Status Scale (EDSS), which is composed of several functional sub-scores. Early and accurate classification of MS severity is critical for slowing or preventing disease progression through early therapeutic intervention. Recent advances in deep learning and the wide use of Electronic Health Records (EHR) create opportunities to apply data-driven predictive modeling tools to this goal. Previous studies using single-modal machine learning and deep learning algorithms were limited in prediction accuracy by insufficient data or overly simple models. In this paper, we propose using patients' multimodal longitudinal EHR data to predict MS disease severity at the hospital visit. This work makes two important contributions. First, we describe a pilot effort to leverage structured EHR data, neuroimaging data, and clinical notes to build a multimodal deep learning framework that predicts a patient's MS disease severity. The proposed pipeline demonstrates up to a 25% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) compared to models using single-modal data. Second, the study provides insights into the amount of useful signal embedded in each data modality with respect to MS disease prediction, which may improve data collection processes.
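For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that definition, not the paper's evaluation code: the function name `auroc`, the binary labels, and both score lists are hypothetical toy values chosen only for illustration. The abstract does not state whether the reported increase is absolute or relative, so the sketch computes both.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = severe, 0 = not severe.
labels = [0, 0, 1, 1, 0, 1]
single_modal_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]
multi_modal_scores = [0.1, 0.4, 0.6, 0.9, 0.3, 0.8]

auc_single = auroc(labels, single_modal_scores)
auc_multi = auroc(labels, multi_modal_scores)

# Two ways to express an "increase" in AUROC.
absolute_gain = auc_multi - auc_single
relative_gain = absolute_gain / auc_single

print(f"single-modal AUROC: {auc_single:.3f}")
print(f"multi-modal AUROC:  {auc_multi:.3f}")
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {relative_gain:.1%}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice on real data.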