The rapid development of diagnostic technologies in healthcare is placing greater demands on physicians to handle and integrate the heterogeneous yet complementary data produced during routine practice. For instance, personalized diagnosis and treatment planning for a single cancer patient rely on various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical and genomic data). However, such decision-making procedures can be subjective and qualitative, and can show large inter-subject variability. With recent advances in multi-modal deep learning, a growing body of work has been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews recent studies that address this question. Briefly, the review covers (1) an overview of current multi-modal learning workflows, (2) a summary of multi-modal fusion methods, (3) a discussion of their performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.