The application of machine learning in medicine and healthcare has led to the creation of numerous diagnostic and prognostic models. However, despite their success, current approaches generally issue predictions using data from a single modality. This stands in stark contrast to clinician decision-making, which draws on diverse information from multiple sources. Although several multimodal machine learning approaches exist, significant challenges in developing multimodal systems remain and continue to hinder clinical adoption. In this paper, we introduce AutoPrognosis-M, a multimodal framework that uses automated machine learning to integrate structured clinical (tabular) data with medical imaging. AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies. In an illustrative application using a multimodal skin lesion dataset, we highlight the importance of multimodal machine learning and the power of combining multiple fusion strategies using ensemble learning. We have open-sourced our framework as a tool for the community and hope it will accelerate the uptake of multimodal machine learning in healthcare and spur further innovation.