Medical Multimodal-Multitask Foundation Model for Superior Chest CT Performance

Patient management requires multitasking interaction with multimodal data. While today's AI, particularly large foundation models, promises unprecedented opportunities, progress in developing medical multimodal multitask foundation models remains relatively slow. There are two main challenges in this direction. The first is the data challenge: the high bar for curating medical multimodal multitask datasets that include 3D medical tomographic images aligned with other clinical data. The second is the model challenge: the lack of a scalable and adaptable foundation model architecture that can synergize multimodal datasets for diverse clinical tasks. Here we propose a first-of-its-kind medical multimodal-multitask foundation model (M3FM) with an emphasis on lung cancer screening. To train M3FM, we first curated a comprehensive multimodal multitask dataset consisting of 163,725 3D chest CT exams, 48 clinical data types, and 17 medical tasks covering lung, heart, and other chest diseases. We then created and applied a multimodal question-answering framework as a unified training strategy to effectively integrate multimodal information and naturally perform multiple tasks with free-text prompting. Extensive experimental results demonstrate that M3FM consistently outperforms previous state-of-the-art models. M3FM can identify the multimodal data elements that are informative for specific clinical tasks, which is instrumental for building AI models and for gaining insights into correlations between multimodal data and diseases. Using a small out-of-distribution dataset, M3FM can be adapted to boost performance on new tasks. M3FM has enabled superior volumetric CT imaging performance for lung cancer screening, cardiac disease prediction, and other CT-related tasks. M3FM can be extended to incorporate more data types and improve other medical tasks, towards AI-empowered precise and efficient medicine.
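To make the idea of a unified question-answering format more concrete, the following is a minimal sketch, not the authors' implementation, of how heterogeneous tasks on one CT exam might be cast into (prompt, answer) pairs. It rests on several assumptions. The `Exam` and `QASample` containers, the `build_prompt` and `to_qa` helpers, the `[Clinical]`/`[Question]` tags, and all field names and values are hypothetical and chosen only for illustration. Clinical fields are assumed to be serialized as text, and the CT volume is represented only by a placeholder ID, since the image encoder is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Exam:
    """Hypothetical container for one CT exam and its aligned clinical data."""
    ct_volume_id: str  # placeholder reference to a 3D CT volume (not loaded here)
    clinical: Dict[str, str] = field(default_factory=dict)  # field name -> text value


@dataclass
class QASample:
    """One training sample in a unified question-answering format (hypothetical)."""
    ct_volume_id: str
    prompt: str  # free-text question plus serialized clinical context
    answer: str  # target answer as text


def build_prompt(question: str, clinical: Dict[str, str],
                 keys: Optional[List[str]] = None) -> str:
    """Serialize the selected clinical fields as text and prepend them to the question.

    If keys is None, all fields are used in sorted order; unknown keys are skipped.
    """
    selected = keys if keys is not None else sorted(clinical)
    context = "; ".join(f"{k}: {clinical[k]}" for k in selected if k in clinical)
    if context:
        return f"[Clinical] {context} [Question] {question}"
    return f"[Question] {question}"


def to_qa(exam: Exam, question: str, answer: str,
          keys: Optional[List[str]] = None) -> QASample:
    """Turn one (exam, task) pair into a QA sample sharing the same input format."""
    return QASample(exam.ct_volume_id, build_prompt(question, exam.clinical, keys), answer)


# Usage: two different tasks on the same (toy, made-up) exam share one format.
exam = Exam("ct_0001", {"age": "63", "smoking_pack_years": "30"})
samples = [
    to_qa(exam, "What is the lung cancer risk in 1 year?", "low"),
    to_qa(exam, "Is there coronary artery calcification?", "yes", keys=["age"]),
]
for s in samples:
    print(s.prompt)
```

Because every task is expressed as text in the same format, one model can in principle be trained on all tasks at once. Varying the `keys` argument is one simple way to feed different subsets of clinical fields per task. This is only a sketch of the general technique and does not reproduce how M3FM selects or encodes its 48 clinical data types.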