The successful adaptation of foundation models to multi-modal medical imaging is a critical yet unresolved challenge. Existing models often struggle to effectively fuse information from multiple sources and adapt to the heterogeneous nature of pathological tissues. To address this, we introduce a novel framework for adapting foundation models to multi-modal medical imaging, featuring two key technical innovations: sub-region-aware modality attention and adaptive prompt engineering. The attention mechanism enables the model to learn the optimal combination of modalities for each tumor sub-region, while the adaptive prompting strategy leverages the inherent capabilities of foundation models to refine segmentation accuracy. We validate our framework on the BraTS 2020 brain tumor segmentation dataset, demonstrating that our approach significantly outperforms baseline methods, particularly in the challenging necrotic core sub-region. Our work provides a principled and effective approach to multi-modal fusion and prompting, paving the way for more accurate and robust foundation model-based solutions in medical imaging.
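To make the fusion idea concrete, below is a minimal, illustrative sketch of what a sub-region-aware modality attention module could look like. All names, shapes, and the gating design (global pooling followed by a per-sub-region softmax over modality-specific feature volumes) are assumptions for illustration, not the paper's exact architecture; BraTS-style inputs with four MRI modalities and three tumor sub-regions are assumed.

```python
import torch
import torch.nn as nn

class SubRegionModalityAttention(nn.Module):
    """Illustrative sketch (not the paper's exact design): learn a
    per-sub-region softmax weighting over modality-specific feature maps."""

    def __init__(self, num_modalities: int = 4, num_subregions: int = 3, channels: int = 32):
        super().__init__()
        # One small gating head per tumor sub-region, producing a score
        # for each modality from globally pooled features.
        self.gates = nn.ModuleList([
            nn.Linear(num_modalities * channels, num_modalities)
            for _ in range(num_subregions)
        ])

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, M, C, D, H, W) -- one feature volume per modality
        b, m, c, *_ = feats.shape
        pooled = feats.mean(dim=(-3, -2, -1)).reshape(b, m * c)   # (B, M*C)
        fused = []
        for gate in self.gates:
            w = torch.softmax(gate(pooled), dim=-1)               # (B, M) modality weights
            w = w.view(b, m, 1, 1, 1, 1)
            fused.append((w * feats).sum(dim=1))                  # (B, C, D, H, W)
        # Stack per-sub-region fused features: (B, S, C, D, H, W)
        return torch.stack(fused, dim=1)


# Example: 4 MRI modalities (e.g. T1, T1ce, T2, FLAIR), 3 sub-regions
x = torch.randn(2, 4, 32, 8, 8, 8)
out = SubRegionModalityAttention()(x)
print(out.shape)  # torch.Size([2, 3, 32, 8, 8, 8])
```

The key design point this sketch illustrates is that each sub-region (e.g. the necrotic core) receives its own fused feature volume, so the model can weight modalities differently per sub-region rather than applying a single global fusion.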