Medical image segmentation allows quantifying the size and shape of target structures, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) trained on natural image-text pairs, several studies have proposed adapting them into Vision-Language Segmentation Models (VLSMs) that accept language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open-vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data. Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint vision-language representation remains underexplored for segmentation problems. This work presents the first systematic study on transferring VLSMs to 2D medical images, using $11$ carefully curated datasets encompassing diverse modalities, together with insightful language prompts and experiments. Our findings demonstrate that, although VLSMs show competitive performance compared to image-only models when finetuned on limited medical image datasets, not all VLSMs utilize the additional information from language prompts; image features play a dominant role. While VLSMs exhibit enhanced performance on pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts. The code and datasets are available at https://github.com/naamiinepal/medvlsm.