Medical image segmentation is a vital healthcare task that requires precise and efficient models to support accurate diagnosis and treatment. Vision transformer (ViT)-based segmentation models have shown strong performance on this task. However, building a powerful backbone requires large-scale pre-training data for the ViT self-attention blocks. Current methods for adapting pre-trained models update all or some of the backbone parameters. This paper proposes a novel fine-tuning strategy for adapting a pre-trained transformer-based segmentation model to data from a new medical center. The method introduces a small number of learnable parameters, termed prompts, into the input space (less than 1% of the model parameters) while keeping the rest of the model parameters frozen. Extensive experiments on data from new, unseen medical centers show that prompt-based fine-tuning of medical segmentation models achieves excellent performance on the new-center data with a negligible drop on the old centers. Additionally, our strategy delivers high accuracy with minimal retraining on new-center data, significantly reducing the computational and time costs of fine-tuning pre-trained models.
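To give a rough sense of the "less than 1% of model parameters" budget, the following sketch counts the parameters of a plain ViT encoder and compares them with a set of learnable prompt tokens. Everything specific here is an assumption made for illustration and is not taken from the paper: the ViT-Base-like configuration (width 768, depth 12, 16x16 patches, 224x224 RGB input), the choice of 50 prompt tokens, and the helper names `vit_encoder_params` and `prompt_params`. The segmentation head is ignored, so the true ratio for a full model would be slightly smaller.

```python
def vit_encoder_params(dim, depth, patch=16, channels=3, image=224, mlp_ratio=4):
    """Approximate parameter count of a plain ViT encoder (no task head).

    Hypothetical helper for illustration; assumes biases on every linear
    layer and a learned position embedding with one class token.
    """
    hidden = mlp_ratio * dim
    attention = 4 * dim * dim + 4 * dim      # q, k, v and output projections
    mlp = 2 * dim * hidden + hidden + dim    # two linear layers with biases
    norms = 2 * 2 * dim                      # two LayerNorms (scale + shift)
    per_layer = attention + mlp + norms

    tokens = (image // patch) ** 2 + 1       # patches plus class token
    patch_embed = patch * patch * channels * dim + dim
    embeddings = patch_embed + dim + tokens * dim
    final_norm = 2 * dim
    return depth * per_layer + embeddings + final_norm


def prompt_params(num_prompts, dim, prompted_layers=1):
    """Learnable parameters when `num_prompts` tokens of width `dim` are
    added at the input of `prompted_layers` layers (1 = input space only)."""
    return num_prompts * dim * prompted_layers


# Assumed ViT-Base-like backbone; kept frozen during fine-tuning.
backbone = vit_encoder_params(dim=768, depth=12)

# Assumed prompt budgets: input-only prompts, and prompts at every layer.
shallow = prompt_params(num_prompts=50, dim=768)
deep = prompt_params(num_prompts=50, dim=768, prompted_layers=12)

shallow_fraction = shallow / backbone
deep_fraction = deep / backbone

print(f"frozen backbone parameters: {backbone:,}")
print(f"input-only prompts: {shallow:,} ({shallow_fraction:.3%})")
print(f"per-layer prompts:  {deep:,} ({deep_fraction:.3%})")
```

Under these assumptions, both prompt budgets stay below 1% of the frozen backbone, which is consistent with the scale the abstract describes. The actual prompt count and placement used by the authors may differ.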