Medical image segmentation is a critical clinical task that requires accurate and efficient models to support diagnosis and treatment. Vision transformer (ViT)-based segmentation models have shown strong performance on this task. However, the self-attention blocks of a ViT need large-scale pre-training data to yield a powerful backbone, and current approaches to adapting a pre-trained model update all or some of the backbone parameters. This paper proposes a novel fine-tuning strategy for adapting a pre-trained transformer-based segmentation model to data from a new medical center. The method introduces a small number of learnable parameters, termed prompts, into the input space (less than 1\% of the model parameters) while keeping the rest of the model frozen. Extensive experiments on data from new, unseen medical centers show that prompt-based fine-tuning of medical segmentation models achieves excellent performance on the new-center data with a negligible drop on the old centers. Additionally, the strategy attains high accuracy with minimal re-training on new-center data, significantly reducing the computational and time costs of fine-tuning pre-trained models.
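To make the parameter budget concrete, the following is a minimal sketch of how input-space prompts compare in size with a frozen backbone. It is not the paper's implementation: the class `PromptBudget`, the helper `prepend_prompts`, and all sizes (an 86M-parameter backbone, 50 prompts, embedding width 768) are hypothetical, and prepending the prompts to the patch-token sequence is an assumed way of placing them in the input space.

```python
from dataclasses import dataclass


@dataclass
class PromptBudget:
    """Parameter budget for input-space prompts (all values hypothetical)."""

    backbone_params: int  # frozen parameters of the pre-trained model
    num_prompts: int      # learnable prompt tokens added to the input
    embed_dim: int        # width of each token embedding

    @property
    def prompt_params(self) -> int:
        # Each prompt is one learnable vector of size embed_dim.
        return self.num_prompts * self.embed_dim

    @property
    def trainable_fraction(self) -> float:
        # Share of all parameters that are updated during fine-tuning.
        return self.prompt_params / (self.backbone_params + self.prompt_params)


def prepend_prompts(prompts, patch_tokens):
    """Assumed placement: learnable prompts first, then the image patch tokens."""
    return list(prompts) + list(patch_tokens)


# Hypothetical sizes, not taken from the paper.
budget = PromptBudget(backbone_params=86_000_000, num_prompts=50, embed_dim=768)
fraction = budget.trainable_fraction

# Toy input: 2 prompt vectors and 3 patch-token vectors of width 4.
toy_input = prepend_prompts([[0.0] * 4] * 2, [[1.0] * 4] * 3)
```

Under these assumed sizes the prompts account for 38,400 trainable parameters, well below the 1\% ceiling stated above; only these vectors would receive gradient updates while the backbone weights stay fixed.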