Distribution shift is pervasive in medical images acquired at different medical centres and poses a significant obstacle to deploying pre-trained semantic segmentation models in real-world applications. Test-time adaptation has proven effective at tackling cross-domain distribution shift during inference. However, most existing methods adapt by updating the pre-trained model, which makes them susceptible to error accumulation and catastrophic forgetting when they encounter a series of distribution shifts (i.e., under the continual test-time adaptation setup). To overcome these challenges, in this paper we freeze the pre-trained model and propose the Visual Prompt-based Test-Time Adaptation (VPTTA) method, which trains a specific prompt for each test image to align the statistics in the batch normalization layers. Specifically, we present the low-frequency prompt, which is lightweight, with only a few parameters, and can be trained effectively in a single iteration. To improve prompt initialization, we equip VPTTA with a memory bank so that the current prompt benefits from previous ones. Additionally, we design a warm-up mechanism that mixes source and target statistics to construct warm-up statistics, thereby facilitating training. Extensive experiments on two medical image segmentation benchmark tasks demonstrate the superiority of VPTTA over other state-of-the-art methods. The code and the weights of the pre-trained source models are available at https://github.com/Chen-Ziyang/VPTTA.
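The abstract names the warm-up statistics and the memory bank but gives no formulas for them, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes that the warm-up statistics are a convex combination of source and target batch-normalization statistics controlled by a hypothetical weight `lam`, and that the memory bank initializes the current prompt by averaging the stored prompts whose keys are most similar (by cosine similarity) to the current image's key. The names `warmup_statistics`, `PromptMemoryBank`, `capacity`, and `top_k` are illustrative and do not come from the paper.

```python
import math
from collections import deque


def warmup_statistics(source_mean, source_var, target_mean, target_var, lam):
    """Mix per-channel BN statistics (assumed convex combination).

    lam = 0 keeps the source statistics, lam = 1 uses the target statistics.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mean = [lam * t + (1.0 - lam) * s for s, t in zip(source_mean, target_mean)]
    var = [lam * t + (1.0 - lam) * s for s, t in zip(source_var, target_var)]
    return mean, var


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


class PromptMemoryBank:
    """Fixed-size store of (key, prompt) pairs from earlier test images (assumed design)."""

    def __init__(self, capacity=40, top_k=2):
        self.entries = deque(maxlen=capacity)  # the oldest entry is dropped when full
        self.top_k = top_k

    def add(self, key, prompt):
        self.entries.append((list(key), list(prompt)))

    def initial_prompt(self, key, default):
        """Average the prompts of the top_k most similar keys, or fall back to default."""
        if not self.entries:
            return list(default)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(key, entry[0]),
            reverse=True,
        )[: self.top_k]
        return [
            sum(prompt[i] for _, prompt in ranked) / len(ranked)
            for i in range(len(default))
        ]
```

In this sketch the first test image starts from the default prompt because the bank is empty, and later images start from prompts learned on similar earlier images, which is one plausible way to make single-iteration training feasible. How `lam` is scheduled and what serves as the key are left open here because the abstract does not specify them.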