Test-time adaptation (TTA) refers to adapting a trained model to a new domain during testing. Existing TTA techniques rely on having multiple test images from the same domain, yet this may be impractical in real-world applications such as medical imaging, where data acquisition is expensive and imaging conditions vary frequently. Here, we address the task of adapting a medical image segmentation model using only a single unlabeled test image. Most TTA approaches, which directly minimize the entropy of predictions, fail to improve performance significantly in this setting. We also observe that the choice of batch normalization (BN) layer statistics is a highly important yet unstable factor, because only a single example from the test domain is available. To overcome this, we propose to instead integrate over predictions made with various estimates of the target domain statistics, lying between the training and test statistics, with each prediction weighted based on its entropy statistics. Our method, validated on 24 source/target domain splits across 3 medical image datasets, surpasses the leading method by 2.9% Dice coefficient on average.
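The idea of integrating over predictions made with different BN statistics can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy model (one normalization step followed by a linear classifier), the linear interpolation between training and test statistics, the grid of `alphas`, and the softmax-over-negative-entropy weighting with a `temperature` are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mean_entropy(probs, eps=1e-12):
    """Mean per-pixel Shannon entropy of a (pixels, classes) probability map."""
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())


def predict_with_stats(features, mu, var, weight, eps=1e-5):
    """Toy stand-in for a segmentation network (assumption): normalize the
    features with the given statistics, then apply a linear classifier."""
    normed = (features - mu) / np.sqrt(var + eps)
    return softmax(normed @ weight)


def entropy_weighted_ensemble(features, mu_train, var_train, weight,
                              alphas=(0.0, 0.25, 0.5, 0.75, 1.0),
                              temperature=1.0):
    """Fuse predictions made with statistics between train and test.

    alpha = 0 uses the training statistics, alpha = 1 uses statistics
    estimated from the single test image. The interpolation and the
    weighting scheme are illustrative assumptions.
    """
    mu_test = features.mean(axis=0)
    var_test = features.var(axis=0)

    preds, entropies = [], []
    for a in alphas:
        mu = (1.0 - a) * mu_train + a * mu_test
        var = (1.0 - a) * var_train + a * var_test
        p = predict_with_stats(features, mu, var, weight)
        preds.append(p)
        entropies.append(mean_entropy(p))

    entropies = np.array(entropies)
    # Lower-entropy (more confident) predictions receive larger weights.
    weights = softmax(-entropies / temperature)
    # Weighted sum over the ensemble axis: (K,) x (K, N, C) -> (N, C).
    fused = np.tensordot(weights, np.stack(preds), axes=1)
    return fused, weights, entropies


# Synthetic "single test image": 64 pixels, 4 feature channels, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(loc=2.0, scale=3.0, size=(64, 4))
weight = rng.normal(size=(4, 3))
fused, weights, entropies = entropy_weighted_ensemble(
    features, mu_train=np.zeros(4), var_train=np.ones(4), weight=weight
)
```

Because the fused map is a convex combination of probability maps, each pixel's class probabilities still sum to one. In a real network, the same interpolation would have to be applied to the statistics of every BN layer, which this sketch does not attempt.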