From Question to Exploration: Test-Time Adaptation in Semantic Segmentation?

Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to test data with potential distribution shifts. Most existing TTA methods focus on classification problems. The pronounced success of TTA in classification might lead newcomers and engineers to assume that classic TTA techniques can be directly applied to the more challenging task of semantic segmentation. However, whether this assumption holds is still an open question. In this paper, we investigate the applicability of existing classic TTA strategies to semantic segmentation. Our comprehensive results lead to three key observations. First, the classic normalization updating strategy brings only a slight performance improvement, and in some cases it even degrades the results. Applying advanced distribution estimation techniques such as batch renormalization does not resolve the problem. Second, although the teacher-student scheme does enhance training stability for segmentation TTA in the presence of noisy pseudo-labels and temporal correlation, it does not directly improve performance over the original model without TTA under complex data distributions. Third, segmentation TTA suffers from a severe long-tailed class-imbalance problem, which is substantially more complex than that in TTA for classification. This long-tailed challenge negatively affects segmentation TTA performance, even when the accuracy of pseudo-labels is high. Beyond these observations, we find that visual prompt tuning (VisPT) is promising for segmentation TTA and propose a novel method named TTAP. The outstanding performance of TTAP has also been verified. We hope the community will give more attention to this challenging, yet important, segmentation TTA task in the future. The source code is available at: \textit{https://github.com/ycarobot/TTAP}.
