DELTA: degradation-free fully test-time adaptation

Fully test-time adaptation aims to adapt a pre-trained model to the test stream during real-time inference, which is needed whenever the test distribution differs from the training distribution. Several efforts have been devoted to improving adaptation performance. However, we find that two defects are hidden in prevalent adaptation methodologies such as test-time batch normalization (BN) and self-learning. First, we reveal that the normalization statistics in test-time BN are determined entirely by the currently received test samples, resulting in inaccurate estimates. Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes. Beyond the extensively studied test stream with independent and class-balanced samples, we further observe that both defects can be exacerbated in more complicated test environments, such as (time-)dependent or class-imbalanced data. As a result, previous approaches work well in certain scenarios but degrade in others. In this paper, we provide a plug-in solution called DELTA, for Degradation-freE fuLly Test-time Adaptation, which consists of two components: (i) Test-time Batch Renormalization (TBR), introduced to improve the estimated normalization statistics; and (ii) Dynamic Online re-weighTing (DOT), designed to address the class bias within optimization. We investigate various test-time adaptation methods on three commonly used datasets under four scenarios, and on a newly introduced real-world dataset. DELTA helps them handle all scenarios simultaneously and achieves state-of-the-art performance.
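To make the two components concrete, the following is a minimal NumPy sketch of how they might look, not the reference implementation. The abstract does not specify the update rules, so several choices here are assumptions: the function names `tbr_normalize` and `dot_weights`, the momentum values, the exponential-moving-average form of the running statistics, and the inverse-frequency weighting. The TBR part follows the general batch-renormalization pattern, in which batch statistics are rescaled towards slowly updated running statistics; the DOT part keeps a running estimate of class frequency and down-weights samples predicted as frequent classes.

```python
import numpy as np


def tbr_normalize(x, running_mean, running_var, momentum=0.05, eps=1e-5):
    """Renormalize a batch of features x with shape (N, C).

    Assumed scheme: batch-renormalization-style correction factors r and d
    computed against exponential moving averages kept over the test stream.
    Returns the normalized batch and the updated running statistics.
    """
    batch_mean = x.mean(axis=0)
    batch_std = np.sqrt(x.var(axis=0) + eps)
    running_std = np.sqrt(running_var + eps)

    # Correction factors. In a framework with autograd these would be
    # treated as constants (no gradient flows through them).
    r = batch_std / running_std
    d = (batch_mean - running_mean) / running_std

    x_hat = (x - batch_mean) / batch_std * r + d

    # Update the running statistics slowly so that one unrepresentative
    # batch (for example, a single-class batch) does not dominate them.
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * x.var(axis=0)
    return x_hat, new_mean, new_var


def dot_weights(probs, class_freq, momentum=0.95, eps=1e-8):
    """Per-sample loss weights from a running class-frequency estimate.

    probs: (N, K) softmax outputs; class_freq: (K,) running estimate.
    Assumed scheme: weight each sample by the inverse estimated frequency
    of its predicted class, then rescale so the mean weight is 1.
    """
    pred = probs.argmax(axis=1)
    weights = 1.0 / (class_freq[pred] + eps)
    weights = weights * len(weights) / weights.sum()

    # Momentum update of the class-frequency estimate from this batch.
    new_freq = momentum * class_freq + (1 - momentum) * probs.mean(axis=0)
    return weights, new_freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(2.0, 3.0, size=(8, 4))
    out, mean, var = tbr_normalize(feats, np.zeros(4), np.ones(4))
    print("normalized batch shape:", out.shape)

    probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
    w, freq = dot_weights(probs, np.array([0.8, 0.2]))
    print("sample weights:", w)
```

In the forward pass the renormalized output coincides with normalizing by the running statistics; the factored form matters only when gradients are taken, since `r` and `d` are held constant there. The weights from `dot_weights` would multiply the per-sample adaptation loss (for example, an entropy loss) before averaging.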
