Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting

Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model with respect to any test sample. Although recent TTA methods have shown promising performance, two key challenges remain: 1) prior methods perform backpropagation for every test sample, incurring prohibitive optimization costs for many applications; 2) while existing TTA methods can significantly improve test performance on out-of-distribution data, they often suffer severe performance degradation on in-distribution data after adaptation (known as forgetting). To this end, we have proposed an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method, which develops an active sample selection criterion to identify reliable and non-redundant samples for test-time entropy minimization. To alleviate forgetting, EATA introduces a Fisher regularizer, estimated from test samples, that keeps important model parameters from changing drastically. However, the entropy loss adopted in EATA consistently assigns higher confidence to predictions, even for samples that are inherently uncertain, leading to overconfident predictions. To tackle this, we further propose EATA with Calibration (EATA-C), which separately exploits the reducible model uncertainty and the inherent data uncertainty for calibrated TTA. Specifically, we measure the model uncertainty by the divergence between the predictions of the full network and those of its sub-networks, and on this basis propose a divergence loss that encourages consistent predictions rather than overconfident ones. To further recalibrate prediction confidence, we use the disagreement among predicted labels as an indicator of the data uncertainty, and then devise a min-max entropy regularizer that selectively increases or decreases prediction confidence for different samples. Experiments on image classification and semantic segmentation verify the effectiveness of our methods.
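The three quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the fixed entropy threshold, the quadratic form of the Fisher penalty, and the use of KL divergence between full-network and sub-network predictions are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy of each row of class probabilities.
    return -(probs * np.log(probs + eps)).sum(axis=1)

def select_reliable(logits, entropy_threshold):
    # Hypothetical selection rule: keep samples whose prediction
    # entropy falls below a fixed threshold (assumed, not from the source).
    return entropy(softmax(logits)) < entropy_threshold

def fisher_penalty(params, anchor_params, fisher_diag):
    # Assumed quadratic anti-forgetting penalty: parameters with a large
    # diagonal Fisher value are penalized more for moving away from
    # their pre-adaptation values.
    return float(np.sum(fisher_diag * (params - anchor_params) ** 2))

def prediction_divergence(full_logits, sub_logits, eps=1e-12):
    # Assumed measure of model uncertainty: per-sample KL divergence
    # between full-network and sub-network predictions.
    p = softmax(full_logits)
    q = softmax(sub_logits)
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1)
```

In this sketch, a sample with a sharply peaked prediction passes the selection mask while a sample with a uniform prediction does not, and the divergence is zero whenever the full network and the sub-network agree exactly.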
