Distribution shifts are likely to occur at test time and can drastically degrade a model's performance. Online test-time adaptation (TTA) addresses this by continuing to update the model after deployment using the current test data. A method proposed for online TTA must therefore perform well under all kinds of environmental conditions. By introducing the variable factors 'domain non-stationarity' and 'temporal correlation', we first lay out all practically relevant settings and refer to them collectively as universal TTA. To tackle universal TTA, we identify and highlight several challenges that a self-training-based method has to deal with: 1) model bias and the occurrence of trivial solutions when performing entropy minimization on sequences of varying length, with and without multiple domain shifts; 2) loss of generalization, which hampers adaptation to future domain shifts, and the occurrence of catastrophic forgetting; and 3) performance degradation due to shifts in the label prior. To prevent the model from becoming biased, we leverage a dataset- and model-agnostic certainty and diversity weighting. To maintain generalization and prevent catastrophic forgetting, we propose to continually weight-average the source and adapted models. To compensate for disparities in the label prior at test time, we propose an adaptive additive prior correction scheme. We evaluate our approach, named ROID, on a wide range of settings, datasets, and models, setting new standards in the field of universal TTA.
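The idea of continually weight-averaging the source and adapted model can be illustrated with a minimal sketch. The abstract does not state the update rule, so the convex combination below, the function name `average_toward_source`, the `momentum` parameter and its default value, and the dict-of-lists parameter layout are all illustrative assumptions rather than the actual ROID implementation.

```python
def average_toward_source(source_params, adapted_params, momentum=0.99):
    """Blend adapted weights with the frozen source weights.

    Hypothetical helper: for every parameter it returns
    momentum * adapted + (1 - momentum) * source.
    Parameters are plain dicts mapping a name to a flat list of floats.
    """
    if source_params.keys() != adapted_params.keys():
        raise ValueError("source and adapted models must share parameter names")

    blended = {}
    for name, adapted_values in adapted_params.items():
        source_values = source_params[name]
        if len(source_values) != len(adapted_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Convex combination: a small pull back toward the source weights.
        blended[name] = [
            momentum * a + (1.0 - momentum) * s
            for a, s in zip(adapted_values, source_values)
        ]
    return blended


if __name__ == "__main__":
    source = {"w": [0.0, 2.0]}    # weights before deployment (kept frozen)
    adapted = {"w": [1.0, 4.0]}   # weights after some self-training steps
    print(average_toward_source(source, adapted, momentum=0.5))
```

Under this assumed rule, calling the helper after every adaptation step keeps the adapted weights anchored to the source model: a momentum close to 1 lets the model follow the test data, while the small source contribution limits drift over long sequences.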