Test Time Adaptation (TTA) addresses distribution shift by enabling pretrained models to learn new features on an unseen domain at test time. However, balancing the learning of new features against the retention of useful pretrained features is a significant challenge. In this paper, we propose Layerwise EArly STopping (LEAST) for TTA to address this problem. The key idea is to stop adapting individual layers during TTA when the features they are learning do not appear beneficial for the new domain. To this end, we propose a novel gradient-based metric that measures the relevance of the currently learned features to the new domain without requiring supervised labels. We use this metric to decide dynamically when to stop updating each layer during TTA. This yields a more balanced adaptation, restricted to the layers that benefit from it and carried out for only a limited number of steps. The approach also limits the forgetting of pretrained features that are useful for handling new domains. Through extensive experiments, we demonstrate that Layerwise Early Stopping improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
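The control flow of layerwise early stopping can be illustrated with a small, dependency-free sketch. The abstract does not define the gradient-based metric, so the score used here is a hypothetical stand-in: the cosine similarity between a layer's current gradient and a running average of its past gradients, with a layer frozen once that agreement drops below a threshold. The names `LayerState`, `layerwise_early_stop_step`, `threshold`, and `momentum`, along with the toy gradient schedule, are illustrative assumptions rather than the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class LayerState:
    name: str
    params: List[float]
    avg_grad: Optional[List[float]] = None  # running average of past gradients
    active: bool = True                     # False once the layer is early-stopped
    stopped_at: Optional[int] = None        # step at which adaptation stopped


def layerwise_early_stop_step(layers: List[LayerState],
                              grads: Dict[str, List[float]], step: int,
                              lr: float = 0.1, threshold: float = 0.0,
                              momentum: float = 0.9) -> None:
    """One TTA step: score each active layer, then freeze it or apply an SGD update."""
    for layer in layers:
        if not layer.active:
            continue  # frozen layers keep their current weights
        g = grads[layer.name]
        # HYPOTHETICAL relevance score (not the paper's metric): agreement between
        # the current gradient and the layer's running-average gradient.
        if layer.avg_grad is not None and cosine(g, layer.avg_grad) < threshold:
            layer.active = False
            layer.stopped_at = step
            continue
        if layer.avg_grad is None:
            layer.avg_grad = list(g)
        else:
            layer.avg_grad = [momentum * a + (1 - momentum) * x
                              for a, x in zip(layer.avg_grad, g)]
        layer.params = [p - lr * x for p, x in zip(layer.params, g)]


# Toy run: the gradient for "bn" flips direction at step 2, so that layer is stopped.
layers = [LayerState("conv1", [0.0, 0.0]), LayerState("bn", [0.0, 0.0])]
schedule = [
    {"conv1": [1.0, 0.0], "bn": [1.0, 0.0]},
    {"conv1": [1.0, 0.1], "bn": [1.0, 0.0]},
    {"conv1": [1.0, 0.0], "bn": [-1.0, 0.0]},
    {"conv1": [1.0, 0.0], "bn": [1.0, 0.0]},
]
for step, grads in enumerate(schedule):
    layerwise_early_stop_step(layers, grads, step)

for layer in layers:
    print(layer.name, layer.active, layer.stopped_at)
# conv1 True None
# bn False 2
```

Once a layer is stopped it is never updated again, so later gradients (such as the step-3 gradient for `bn`) cannot move its weights further from their current values. This is the property that limits adaptation to the layers, and the number of steps, that still appear to benefit from it. In a real model the per-layer update would act on parameter tensors, and the stand-in score would be replaced by the paper's own metric.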