A domain shift occurs when the training (source) and test (target) data follow different distributions. Test-time adaptation (TTA) addresses this problem by adapting a model trained on the source domain to the target domain when only the well-trained source model and unlabeled target data are available. In this setting, handling false pseudo-labels in the target domain is crucial because they degrade model performance. To deal with this problem, we propose to exploit the cluster structure (i.e., the "Clean" and "Noisy" regions within each cluster) that the source model induces in the target domain. Given an initial clustering of target samples, we first partition each cluster into "Clean" and "Noisy" regions, defined relative to the cluster prototype (i.e., the centroid of the cluster). Because the distributions of correct pseudo-labels differ markedly between these regions, we adopt a distinct training strategy for each: in the clean region, we selectively train on target samples with clean pseudo-labels, whereas we introduce mixup inputs that represent intermediate features between the clean and noisy regions to increase the compactness of each cluster. Extensive experiments on multiple datasets in both online and offline TTA settings demonstrate that our method, CNA-TTA, achieves state-of-the-art performance in most cases.
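The two steps described above, splitting each cluster into clean and noisy regions around its prototype and then mixing clean with noisy samples, can be illustrated with a minimal NumPy sketch. This is not the CNA-TTA implementation: the abstract does not state how the regions are delimited or how mixup pairs are chosen, so the Euclidean distance criterion, the `clean_fraction` parameter, the random clean/noisy pairing, the Beta-distributed mixing weight, and the function names `split_clean_noisy` and `mixup_clean_noisy` are all assumptions made for illustration. The toy demo mixes 2-D feature vectors in place of real network inputs.

```python
import numpy as np


def split_clean_noisy(features, pseudo_labels, num_classes, clean_fraction=0.5):
    """Mark each sample as clean (True) or noisy (False).

    Assumption: within each cluster, the `clean_fraction` of samples closest
    to the prototype (Euclidean distance) form the clean region.
    """
    is_clean = np.zeros(len(features), dtype=bool)
    prototypes = np.zeros((num_classes, features.shape[1]))
    for k in range(num_classes):
        idx = np.flatnonzero(pseudo_labels == k)
        if idx.size == 0:
            continue  # empty cluster: no prototype, no clean samples
        prototypes[k] = features[idx].mean(axis=0)  # prototype = centroid
        dist = np.linalg.norm(features[idx] - prototypes[k], axis=1)
        n_clean = max(1, int(round(clean_fraction * idx.size)))
        is_clean[idx[np.argsort(dist)[:n_clean]]] = True
    return is_clean, prototypes


def mixup_clean_noisy(inputs, is_clean, rng, alpha=1.0):
    """Mix every noisy sample with a randomly chosen clean sample.

    Assumption: pairs are drawn uniformly at random and the mixing weight
    follows Beta(alpha, alpha), as in standard mixup.
    """
    clean_idx = np.flatnonzero(is_clean)
    noisy_idx = np.flatnonzero(~is_clean)
    partners = rng.choice(clean_idx, size=noisy_idx.size)
    lam_shape = (noisy_idx.size,) + (1,) * (inputs.ndim - 1)
    lam = rng.beta(alpha, alpha, size=lam_shape)
    mixed = lam * inputs[partners] + (1.0 - lam) * inputs[noisy_idx]
    return mixed, partners, noisy_idx, lam


# Toy demo: two Gaussian clusters whose pseudo-labels come from the source model.
rng = np.random.default_rng(0)
features = np.concatenate([rng.normal(0.0, 1.0, (50, 2)),
                           rng.normal(5.0, 1.0, (50, 2))])
pseudo_labels = np.repeat([0, 1], 50)

is_clean, prototypes = split_clean_noisy(features, pseudo_labels, num_classes=2)
mixed, partners, noisy_idx, lam = mixup_clean_noisy(features, is_clean, rng)
```

In a training loop, the samples flagged by `is_clean` would contribute a pseudo-label loss, while `mixed` would be fed to the model with a target interpolated by `lam`; both loss terms are left out here because the abstract does not specify them.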