Continuous Video Domain Adaptation (CVDA) is a setting in which a source model must adapt continuously to a series of changing target domains, each available only one at a time, without access to source data or target supervision. It has wide applications, such as robotic vision and autonomous driving. The main challenge of CVDA is to learn useful information from unsupervised target data alone while avoiding catastrophic forgetting of previously learned knowledge, which is beyond the capability of previous Video-based Unsupervised Domain Adaptation methods. Therefore, we propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address this challenge. First, to learn from unsupervised domains, we propose to learn from pseudo labels. However, in continuous adaptation, prediction errors can accumulate rapidly in pseudo labels, and CART tackles this problem with two key modules. The first module generates refined pseudo labels from model predictions and deploys a novel attentive learning strategy. The second module compares the current model's outputs on augmented data with the source model's outputs on weakly augmented data, forming a novel consistency regularization that alleviates the accumulation of prediction errors. Extensive experiments suggest that CART outperforms existing methods on CVDA by a considerable margin.
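The two modules described above can be illustrated with a minimal NumPy sketch. This is not the CART implementation: the abstract does not specify the loss functions, so the sketch assumes that "attentive learning" means weighting each sample's pseudo-label cross-entropy by its prediction confidence, and that the consistency regularization is a KL divergence from the source model's predictions to the current model's predictions. The function names `confidence_weighted_pseudo_label_loss` and `consistency_loss` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def confidence_weighted_pseudo_label_loss(logits):
    """Assumed form of module 1: cross-entropy against the model's own
    argmax pseudo labels, with each sample weighted by its confidence."""
    probs = softmax(logits)
    pseudo = probs.argmax(axis=-1)   # pseudo label per sample
    conf = probs.max(axis=-1)        # confidence used as the weight
    ce = -np.log(probs[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((conf * ce).sum() / conf.sum())


def consistency_loss(current_logits_aug, source_logits_weak):
    """Assumed form of module 2: mean KL(source || current), where the
    source model sees weakly augmented inputs and the current model sees
    augmented inputs."""
    p_src = softmax(source_logits_weak)
    p_cur = softmax(current_logits_aug)
    kl = (p_src * (np.log(p_src + 1e-12) - np.log(p_cur + 1e-12))).sum(axis=-1)
    return float(kl.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = rng.normal(size=(4, 5))   # stand-in logits: 4 clips, 5 classes
    source = rng.normal(size=(4, 5))
    total = confidence_weighted_pseudo_label_loss(current) + consistency_loss(current, source)
    print(total)
```

In this sketch, low-confidence samples contribute less to the pseudo-label term, and the consistency term pulls the adapting model toward the frozen source model, which is one plausible way to limit error accumulation and forgetting.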