Despite progress in video classification, current approaches for handling distribution shifts between source and target video domains remain source-dependent: they require access to the source data during adaptation. In this paper, we present a self-training based source-free video domain adaptation approach that bridges the gap between the source and target domains without this access. We use the model pre-trained on the source domain to generate pseudo-labels for the target-domain samples, and these pseudo-labels are inevitably noisy. We therefore treat source-free video domain adaptation as a problem of learning from noisy labels and argue that the samples with correct pseudo-labels can drive the adaptation. To this end, we use the cross-entropy loss as an indicator of pseudo-label correctness and fine-tune the model only on the resulting small-loss samples from the target domain. We further improve adaptation with a teacher-student framework: a gradually updated teacher produces reliable pseudo-labels, while the student is fine-tuned on the target-domain videos using these pseudo-labels. Extensive experiments show that our methods, termed CleanAdapt and CleanAdapt + TS, achieve state-of-the-art results and outperform existing approaches on several publicly available datasets. Our source code is publicly available at https://avijit9.github.io/CleanAdapt.
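The two mechanisms named in the abstract, small-loss sample selection and a gradually updated teacher, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the `keep_ratio` parameter, and the representation of model weights as flat lists of floats are hypothetical choices made for illustration. The teacher update assumes an exponential moving average (EMA) with a hypothetical `momentum` value, which is one common way to update a teacher "gradually"; the abstract does not say which rule CleanAdapt + TS uses.

```python
import math
from typing import List, Sequence


def pseudo_label(probs: Sequence[float]) -> int:
    """Hard pseudo-label: the arg-max class of a (source or teacher) prediction."""
    return max(range(len(probs)), key=lambda c: probs[c])


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy of predicted probabilities against a pseudo-label."""
    # Clamp to avoid log(0) for degenerate predictions.
    return -math.log(max(probs[label], 1e-12))


def select_small_loss(model_probs: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      keep_ratio: float) -> List[int]:
    """Return indices of the keep_ratio fraction of samples with the smallest loss.

    Hypothetical helper: small-loss samples are treated as likely to carry
    correct pseudo-labels and are kept for fine-tuning.
    """
    losses = [cross_entropy(p, y) for p, y in zip(model_probs, labels)]
    num_keep = max(1, int(keep_ratio * len(losses)))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:num_keep])


def ema_update(teacher: Sequence[float],
               student: Sequence[float],
               momentum: float = 0.999) -> List[float]:
    """Assumed EMA teacher update: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]
```

In use, the pseudo-labels come from the source model (or the teacher), for example `labels = [pseudo_label(p) for p in source_probs]`. The current model's predictions on the same clips are then ranked with `select_small_loss(model_probs, labels, keep_ratio)`. Only the returned indices would be used in the fine-tuning loss, and after each student step the teacher would be refreshed with `ema_update`. A large momentum keeps the teacher's pseudo-labels stable while the student adapts.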