Unsupervised cross-domain action recognition aims to adapt a model trained on a labeled source domain to a new, unlabeled target domain. Most existing methods address the task by directly aligning the feature distributions of the source and target domains. However, this can cause negative transfer during domain adaptation, because both domains contain negative training samples. In the source domain, some training samples have low relevance to the target domain because of differences in viewpoint, action style, and so on. In the target domain, some training samples are ambiguous and can easily be classified as a different action when viewed from the source domain. Negative transfer has been explored in cross-domain object detection but remains under-explored in cross-domain action recognition. We therefore propose Multi-modal Instance Refinement (MMIR), a method based on reinforcement learning that alleviates negative transfer. Specifically, a reinforcement learning agent is trained in both domains for every modality to refine the training data by filtering negative samples out of each domain. Our method outperforms several state-of-the-art baselines in cross-domain action recognition on the EPIC-Kitchens benchmark, which demonstrates the advantage of MMIR in reducing negative transfer.
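To make the idea of instance refinement by a reinforcement learning agent more concrete, the following is a minimal sketch of a single selector for one modality in one domain. It is not the MMIR implementation: the abstract does not specify the state, the policy, or the reward, so everything here is an assumption made for illustration. The sketch assumes a Bernoulli-logistic keep/drop policy over per-sample feature vectors, a REINFORCE update with a moving-average baseline, and a hypothetical `toy_reward` that scores the mean relevance of the kept samples. All function names are hypothetical.

```python
import math
import random


def keep_probability(feature, weights):
    """Assumed policy: logistic probability of keeping one sample."""
    logit = sum(w * x for w, x in zip(weights, feature))
    return 1.0 / (1.0 + math.exp(-logit))


def sample_actions(features, weights, rng):
    """Sample a keep (1) or drop (0) action for each sample in the batch."""
    probs = [keep_probability(f, weights) for f in features]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return actions, probs


def toy_reward(actions, relevance):
    """Hypothetical reward: mean relevance of kept samples (0.0 if none kept)."""
    kept = sum(actions)
    if kept == 0:
        return 0.0
    return sum(a * r for a, r in zip(actions, relevance)) / kept


def reinforce_step(features, weights, actions, probs, advantage, lr=0.1):
    """One REINFORCE update.

    For a Bernoulli-logistic policy, the gradient of log pi(action | feature)
    with respect to the weights is (action - prob) * feature.
    """
    n = len(features)
    new_weights = list(weights)
    for f, a, p in zip(features, actions, probs):
        for j, x in enumerate(f):
            new_weights[j] += lr * advantage * (a - p) * x / n
    return new_weights


def train_selector(features, relevance, steps=200, seed=0):
    """Train the keep/drop policy on one batch of samples (illustrative only)."""
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        actions, probs = sample_actions(features, weights, rng)
        reward = toy_reward(actions, relevance)
        weights = reinforce_step(features, weights, actions, probs, reward - baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return weights
```

In a multi-modal setting such as the one described above, one selector of this kind would be trained per modality and per domain, and the reward would presumably come from the recognition model rather than from a fixed relevance score; that coupling is left out of this sketch.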