Self-supervised methods based on contrastive learning have achieved great success in unsupervised visual representation learning. However, most methods under this framework suffer from false negative samples: samples that are semantically similar to the anchor but are still treated as negatives. Inspired by mean shift for self-supervised learning, we propose a simple new framework, Multiple Sample Views and Queues (MSVQ). We jointly construct three soft labels on-the-fly using two complementary and symmetric approaches: multiple augmented positive views, and two momentum encoders that generate diverse semantic features for the negative samples. The two teacher networks compute similarity relationships with the negative samples and transfer this knowledge to the student network. The student network learns to mimic these similarity relationships between samples, which gives it greater flexibility in identifying false negative samples in the dataset. Classification results on four benchmark image datasets show that our approach is both effective and efficient compared with several classical methods. Source code and pretrained models are available \href{https://github.com/pc-cp/MSVQ}{here}.
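The teacher-to-student transfer of similarity relationships described above can be illustrated with a minimal sketch. It is not the authors' implementation: it shows a single teacher and a single soft label, and the function names, the temperature values, and the cross-entropy form of the loss are all assumptions made for illustration. The teacher's softmax over its similarities to the queued negative samples serves as a soft label, and the student is penalized when its own similarity distribution over the same queue differs from it.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(embedding, queue, temperature):
    """Distribution over queue entries given by cosine similarity to `embedding`."""
    e = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(e, l2_normalize(n))) for n in queue]
    return softmax(sims, temperature)


def soft_label_loss(student_emb, teacher_emb, queue,
                    t_student=0.1, t_teacher=0.05):
    """Cross-entropy between the teacher's soft label and the student's
    similarity distribution (temperatures are illustrative assumptions)."""
    target = similarity_distribution(teacher_emb, queue, t_teacher)
    pred = similarity_distribution(student_emb, queue, t_student)
    return -sum(p * math.log(q) for p, q in zip(target, pred))


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: three queued negatives and two student views.
    queue = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    teacher = [0.9, 0.1]
    print(soft_label_loss([0.8, 0.2], teacher, queue))   # student agrees with teacher
    print(soft_label_loss([-0.8, 0.2], teacher, queue))  # student disagrees
```

Because the target is a distribution over the queue rather than a one-hot label, a queued sample that the teacher finds highly similar to the anchor receives non-zero target mass instead of being pushed away as a hard negative. Under these assumptions, extending the sketch to several augmented views and two teachers would amount to combining several such loss terms.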