While contrastive learning yields powerful representations by matching different augmented views of the same instance, it does not capture similarities between different instances. One popular way to address this limitation is to learn global features (taken after global pooling) that capture inter-instance relationships through knowledge distillation, where the teacher's global features guide the learning of the student's global features. Inspired by cross-modality learning, we extend this framework, which learns from global features only, by encouraging the global features and intermediate-layer features to learn from each other. This leads to our novel self-supervised framework, cross-context learning between global and hypercolumn features (CGH), which enforces the consistency of instance relations between low- and high-level semantics. Specifically, we stack the intermediate feature maps to construct a hypercolumn representation so that we can measure instance relations in two contexts (hypercolumn and global features) separately, and then use the relations from one context to guide the learning of the other. This cross-context learning allows the model to learn from the differences between the two contexts. Experimental results on linear classification and downstream tasks show that our method outperforms state-of-the-art methods.
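The cross-context idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: feature maps are resized by nearest-neighbour sampling and average-pooled into one hypercolumn vector per instance, instance relations are modelled as a temperature-scaled softmax over cosine similarities to a set of anchor embeddings, and the guidance signal is a cross-entropy between teacher and student relations. All function names, shapes, and temperature values are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def resize_nearest(fmap, size):
    """Nearest-neighbour resize of a (B, C, H, W) array to (B, C, size, size)."""
    _, _, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, :, rows][:, :, :, cols]

def hypercolumn(feature_maps, size):
    """Stack intermediate maps along channels, then pool to one vector per instance.

    Assumption: spatial average pooling is used to obtain an instance-level vector.
    """
    resized = [resize_nearest(f, size) for f in feature_maps]
    stacked = np.concatenate(resized, axis=1)   # (B, sum of C_i, size, size)
    return stacked.mean(axis=(2, 3))            # (B, sum of C_i)

def relations(embeddings, anchors, temperature):
    """Softmax over cosine similarities between each instance and each anchor."""
    logits = l2_normalize(embeddings) @ l2_normalize(anchors).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def cross_entropy(target, pred, eps=1e-12):
    """Mean cross-entropy between two row-wise probability distributions."""
    return float(-(target * np.log(pred + eps)).sum(axis=1).mean())

def cross_context_loss(t_global, t_hyper, s_global, s_hyper):
    """Teacher relations in one context guide student relations in the other."""
    return cross_entropy(t_global, s_hyper) + cross_entropy(t_hyper, s_global)

# Toy data standing in for network outputs (all shapes are illustrative).
rng = np.random.default_rng(0)
batch, n_anchors = 4, 16
maps = [rng.normal(size=(batch, 8, 8, 8)), rng.normal(size=(batch, 16, 4, 4))]
hyper = hypercolumn(maps, size=4)             # hypercolumn context, (4, 24)
glob = rng.normal(size=(batch, 32))           # global-feature context, (4, 32)
anchors_h = rng.normal(size=(n_anchors, 24))  # anchors in hypercolumn space
anchors_g = rng.normal(size=(n_anchors, 32))  # anchors in global space

# The student sees a perturbed view; the teacher uses a sharper temperature.
student_hyper = hyper + 0.1 * rng.normal(size=hyper.shape)
student_glob = glob + 0.1 * rng.normal(size=glob.shape)
t_h = relations(hyper, anchors_h, temperature=0.04)
t_g = relations(glob, anchors_g, temperature=0.04)
s_h = relations(student_hyper, anchors_h, temperature=0.1)
s_g = relations(student_glob, anchors_g, temperature=0.1)

loss = cross_context_loss(t_g, t_h, s_g, s_h)
print("cross-context loss:", loss)
```

In this sketch each context keeps its own anchor set, so the two relation distributions share the same length (one entry per anchor) even though the hypercolumn and global embeddings have different dimensions. That shared length is what allows relations from one context to serve as targets for the other.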