Whilst contrastive learning yields powerful representations by matching different augmented views of the same instance, it does not capture the similarities between different instances. One popular way to address this limitation is to learn global features (taken after global pooling) that capture inter-instance relationships through knowledge distillation, where the global features of the teacher guide the learning of the global features of the student. Inspired by cross-modality learning, we extend this framework, which learns only from global features, by encouraging the global features and intermediate-layer features to learn from each other. This leads to our novel self-supervised framework, cross-context learning between global and hypercolumn features (CGH), which enforces the consistency of instance relations between low- and high-level semantics. Specifically, we stack the intermediate feature maps to construct a hypercolumn representation, so that instance relations can be measured separately in two contexts (hypercolumn and global feature), and then use the relations of one context to guide the learning of the other. This cross-context learning allows the model to learn from the differences between the two contexts. Experimental results on linear classification and downstream tasks show that our method outperforms state-of-the-art methods.
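The cross-context idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how relations are measured, so the sketch assumes that instance relations are a softmax over cosine similarities to a set of anchor embeddings, that the loss is a symmetric cross-entropy between the teacher's relations in one context and the student's relations in the other, and that the hypercolumn is formed by spatially pooling each intermediate feature map and concatenating the results. All function names, the anchor sets, and the temperature values are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def hypercolumn(feature_maps):
    """Assumed construction: average each (N, C_l, H_l, W_l) map over space,
    then concatenate the pooled vectors along the channel axis."""
    pooled = [fm.mean(axis=(2, 3)) for fm in feature_maps]
    return np.concatenate(pooled, axis=1)

def relations(embeddings, anchors, temperature):
    """Assumed relation measure: row-wise softmax over cosine similarities
    between each instance embedding and every anchor embedding."""
    logits = l2_normalize(embeddings) @ l2_normalize(anchors).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def cross_entropy(target, pred, eps=1e-12):
    """Mean cross-entropy between two batches of row distributions."""
    return float(-(target * np.log(pred + eps)).sum(axis=1).mean())

def cross_context_loss(t_global, t_hyper, s_global, s_hyper,
                       anchors_global, anchors_hyper,
                       t_temp=0.04, s_temp=0.1):  # hypothetical temperatures
    """Teacher relations in one context supervise student relations in the other."""
    p_t_global = relations(t_global, anchors_global, t_temp)
    p_t_hyper = relations(t_hyper, anchors_hyper, t_temp)
    p_s_global = relations(s_global, anchors_global, s_temp)
    p_s_hyper = relations(s_hyper, anchors_hyper, s_temp)
    # hypercolumn -> global, plus global -> hypercolumn
    return cross_entropy(p_t_hyper, p_s_global) + cross_entropy(p_t_global, p_s_hyper)

# Toy usage with random tensors standing in for network outputs.
rng = np.random.default_rng(0)
batch, num_anchors = 4, 10
stage_shapes = [(batch, 8, 16, 16), (batch, 16, 8, 8)]
t_hyper = hypercolumn([rng.normal(size=s) for s in stage_shapes])
s_hyper = hypercolumn([rng.normal(size=s) for s in stage_shapes])
t_global = rng.normal(size=(batch, 32))
s_global = rng.normal(size=(batch, 32))
loss = cross_context_loss(
    t_global, t_hyper, s_global, s_hyper,
    anchors_global=rng.normal(size=(num_anchors, 32)),
    anchors_hyper=rng.normal(size=(num_anchors, t_hyper.shape[1])),
)
print(loss)
```

Both contexts produce a distribution over the same anchor instances, so the two relation vectors are comparable even though the hypercolumn and global embeddings have different dimensions. In an actual training loop the teacher distributions would be treated as constants, with gradients flowing only through the student.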