Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks, such as MobileNet and EfficientNet. A common remedy is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher. However, pretraining a teacher model is time- and resource-consuming when one is not already available. In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model. Specifically, we show that the optimal training recipe for efficient models differs from that of larger models, and that reusing the training settings of ResNet50, as previous research does, is inappropriate. Additionally, we observe a common issue in contrastive learning where either the positive or negative views can be noisy, and propose a smoothed version of the InfoNCE loss to alleviate this problem. As a result, we improve the linear evaluation results from 36.3\% to 62.3\% for MobileNet-V3-Large and from 42.2\% to 65.8\% for EfficientNet-B0 on ImageNet, closing the accuracy gap to ResNet50 with $5\times$ fewer parameters. We hope our research will facilitate the use of lightweight contrastive models.
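The abstract does not spell out the form of the smoothed InfoNCE loss. One plausible reading, assumed here purely for illustration, is label smoothing applied to the InfoNCE target: instead of placing all target mass on the positive view, a small weight is spread over the negatives so that a noisy positive or a false negative is penalized less harshly. The function name `smoothed_info_nce`, the smoothing weight `alpha`, and the default `temperature` below are hypothetical and not taken from the paper.

```python
import math


def smoothed_info_nce(logits, positive_index=0, alpha=0.1, temperature=0.2):
    """Cross-entropy between softmax(logits / temperature) and a smoothed target.

    logits: similarity scores of one query against 1 positive and K negatives.
    alpha:  hypothetical smoothing weight; alpha = 0 recovers standard InfoNCE.
    """
    n = len(logits)
    if n < 2:
        raise ValueError("need one positive and at least one negative")

    scaled = [s / temperature for s in logits]

    # Numerically stable log-softmax over all candidates.
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    log_probs = [s - log_z for s in scaled]

    # Smoothed target: 1 - alpha on the positive, alpha shared by the negatives.
    targets = [alpha / (n - 1)] * n
    targets[positive_index] = 1.0 - alpha

    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With `alpha = 0` this sketch reduces to the usual InfoNCE cross-entropy on the positive pair; with `alpha > 0` a confidently scored positive incurs a slightly higher loss, which discourages the model from fully committing to views that may be noisy.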