We present a new method for self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV-MR). MV-MR maximizes the dependence between learnable embeddings from augmented and non-augmented views, jointly with the dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use contrastive learning, clustering, or stop gradients. MV-MR is a generic framework that allows constraints to be imposed on the learnable embeddings by using image multi-representations as regularizers. In this view, knowledge distillation is a particular case of such regularization. MV-MR achieves state-of-the-art performance on the STL10 and ImageNet-1K datasets among non-contrastive and clustering-free methods. We show that a lower-complexity ResNet50 model pretrained using the proposed knowledge distillation based on the CLIP ViT model achieves state-of-the-art performance on STL10 linear evaluation. The code is available at: https://github.com/vkinakh/mv-mr
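To make the objective described above more concrete, the following is a minimal sketch of a loss built from two kinds of dependence terms: one between embeddings of the augmented and non-augmented views, and one per fixed (non-learnable) representation of the non-augmented view. The abstract does not name the dependence measure, so this sketch assumes distance correlation as one possible choice. The function names (`distance_correlation`, `dependence_loss`) and the equal weighting of the terms are illustrative assumptions, not taken from the released code, and any additional terms of the full method are omitted.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x, shape (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )


def distance_correlation(x, y, eps=1e-12):
    """Sample distance correlation between two batches with the same number of rows.

    Assumed here as the dependence measure; the feature dimensions of x and y
    may differ, which is what lets learnable embeddings be compared with
    arbitrary fixed representations.
    """
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    dcov2_xy = (a * b).mean()
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    ratio = max(dcov2_xy, 0.0) / (np.sqrt(dcov2_xx * dcov2_yy) + eps)
    return float(np.sqrt(ratio))


def dependence_loss(z_aug, z_clean, fixed_reps):
    """Negative sum of dependence terms (hypothetical equal weighting).

    z_aug:      learnable embeddings of the augmented view, shape (n, d)
    z_clean:    learnable embeddings of the non-augmented view, shape (n, d)
    fixed_reps: list of non-learnable representations of the non-augmented
                view, each of shape (n, d_k); a frozen teacher's features
                would be one such entry in the distillation case.
    """
    loss = -distance_correlation(z_aug, z_clean)
    for rep in fixed_reps:
        loss -= distance_correlation(z_aug, rep)
    return loss
```

Minimizing this loss maximizes each dependence term. Under this reading, knowledge distillation amounts to placing the frozen teacher's features in `fixed_reps`; because distance correlation compares pairwise distance structure rather than coordinates, the student and teacher embeddings need not share a dimensionality. In actual training the same computation would be written in an automatic differentiation framework so that gradients reach the encoder.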