Self-supervised learning (SSL) has developed rapidly in recent years. However, most mainstream methods are computationally expensive and rely on two (or more) augmentations of each image to construct positive pairs. Moreover, they focus mainly on large models and large-scale datasets, which limits their flexibility and feasibility in many practical applications. In this paper, we propose an efficient single-branch SSL method based on non-parametric instance discrimination, aiming to improve the algorithm, model, and data efficiency of SSL. By analyzing the gradient formula, we correct the update rule of the memory bank, which improves performance. We further propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square-root version. We show that this loss alleviates the infrequent-updating problem in instance discrimination and greatly accelerates convergence. We systematically compare the training overhead and performance of different methods across data scales and backbones. Experimental results show that our method outperforms various baselines with significantly less overhead, and is especially effective for limited amounts of data and small models.
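To make the self-distillation idea concrete, the following is a minimal NumPy sketch rather than the paper's implementation. The abstract does not give the exact formulation, so several choices here are assumptions: the instance probabilities are taken as a softmax over similarities to a memory bank, the "square root version" is the renormalized element-wise square root of that distribution, and the KL direction shown is only one of the two possibilities. The names `instance_probs`, `sqrt_version`, `kl_divergence`, and the temperature `tau` are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def instance_probs(features, memory_bank, tau=0.1):
    # Non-parametric instance discrimination: each row of the memory bank
    # acts as the "class weight" of one training instance.
    # tau is an assumed temperature, not a value taken from the paper.
    logits = features @ memory_bank.T / tau
    return softmax(logits)


def sqrt_version(p):
    # Assumed reading of "square root version": element-wise square root,
    # renormalized so each row sums to 1 again.
    r = np.sqrt(p)
    return r / r.sum(axis=-1, keepdims=True)


def kl_divergence(a, b, eps=1e-12):
    # KL(a || b) per row; eps guards against log(0).
    return np.sum(a * (np.log(a + eps) - np.log(b + eps)), axis=-1)


# Toy example: 8 instances with 4-dimensional unit-norm features.
rng = np.random.default_rng(0)
bank = rng.normal(size=(8, 4))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
batch = bank[:3]  # pretend the current features equal the stored ones

p = instance_probs(batch, bank)
q = sqrt_version(p)
loss = kl_divergence(p, q).mean()  # assumed direction: KL(p || q)
```

Under this reading, the square-root distribution equals a softmax of the same logits at twice the temperature, so it is a flatter version of `p` that places more mass on instances other than the top-scoring one. This is one plausible way such a loss could send gradient to many memory bank entries per step, but the precise mechanism should be checked against the paper.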