Computationally expensive training strategies make self-supervised learning (SSL) impractical for resource-constrained industrial settings. Techniques such as knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight model, but they usually involve multiple epochs of fine-tuning (or distillation steps) of a large pre-trained model, which adds further computational cost. In this work, we present a novel perspective on the interplay between the SSL and DC paradigms. In particular, we show that it is feasible to simultaneously learn a dense network and a gated sub-network from scratch in an SSL setting, without any additional fine-tuning or pruning steps. The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off and therefore yields a generic, multi-purpose architecture for application-specific industrial settings. Extensive experiments on several image classification benchmarks, including CIFAR-10/100, STL-10, and ImageNet-100, demonstrate that the proposed training strategy yields a dense network and a corresponding gated sub-network that perform on par with the vanilla self-supervised setting, while significantly reducing computation (measured in FLOPs) under a range of target budgets (t_d).
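The abstract does not say how the gated sub-network or the target budget is implemented. As a rough illustration of optimizing a dense path and a gated path together, the following NumPy sketch makes several assumptions that do not come from the source. It uses per-input channel gating on a single shared-weight layer. It uses a negative-cosine similarity between two views as a stand-in SSL loss. It adds a squared penalty that pulls the fraction of active channels toward t_d, which is treated here as a target density. All function names (`gated_forward`, `budget_loss`, `joint_objective`, etc.) are hypothetical. Only forward losses are computed. Actual training would need autograd and some way to pass gradients through the hard gate threshold, for example a straight-through estimator.

```python
import numpy as np

# Hypothetical sketch: one shared-weight layer with a dense path and a
# dynamically gated path. Not the method from the abstract.

def dense_forward(x, W):
    """Dense path: all output channels of W are computed (linear + ReLU)."""
    return np.maximum(x @ W, 0.0)

def gate_mask(x, W_gate, threshold=0.5):
    """Assumed per-input channel gates: sigmoid scores thresholded to {0, 1}."""
    scores = 1.0 / (1.0 + np.exp(-(x @ W_gate)))
    return (scores > threshold).astype(x.dtype)

def gated_forward(x, W, W_gate):
    """Gated path: reuses the same W and zeroes the gated-off channels.

    Masking only emulates skipping; a real implementation would avoid
    computing the disabled channels in order to actually save FLOPs.
    """
    mask = gate_mask(x, W_gate)
    return dense_forward(x, W) * mask, mask

def ssl_loss(z1, z2, eps=1e-8):
    """Stand-in SSL loss: negative mean cosine similarity between two views."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + eps)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + eps)
    return -np.mean(np.sum(z1 * z2, axis=1))

def budget_loss(mask, t_d):
    """Squared gap between the fraction of active channels and the target t_d."""
    return (mask.mean() - t_d) ** 2

def joint_objective(x1, x2, W, W_gate, t_d, lam=1.0):
    """Dense SSL loss + gated SSL loss + weighted budget penalty."""
    loss_dense = ssl_loss(dense_forward(x1, W), dense_forward(x2, W))
    g1, m1 = gated_forward(x1, W, W_gate)
    g2, m2 = gated_forward(x2, W, W_gate)
    loss_gated = ssl_loss(g1, g2)
    masks = np.concatenate([m1, m2])
    total = loss_dense + loss_gated + lam * budget_loss(masks, t_d)
    # For a single linear layer, the gated/dense FLOP ratio is roughly the
    # fraction of active output channels (gate-head overhead ignored).
    return total, masks.mean()

rng = np.random.default_rng(0)
x1 = rng.normal(size=(8, 16))              # first view of a toy batch
x2 = x1 + 0.1 * rng.normal(size=(8, 16))   # second, slightly perturbed view
W = 0.1 * rng.normal(size=(16, 32))        # shared encoder weights
W_gate = 0.1 * rng.normal(size=(16, 32))   # hypothetical gate head
loss, density = joint_objective(x1, x2, W, W_gate, t_d=0.5)
print(f"loss={loss:.4f}, active-channel fraction={density:.3f}")
```

Because both paths share `W`, the gated encoder in this sketch is literally a sub-network of the dense one. The budget term is one simple way to express a target budget. Other formulations (per-layer targets, FLOP-weighted penalties) are equally plausible and are not implied by the source.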