We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels both at the training task and at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization and maintaining its performance on the original task. Models trained with self-supervised learning tend to generalize better than their supervised counterparts for transfer learning, yet they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We extensively analyze supervised training using multi-scale crops for data augmentation and an expendable projector head, and reveal that the design of the projector allows us to control the trade-off between performance on the training task and transferability. We further replace the last layer of class weights with class prototypes computed on the fly using a memory bank and derive two models: t-ReX, which achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX*, which matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks. Code and pretrained models: https://europe.naverlabs.com/t-rex
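To make the idea of replacing learned class weights with class prototypes computed on the fly from a memory bank more concrete, the following is a minimal NumPy sketch and not the released t-ReX implementation. The abstract does not specify the details, so everything here is an assumption made for illustration: the hypothetical `PrototypeMemoryBank` class, the FIFO update, prototypes taken as the mean of L2-normalized embeddings per class, cosine-similarity logits, and the temperature value.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (all-zero rows stay zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class PrototypeMemoryBank:
    """Hypothetical FIFO bank of (embedding, label) pairs.

    Assumption: the oldest entries are overwritten first; the abstract
    does not describe the actual update rule.
    """

    def __init__(self, size, dim, num_classes):
        self.feats = np.zeros((size, dim), dtype=np.float32)
        self.labels = np.full(size, -1, dtype=np.int64)  # -1 marks an empty slot
        self.num_classes = num_classes
        self.ptr = 0

    def enqueue(self, feats, labels):
        """Store a batch of embeddings, wrapping around when the bank is full."""
        feats = l2_normalize(np.asarray(feats, dtype=np.float32))
        labels = np.asarray(labels, dtype=np.int64)
        size = len(self.labels)
        idx = (self.ptr + np.arange(len(labels))) % size
        self.feats[idx] = feats
        self.labels[idx] = labels
        self.ptr = int((self.ptr + len(labels)) % size)

    def prototypes(self):
        """Return one unit-norm prototype per class (zeros for unseen classes)."""
        rows = np.flatnonzero(self.labels >= 0)
        one_hot = np.zeros((len(self.labels), self.num_classes), dtype=np.float32)
        one_hot[rows, self.labels[rows]] = 1.0
        sums = one_hot.T @ self.feats            # (num_classes, dim)
        counts = one_hot.sum(axis=0)             # (num_classes,)
        means = sums / np.maximum(counts, 1.0)[:, None]
        return l2_normalize(means)


def prototype_logits(embeddings, prototypes, temperature=0.1):
    """Cosine similarity to each prototype, scaled by an assumed temperature."""
    return l2_normalize(np.asarray(embeddings, dtype=np.float32)) @ prototypes.T / temperature


# Usage: two classes in a 2-D embedding space.
bank = PrototypeMemoryBank(size=8, dim=2, num_classes=2)
bank.enqueue([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [0, 1, 0])
protos = bank.prototypes()
logits = prototype_logits([[3.0, 0.1]], protos)
predicted = int(np.argmax(logits, axis=1)[0])
```

In this sketch the logits would feed a standard cross-entropy loss in place of the output of a learned linear classifier; during training the bank would be refreshed with each batch so that the prototypes track the current encoder.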