We address the challenge of building efficient yet accurate recognition systems from limited labels. Although recognition models improve with model size and the amount of data, many specialized applications of computer vision face severe resource constraints during both training and inference. Transfer learning is an effective solution for training with few labels, but it often comes at the expense of computationally costly fine-tuning of large base models. We propose to mitigate this trade-off between compute and accuracy via semi-supervised cross-domain distillation from a set of diverse source models. First, we show how to use task similarity metrics to select a single suitable source model to distill from, and that a good selection process is essential for good downstream performance of the target model. We dub this approach DistillNearest. Though effective, DistillNearest assumes that a single source model matches the target task, which is not always the case. To alleviate this, we propose a weighted multi-source distillation method, DistillWeighted, which distills multiple source models trained on different domains, each weighted by its relevance to the target task, into a single efficient model. Our methods need no access to source data; they require only the features and pseudo-labels of the source models. When the goal is accurate recognition under computational constraints, both DistillNearest and DistillWeighted outperform transfer learning from strong ImageNet initializations as well as state-of-the-art semi-supervised techniques such as FixMatch. Averaged over 8 diverse target tasks, our multi-source method outperforms these two baselines by 5.6 and 4.5 percentage points, respectively.
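To make the two ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each source model already has a scalar task-similarity score with respect to the target task, that the student has one output head per source model, and that the distillation term is a cross-entropy between source pseudo-label distributions and student predictions. The softmax weighting, the `temperature` parameter, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def nearest_source(similarities):
    """DistillNearest-style choice (assumed): index of the most similar source."""
    return int(np.argmax(np.asarray(similarities, dtype=float)))


def source_weights(similarities, temperature=1.0):
    """DistillWeighted-style weights (assumed): softmax over similarity scores."""
    scores = np.asarray(similarities, dtype=float) / temperature
    return np.exp(log_softmax(scores))


def weighted_distillation_loss(student_logits, teacher_probs, weights):
    """Weighted sum of per-source cross-entropy terms.

    student_logits: list of (batch, classes_k) arrays, one student head per source.
    teacher_probs:  list of (batch, classes_k) arrays, source pseudo-label distributions.
    weights:        one non-negative weight per source.
    """
    total = 0.0
    for logits, probs, w in zip(student_logits, teacher_probs, weights):
        # Cross-entropy of the student head against the source pseudo-labels,
        # averaged over the batch.
        ce = -np.sum(probs * log_softmax(logits), axis=-1).mean()
        total += w * ce
    return float(total)


# Hypothetical similarity scores for three source models.
sims = [0.2, 0.9, 0.5]
best = nearest_source(sims)
weights = source_weights(sims, temperature=0.5)
```

In a full training loop, this distillation term would typically be combined with a supervised loss on the few labeled target examples; how the two are balanced is left open here.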