Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space, in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations of that domain. Generalization power is quantified as a function of the latent embeddings of unseen data at intermediate layers, in both unsupervised and supervised settings. Examining all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.
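The idea of scoring each layer by how well its embeddings of unseen-class data separate those classes can be sketched minimally. The code below is an illustrative stand-in, not the paper's method: it uses a leave-one-out k-NN accuracy as a hypothetical generalizability proxy, and synthetic Gaussian features in place of real per-layer activations.

```python
import numpy as np

def layer_generalizability(embeddings, labels, k=5):
    """Leave-one-out k-NN accuracy on embeddings of unseen-class data:
    a simple proxy for how well a layer's representation separates
    classes the model was never trained on."""
    n = len(labels)
    # Pairwise squared Euclidean distances between all embeddings.
    d = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbours
    correct = 0
    for i in range(n):
        nn = np.argsort(d[i])[:k]          # indices of the k nearest neighbours
        votes = np.bincount(labels[nn])    # majority vote over their labels
        correct += int(votes.argmax() == labels[i])
    return correct / n

# Synthetic stand-in for activations of unseen-class data at two layers:
# an "early" layer where classes are barely separated, and a "late" layer
# where they are well separated. (Purely illustrative data.)
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 30)
early = rng.normal(0.0, 1.0, (90, 16)) + 0.3 * labels[:, None]
late = rng.normal(0.0, 1.0, (90, 16)) + 3.0 * labels[:, None]
print(layer_generalizability(early, labels))  # lower score
print(layer_generalizability(late, labels))   # higher score
```

In the paper's actual pipeline these embeddings would come from the intermediate layers of a fine-tuned pre-trained model evaluated on related but distinct domain variations; comparing such a score across layers is what allows statements like "deeper layers do not always generalize best".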