Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space, in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is as follows: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations of that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers, in both unsupervised and supervised settings. Working across all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.
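To make the layer-wise evaluation concrete, the following is a minimal sketch of how embeddings of unseen data from several intermediate layers might be scored and ranked in the supervised setting. It is not the metric used in this work: the leave-one-out 1-nearest-neighbour score, the function names `loo_1nn_accuracy` and `rank_layers`, the layer names, and the synthetic embeddings are all assumptions chosen for illustration. In practice the embeddings would be extracted from a fine-tuned model rather than sampled at random.

```python
import numpy as np


def loo_1nn_accuracy(emb, labels):
    """Leave-one-out 1-nearest-neighbour accuracy on a set of embeddings.

    Hypothetical proxy for supervised generalization power: a layer scores
    high if unseen samples of the same class lie close together.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(emb ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    # A sample may not be its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(labels[nearest] == labels))


def rank_layers(layer_embeddings, labels):
    """Score every layer and return (name, score) pairs, best first."""
    scores = {
        name: loo_1nn_accuracy(emb, labels)
        for name, emb in layer_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in for embeddings of unseen classes (assumption: real
# embeddings would come from intermediate layers of a fine-tuned model).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)           # 3 unseen classes, 30 samples each
centers = rng.normal(size=(3, 16)) * 10.0      # well-separated class centres
layer_embeddings = {
    # Intermediate layer whose embeddings keep the classes apart.
    "block_2": centers[labels] + rng.normal(size=(90, 16)),
    # Deeper layer whose embeddings carry no class structure.
    "block_4": rng.normal(size=(90, 16)),
}

ranking = rank_layers(layer_embeddings, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the shallower layer ranks first, mirroring the observation that deeper layers do not always generalize best. An unsupervised variant could replace the label-based score with a clustering-quality measure computed on the same embeddings.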