Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth.