The robust generalization of models to rare in-distribution (ID) samples drawn from the long tail of the training distribution, and to out-of-training-distribution (OOD) samples, is one of the major challenges of current deep learning methods. For image classification, this manifests in the existence of adversarial attacks, performance drops on distorted images, and a lack of generalization to concepts such as sketches. The current understanding of generalization in neural networks is very limited, but some biases that differentiate models from human vision have been identified and might be causing these limitations. Consequently, several attempts, with varying success, have been made to reduce these biases during training to improve generalization. We take a step back and sanity-check these attempts. Fixing the architecture to the well-established ResNet-50, we perform a large-scale study on 48 ImageNet models obtained via different training methods to understand whether and how these biases (including shape bias, spectral biases, and critical bands) interact with generalization. Our extensive study reveals that, contrary to previous findings, these biases are insufficient to accurately predict the generalization of a model holistically. All checkpoints and evaluation code are available at https://github.com/paulgavrikov/biases_vs_generalization