We introduce ImageNot, a dataset constructed to match the scale of ImageNet while being deliberately and drastically different from it. ImageNot is designed to test the external validity of deep learning progress on ImageNet. We show that key model architectures developed for ImageNet over the years, when trained from scratch and evaluated on ImageNot, rank exactly as they do on ImageNet. Moreover, each model's relative improvement over earlier models is strongly correlated across the two datasets. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models trained and evaluated on an entirely different dataset. This contrasts with absolute accuracy, which typically drops sharply under even small changes to a dataset.
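The two claims above (identical rankings and correlated relative improvements) can be made concrete with a minimal sketch, assuming we have each model's accuracy on both datasets. The function names are hypothetical. Kendall's tau-a is our choice of rank statistic. "Relative improvement" is assumed here to mean a later model's accuracy divided by an earlier model's accuracy, minus one, taken over all chronologically ordered pairs; the paper may define it differently.

```python
from itertools import combinations
from math import sqrt


def ranking(acc: dict[str, float]) -> list[str]:
    """Model names ordered from highest to lowest accuracy."""
    return sorted(acc, key=lambda m: acc[m], reverse=True)


def kendall_tau(a: dict[str, float], b: dict[str, float]) -> float:
    """Kendall's tau-a between two accuracy tables over the same models.

    Tied pairs count as neither concordant nor discordant.
    Returns 1.0 when both tables induce the same strict ranking.
    """
    models = sorted(a)
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        s = (a[m] - a[n]) * (b[m] - b[n])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) / 2
    return (concordant - discordant) / n_pairs


def relative_improvements(acc: dict[str, float], order: list[str]) -> list[float]:
    """Assumed definition: for every (earlier, later) pair in chronological
    `order`, the later model's accuracy relative to the earlier one, minus 1."""
    return [acc[later] / acc[earlier] - 1 for earlier, later in combinations(order, 2)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)
```

Under these assumptions, identical rankings correspond to `kendall_tau(...) == 1.0`. Strongly correlated relative improvements correspond to a `pearson(...)` value near 1 when it is applied to the two `relative_improvements` lists computed with the same chronological order.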