Classification performance on ImageNet is the de facto standard metric for convolutional neural network (CNN) development. In this work we challenge the notion that designing CNN architectures solely on the basis of ImageNet leads to generally effective architectures that perform well on a diverse set of datasets and application domains. To this end, we investigate and ultimately improve ImageNet as a basis for deriving such architectures. We conduct an extensive empirical study in which we train $500$ CNN architectures, sampled from the broad AnyNetX design space, on ImageNet as well as on $8$ additional well-known image classification benchmark datasets from a diverse array of application domains. We observe that the performance of the architectures is highly dataset-dependent. Some datasets even exhibit a negative error correlation with ImageNet across all architectures. We show how to significantly increase these correlations by using ImageNet subsets restricted to fewer classes. These contributions can have a profound impact on the way we design future CNN architectures and can help reduce our community's current over-reliance on a single dataset.
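To make the notion of an error correlation across architectures concrete, the following is a minimal sketch, assuming each architecture yields one test error per dataset and that the correlation is measured with the Pearson coefficient (the abstract does not say which coefficient is used). The function name `error_correlation` and all numbers are hypothetical illustrations, not values from the study.

```python
import numpy as np


def error_correlation(errors_a, errors_b):
    """Pearson correlation between two per-architecture error vectors.

    Entry i of each vector is the test error of architecture i on the
    respective dataset (assumed layout, chosen for illustration).
    """
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical toy errors for five architectures (not results from the paper).
imagenet_err = [0.30, 0.28, 0.35, 0.25, 0.32]
other_err_pos = [0.12, 0.11, 0.15, 0.09, 0.13]  # ranks architectures like ImageNet
other_err_neg = [0.10, 0.12, 0.08, 0.14, 0.09]  # ranks architectures in reverse

print(error_correlation(imagenet_err, other_err_pos))
print(error_correlation(imagenet_err, other_err_neg))
```

A positive value means architectures that do well on ImageNet also tend to do well on the other dataset; a negative value means the ranking is reversed, which is the case the abstract highlights for some datasets.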