Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Coreset selection is among the most effective ways to reduce the training time of CNNs. However, little is known about how the resulting models behave as the coreset size, the dataset, and the model architecture vary. Moreover, given the recent paradigm shift towards transformer-based models, it remains an open question how coreset selection affects their performance. Several such questions must be answered before coreset selection methods can gain wide acceptance, and this paper attempts to answer some of them. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that, under certain circumstances, random selection of subsets is more robust and stable than state-of-the-art (SOTA) selection methods. We demonstrate that the conventional practice of sampling the subset uniformly across the classes of the data is not the appropriate choice. Rather, samples should be chosen adaptively, based on the complexity of the data distribution for each class. Transformers are generally pretrained on large datasets, and we show that, for certain target datasets, this pretraining helps keep their performance stable even at very small coreset sizes. We further show that when no pretraining is done, or when pretrained transformer models are used with non-natural images (e.g., medical data), CNNs tend to generalize better than transformers, even at very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and they tend to outperform transformers at almost all choices of coreset size.
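To make the contrast between uniform and adaptive per-class sampling concrete, the following is a minimal sketch and not the method used in the paper. It assumes that per-class complexity can be approximated by the mean distance of a class's samples to the class centroid in some feature space; this proxy, and the function names `class_spread` and `select_coreset`, are hypothetical choices made for illustration only. Within each class, samples are drawn at random.

```python
import numpy as np


def class_spread(features, labels):
    """Assumed complexity proxy: mean distance of a class's samples to its centroid."""
    classes = np.unique(labels)
    spread = np.array([
        np.linalg.norm(
            features[labels == c] - features[labels == c].mean(axis=0), axis=1
        ).mean()
        for c in classes
    ])
    return classes, spread


def select_coreset(features, labels, budget, adaptive=True, seed=0):
    """Return indices of a random coreset of roughly `budget` samples.

    adaptive=False: every class gets the same share of the budget.
    adaptive=True:  each class's share is proportional to its spread (an assumption).
    """
    rng = np.random.default_rng(seed)
    classes, spread = class_spread(features, labels)
    sizes = np.array([(labels == c).sum() for c in classes])

    if adaptive and spread.sum() > 0:
        weights = spread / spread.sum()
    else:
        weights = np.full(len(classes), 1.0 / len(classes))

    # Rounding and clipping mean the total can differ slightly from `budget`.
    # Each class keeps at least one sample and at most all of its samples.
    counts = np.clip(np.round(weights * budget).astype(int), 1, sizes)

    chosen = [
        rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
        for c, n in zip(classes, counts)
    ]
    return np.concatenate(chosen)


# Toy data: class 0 is tightly clustered, class 1 is widely spread.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(5.0, 5.0, (100, 2))])
y = np.repeat([0, 1], 100)

uniform_idx = select_coreset(X, y, budget=20, adaptive=False)
adaptive_idx = select_coreset(X, y, budget=20, adaptive=True)
print("uniform per-class counts: ", np.bincount(y[uniform_idx]))
print("adaptive per-class counts:", np.bincount(y[adaptive_idx]))
```

On this toy data the uniform variant splits the budget evenly, while the adaptive variant gives most of it to the widely spread class. Any other measure of per-class difficulty could be substituted for the spread proxy without changing the allocation logic.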
