Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset, so that a model trained on the synthetic set achieves test accuracy comparable to one trained on the whole dataset. Unfortunately, the synthetic data generated by previous methods is not guaranteed to match the distribution or the discriminative power of the original training data, and generating it incurs significant computational cost. Despite promising results, a significant performance gap remains between models trained on condensed synthetic sets and those trained on the whole dataset. In this paper, we address these challenges with efficient Dataset Distillation with Attention Matching (DataDAM), achieving state-of-the-art performance while reducing training costs. Specifically, we learn synthetic images by matching the spatial attention maps of real and synthetic data generated by different layers within a family of randomly initialized neural networks. Our method outperforms prior methods on several datasets, including CIFAR10/100, TinyImageNet, ImageNet-1K, and subsets of ImageNet-1K, across most settings, and achieves improvements of up to 6.5% and 4.1% on CIFAR100 and ImageNet-1K, respectively. We also show that our high-quality distilled images have practical benefits for downstream applications, such as continual learning and neural architecture search.
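To make the attention-matching idea concrete, the following is a minimal NumPy sketch of how a spatial attention map and a matching loss could be computed. It is not the paper's implementation. It assumes one common definition of spatial attention (the channel-wise sum of absolute activations raised to a power `p`, flattened and L2-normalized), an assumed default of `p=4`, and a squared-difference loss between batch-mean attention vectors. The function names `spatial_attention` and `attention_matching_loss` are hypothetical, and random arrays stand in for the layer activations of a randomly initialized network.

```python
import numpy as np


def spatial_attention(features, p=4, eps=1e-8):
    """Map activations of shape (N, C, H, W) to unit-norm vectors of shape (N, H*W).

    Assumed definition: sum |activation|^p over channels, flatten, L2-normalize.
    """
    attn = np.sum(np.abs(features) ** p, axis=1)        # (N, H, W)
    attn = attn.reshape(attn.shape[0], -1)              # (N, H*W)
    norm = np.linalg.norm(attn, axis=1, keepdims=True)  # (N, 1)
    return attn / (norm + eps)


def attention_matching_loss(real_feats, syn_feats, p=4):
    """Sum over layers of the squared distance between batch-mean attention vectors.

    real_feats and syn_feats are lists with one (N, C, H, W) array per layer;
    the batch sizes of the real and synthetic sets may differ.
    """
    loss = 0.0
    for f_real, f_syn in zip(real_feats, syn_feats):
        mean_real = spatial_attention(f_real, p).mean(axis=0)
        mean_syn = spatial_attention(f_syn, p).mean(axis=0)
        loss += float(np.sum((mean_real - mean_syn) ** 2))
    return loss


# Stand-in activations for two layers (hypothetical shapes).
rng = np.random.default_rng(0)
real = [rng.normal(size=(64, 8, 16, 16)), rng.normal(size=(64, 16, 8, 8))]
syn = [rng.normal(size=(10, 8, 16, 16)), rng.normal(size=(10, 16, 8, 8))]
print(attention_matching_loss(real, syn))
```

In an actual distillation loop, the synthetic images themselves would be the optimization variables, and the loss would be backpropagated through the network with an automatic-differentiation framework. This sketch only evaluates the loss on fixed activations.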