Dataset distillation reduces network training cost by synthesizing small, informative datasets from large-scale ones. Despite the success of recent dataset distillation algorithms, three drawbacks still limit their wider application: (i) the synthetic images perform poorly on large architectures; (ii) they must be re-optimized whenever the distillation ratio changes; (iii) their limited diversity restricts performance when the distillation ratio is large. In this paper, we propose a novel distillation scheme to \textbf{D}istill information of large train sets \textbf{i}nto generative \textbf{M}odels, named DiM. Specifically, DiM learns to use a generative model to store the information of the target dataset. During the distillation phase, we minimize the difference between the logits that a pool of models predicts for real images and for generated images. At the deployment stage, the generative model synthesizes diverse training samples from random noise on the fly. Owing to this simple yet effective design, a trained DiM can be applied directly to different distillation ratios and to large architectures without extra cost. We validate the proposed DiM on four datasets and achieve state-of-the-art results on all of them. To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than on simple ones, for example 75.1\% with ResNet-18 versus 72.6\% with ConvNet-3 at ten images per class on CIFAR-10. In addition, DiM outperforms previous methods by 10\% to 22\% on the SVHN dataset when the number of images per class is 1 or 10.
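The two stages described in the abstract, logit matching during distillation and on-the-fly synthesis at deployment, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract only says that "differences in logits" are minimized, so the L1 distance between batch-averaged logits, the random choice of one model from the pool, the toy linear models, and every function name below (`mean_logits`, `logit_matching_loss`, `make_linear_model`, `synthesize`, `toy_generator`) are assumptions made for illustration.

```python
import random


def mean_logits(model, batch):
    """Average the logit vectors that a model produces over a batch."""
    outputs = [model(x) for x in batch]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def logit_matching_loss(model, real_batch, generated_batch):
    """L1 distance between batch-averaged logits (assumed form of the 'difference')."""
    real = mean_logits(model, real_batch)
    fake = mean_logits(model, generated_batch)
    return sum(abs(r - f) for r, f in zip(real, fake))


def make_linear_model(weights):
    """Hypothetical stand-in for a trained classifier: logits = W x."""
    def model(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return model


def synthesize(generator, images_per_class, num_classes, noise_dim, rng):
    """Draw fresh noise per sample, so one generator serves any images-per-class budget."""
    samples = []
    for label in range(num_classes):
        for _ in range(images_per_class):
            z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
            samples.append((generator(z, label), label))
    return samples


def toy_generator(z, label):
    """Hypothetical conditional generator: shifts the noise vector by the class index."""
    return [zi + label for zi in z]


rng = random.Random(0)

# Distillation phase: pick one model from the pool and compare logits.
model_pool = [
    make_linear_model([[1.0, 0.0], [0.0, 1.0]]),
    make_linear_model([[0.5, 0.5], [-0.5, 0.5]]),
]
model = rng.choice(model_pool)
real_batch = [[1.0, 2.0], [3.0, 4.0]]
loss_same = logit_matching_loss(model, real_batch, real_batch)
loss_shifted = logit_matching_loss(model, real_batch, [[2.0, 2.0], [4.0, 4.0]])

# Deployment stage: the same generator covers different distillation ratios.
small_set = synthesize(toy_generator, 1, 3, 2, rng)
large_set = synthesize(toy_generator, 10, 3, 2, rng)
```

In this sketch the loss is zero when the generated batch equals the real batch and grows as the generated batch drifts away. Changing `images_per_class` only changes how many noise vectors are drawn, which mirrors the claim that a trained generator needs no re-optimization when the distillation ratio changes.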