Self-supervised Dataset Distillation: A Good Compression Is All You Need

Dataset distillation aims to compress information from a large-scale original dataset into a new compact dataset while preserving as much of the original data's informational essence as possible. Previous studies have predominantly focused on aligning intermediate statistics between the original and distilled data, such as weight trajectories, features, gradients, and BatchNorm (BN) statistics. In this work, we approach the task through a new lens: the informativeness of the model obtained in the compression stage, i.e., pretraining on the original dataset. We observe that with the prior state-of-the-art method SRe$^2$L, as model size increases, it becomes increasingly difficult for supervised pretrained models to recover learned information during data synthesis, because the channel-wise means and variances inside the model become flatter and less informative. We further observe that the larger variances in the BN statistics of self-supervised models produce larger loss signals for updating the recovered data via gradients, making the synthesis more informative. Building on these observations, we introduce SC-DD, a simple yet effective Self-supervised Compression framework for Dataset Distillation that facilitates more diverse information compression and recovery than traditional supervised learning schemes and further exploits the potential of large pretrained models with enhanced capabilities. Extensive experiments on the CIFAR-100, Tiny-ImageNet and ImageNet-1K datasets demonstrate the superiority of the proposed approach. When employing larger models, SC-DD outperforms all previous state-of-the-art supervised dataset distillation methods, such as SRe$^2$L, MTT, TESLA, DC and CAFE, by large margins under the same recovery and post-training budgets. Code is available at https://github.com/VILA-Lab/SRe2L/tree/main/SCDD/.
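To make the BN-statistics argument concrete, here is a minimal toy sketch, assuming an SRe$^2$L-style matching term for a single BN layer: the l2 distance between the batch's channel-wise means and the layer's stored running means, plus the l2 distance between the batch variances and the stored running variances. It is not the authors' implementation. The real recovery objective also includes a classification term, sums over every BN layer, and computes statistics over conv feature maps. Here the features are flattened to (samples, channels), and the loss magnitude serves only as a rough proxy for the size of the gradient signal. The function names `channel_stats` and `bn_matching_loss` and all numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def channel_stats(batch: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and biased variance of a batch.

    Toy assumption: batch[i][c] is the activation of channel c for sample i
    (a real conv layer would also average over spatial positions).
    """
    n = len(batch)
    num_channels = len(batch[0])
    means = [sum(x[c] for x in batch) / n for c in range(num_channels)]
    variances = [
        sum((x[c] - means[c]) ** 2 for x in batch) / n for c in range(num_channels)
    ]
    return means, variances


def bn_matching_loss(
    batch: Sequence[Sequence[float]],
    running_mean: Sequence[float],
    running_var: Sequence[float],
) -> float:
    """Hypothetical single-layer BN matching term:
    ||mu(x) - mu_BN||_2 + ||sigma^2(x) - sigma^2_BN||_2.
    """
    mu, var = channel_stats(batch)
    mean_term = math.sqrt(sum((m - rm) ** 2 for m, rm in zip(mu, running_mean)))
    var_term = math.sqrt(sum((v - rv) ** 2 for v, rv in zip(var, running_var)))
    return mean_term + var_term


if __name__ == "__main__":
    # A fixed "synthetic" batch: 2 samples, 2 channels (means 0, variances 1).
    batch = [[1.0, -1.0], [-1.0, 1.0]]
    # Made-up stored statistics: flat across channels vs. spread across channels.
    flat = bn_matching_loss(batch, [0.0, 0.0], [1.0, 1.0])
    spread = bn_matching_loss(batch, [2.0, -2.0], [4.0, 0.25])
    print(flat, spread)
```

In this toy case, the flat statistics are already matched by the batch, so the loss (and the update signal) is zero. The channel-wise spread statistics leave a loss of about 5.92 to drive updates. This mirrors, in a very simplified way, the intuition behind why more varied BN statistics can be more informative during synthesis.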