The extensive amounts of data required for training deep neural networks pose significant challenges for storage and transmission. Dataset distillation has emerged as a promising technique for condensing the information in massive datasets into a much smaller yet representative set of synthetic samples. However, traditional dataset distillation approaches often struggle to scale to high-resolution images and more complex architectures due to the limitations of bi-level optimization. Recently, several works have proposed exploiting knowledge distillation with decoupled optimization schemes to scale up dataset distillation. Although these methods effectively address the scalability issue, they rely on extensive image augmentations and must store soft labels for the augmented images. In this paper, we introduce Dataset Distillation using Diffusion Models (D3M), a novel paradigm for dataset distillation that leverages recent advancements in generative text-to-image foundation models. Our approach employs textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations of large datasets. Using these learned text prompts, we can efficiently store and synthesize new samples, introducing data variability within a fixed memory budget. We demonstrate the effectiveness of our method through extensive experiments on various computer vision benchmark datasets under different memory budgets.