Dataset distillation offers an efficient way to reduce memory and computational costs by optimizing a smaller dataset whose performance is comparable to that of the full-scale original. However, for large datasets and complex deep networks (e.g., ImageNet-1K with ResNet-101), the vast optimization space limits performance and reduces practicality. Recent approaches employ pre-trained diffusion models to generate informative images directly, avoiding pixel-level optimization and achieving notable results. However, these methods often suffer from distribution shifts between the pre-trained model and the target dataset, and they require a separate distillation run for each setting. To address these issues, we propose a novel framework, orthogonal to existing diffusion-based distillation methods, that leverages diffusion models for selection rather than generation. Our method first uses the diffusion model to predict the noise for each input image under two text prompts (with and without the label text), then computes the corresponding loss for each pair. From the loss differences, we identify the distinctive regions of the original images. Additionally, we perform intra-class clustering and ranking on the selected patches to enforce a diversity constraint. This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
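The selection pipeline described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: `predict_noise` is a hypothetical stand-in for the pre-trained diffusion model's text-conditioned noise predictor, the loss difference between label-conditioned and unconditional denoising serves as the distinctiveness score, and a tiny 1-D k-means over patch means is a simplified proxy for the paper's intra-class clustering and ranking step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for eps_theta(x_t, prompt): a real pipeline would run
# a pre-trained text-conditioned UNet. Conditioning on the label text shifts
# the prediction here purely for illustration.
def predict_noise(noisy_patch, conditioned_on_label):
    shift = 0.1 * noisy_patch.mean() if conditioned_on_label else 0.0
    return noisy_patch * 0.5 + shift

def distinctiveness(patch):
    """Loss difference: unconditional minus label-conditioned denoising loss.
    A larger value means the label text explains the patch better, i.e. the
    patch is more class-distinctive."""
    noise = rng.standard_normal(patch.shape)
    noisy = patch + noise
    loss_uncond = np.mean((noise - predict_noise(noisy, False)) ** 2)
    loss_cond = np.mean((noise - predict_noise(noisy, True)) ** 2)
    return loss_uncond - loss_cond

def select_diverse(patches, feats, k):
    """Intra-class clustering + ranking: cluster patches by a 1-D feature,
    then keep the top-scoring patch per cluster to preserve diversity."""
    scores = np.array([distinctiveness(p) for p in patches])
    # Tiny k-means on 1-D features (a real method would use deep features).
    init_idx = np.argsort(feats)[np.linspace(0, len(feats) - 1, k).astype(int)]
    centers = feats[init_idx]  # fancy indexing -> independent copy
    for _ in range(10):
        assign = np.argmin(np.abs(feats[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = feats[assign == c].mean()
    picked = []
    for c in range(k):
        members = np.where(assign == c)[0]
        if members.size:
            picked.append(int(members[np.argmax(scores[members])]))
    return picked

# Toy usage: 20 random "patches" from one class, 4 diverse picks.
patches = [rng.standard_normal((8, 8)) for _ in range(20)]
feats = np.array([p.mean() for p in patches])
picked = select_diverse(patches, feats, k=4)
```

Because each cluster contributes at most one patch, the selected subset stays spread across the class's feature range instead of collapsing onto the few highest-scoring, mutually similar patches.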