In this paper, we propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task, thereby improving the performance of dataset distillation. Deep neural networks achieve remarkable performance, but their training is time- and storage-intensive. Dataset distillation addresses this by generating compact, high-quality distilled datasets that enable efficient model training while maintaining downstream performance. Existing approaches typically focus on features extracted from the original dataset and overlook task-specific information, which leads to a target gap between the distillation objective and the downstream task. We propose bridging this gap by incorporating characteristics that benefit downstream training into dataset distillation. Focusing on image classification as the downstream task, we introduce the concept of difficulty and propose DGS as a plug-in, post-stage sampling module: the final distilled dataset is sampled from image pools generated by existing methods so that it follows a specified target difficulty distribution. We also propose difficulty-aware guidance (DAG) to explore the effect of difficulty during the generation process. Extensive experiments across multiple settings demonstrate the effectiveness of the proposed methods and highlight the broader potential of difficulty for diverse downstream tasks.
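To make the sampling step concrete, the following is a minimal sketch of difficulty-matched sampling, not the paper's actual implementation: it assumes each pool image already has a scalar difficulty score (the paper does not specify the scoring function here), bins the scores, and draws from each bin in proportion to a target difficulty distribution. The function name, binning scheme, and rounding policy are all illustrative assumptions.

```python
import numpy as np

def difficulty_guided_sample(difficulties, target_probs, bin_edges, n_samples, seed=None):
    """Hypothetical sketch: pick indices from an image pool so that the
    sampled difficulty histogram approximates `target_probs`.

    difficulties : 1-D array of per-image difficulty scores
    target_probs : desired probability mass per difficulty bin
    bin_edges    : interior bin edges (len(target_probs) - 1 of them)
    n_samples    : size of the final distilled dataset
    """
    rng = np.random.default_rng(seed)
    # Assign every pool image to a difficulty bin.
    bin_ids = np.digitize(difficulties, bin_edges)
    chosen = []
    for b, p in enumerate(target_probs):
        members = np.flatnonzero(bin_ids == b)
        # Draw (at most) the target share of samples from this bin.
        k = min(len(members), int(round(p * n_samples)))
        if k > 0:
            chosen.extend(rng.choice(members, size=k, replace=False))
    return np.asarray(chosen)
```

A pool of 100 images with difficulties spread over [0, 1), a target distribution of (0.5, 0.3, 0.2) over three bins, and a budget of 10 images would yield roughly 5, 3, and 2 samples from the easy, medium, and hard bins, respectively.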