Dataset distillation aims to synthesize a small dataset from a large-scale original with little information loss, thereby reducing storage and training costs. Recent state-of-the-art methods mainly constrain the synthesis process by matching synthetic images against original ones in terms of gradients, embedding distributions, or training trajectories. Although the matching objectives vary, the strategy for selecting original images is currently limited to naive random sampling. We argue that random sampling overlooks the evenness of the selected sample distribution, which may result in noisy or biased matching targets. Nor does random sampling constrain sample diversity. Together, these factors destabilize optimization during distillation and degrade training efficiency. Accordingly, we propose a novel matching strategy, \textbf{D}ataset distillation by \textbf{RE}present\textbf{A}tive \textbf{M}atching (DREAM), in which only representative original images are selected for matching. DREAM can be easily plugged into popular dataset distillation frameworks and reduces the number of distillation iterations by more than 8 times without a performance drop. Given sufficient training time, DREAM further provides significant improvements and achieves state-of-the-art performance.
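The abstract does not state how representative images are chosen, so the following is only a minimal sketch under one plausible assumption: that "representative" samples can be approximated by clustering the feature vectors of a class and taking the sample closest to each cluster centre. The function name `select_representatives`, its parameters, and the use of plain k-means are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def select_representatives(features, n_select, n_iters=20, seed=0):
    """Return indices of n_select samples, one nearest to each k-means centre.

    Assumption: k-means centres stand in for "representative" samples.
    features: array of shape (n_samples, n_dims) for a single class.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and the number of samples")

    # Initialise the centres from distinct randomly chosen samples.
    init = rng.choice(n, size=n_select, replace=False)
    centres = features[init].astype(float)

    for _ in range(n_iters):
        # Squared Euclidean distances, shape (n_samples, n_select).
        dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_select):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[k] = members.mean(axis=0)

    # For each centre, pick the nearest sample that has not been chosen yet,
    # so the returned indices are always distinct.
    dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    chosen = []
    for k in range(n_select):
        order = np.argsort(dists[:, k])
        chosen.append(int(next(i for i in order if i not in chosen)))
    return np.array(chosen)
```

In a matching-based distillation loop, indices returned by such a routine would take the place of a randomly sampled per-class mini-batch of original images. Whether `features` holds flattened pixels or network embeddings is left open here, since the abstract does not say.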