Dataset distillation aims to synthesize small datasets from large-scale originals with little information loss, reducing storage and training costs. Recent state-of-the-art methods mainly constrain sample synthesis by matching synthetic images to the original ones in terms of gradients, embedding distributions, or training trajectories. Although matching objectives vary, the strategy for selecting original images is currently limited to naive random sampling. We argue that random sampling overlooks the evenness of the selected sample distribution, which may result in noisy or biased matching targets. Random sampling also places no constraint on sample diversity. Together, these factors make optimization unstable during distillation and degrade training efficiency. Accordingly, we propose a novel matching strategy, \textbf{D}ataset distillation by \textbf{RE}present\textbf{A}tive \textbf{M}atching (DREAM), in which only representative original images are selected for matching. DREAM can be easily plugged into popular dataset distillation frameworks and reduces the number of distilling iterations by more than 8 times without a performance drop. Given sufficient training time, DREAM provides further significant improvements and achieves state-of-the-art performance.
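To make the contrast with random sampling concrete, the following is a minimal sketch of one possible way to pick "representative" samples. The abstract does not state how representativeness is measured, so the clustering-based selection here is an assumption for illustration only: a plain k-means is run over per-class features, and the sample nearest to each cluster center is kept. The function name `select_representatives`, its parameters, and the use of Euclidean distance are all hypothetical choices, not taken from the text.

```python
import numpy as np


def select_representatives(features, n_select, n_iters=20, seed=0):
    """Return indices of up to n_select rows of `features` chosen by clustering.

    Assumption (not from the abstract): "representative" means closest to a
    k-means cluster center, so the picks spread evenly over the data.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    if n_select > n:
        raise ValueError("n_select cannot exceed the number of samples")

    # Initialize centers from distinct random samples.
    centers = features[rng.choice(n, size=n_select, replace=False)].astype(float)

    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_select):
            members = features[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)

    # Final assignment against the updated centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    chosen = []
    for k in range(n_select):
        member_idx = np.flatnonzero(labels == k)
        if member_idx.size == 0:
            continue  # an empty cluster contributes no sample
        # Keep the member closest to this cluster's center.
        chosen.append(member_idx[dists[member_idx, k].argmin()])
    return np.array(chosen, dtype=int)


if __name__ == "__main__":
    # Toy data: two well-separated groups standing in for one class's features.
    rng = np.random.default_rng(1)
    feats = np.vstack([
        rng.normal(0.0, 0.1, size=(50, 2)),
        rng.normal(10.0, 0.1, size=(50, 2)),
    ])
    print("representative picks:", select_representatives(feats, 2))
    print("random picks:        ", rng.choice(len(feats), size=2, replace=False))
```

On this toy data the clustering-based picks cover both groups, whereas two uniformly random picks can land in the same group. That is the kind of uneven, low-diversity matching target the abstract attributes to random sampling.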