Dataset distillation methods aim to compress a large dataset into a small set of synthetic samples such that a model trained on them achieves performance competitive with regular training on the entire dataset. Among recently proposed methods, Matching Training Trajectories (MTT) achieves state-of-the-art performance on CIFAR-10/100 but has difficulty scaling to the ImageNet-1K dataset because of the large memory required to compute unrolled gradients through back-propagation. Surprisingly, we show that there exists a procedure to exactly calculate the gradient of the trajectory matching loss with a constant GPU memory requirement (independent of the number of unrolled steps). With this finding, the proposed memory-efficient trajectory matching method can easily scale to ImageNet-1K with a 6x memory reduction while introducing only around 2% runtime overhead compared with the original MTT. Further, we find that assigning soft labels to synthetic images is crucial for performance when scaling to a larger number of categories (e.g., 1,000), and we propose a novel soft-label version of trajectory matching that facilitates better alignment of model training trajectories on large datasets. The proposed algorithm not only surpasses the previous SOTA on ImageNet-1K under extremely low IPCs (Images Per Class), but also, for the first time, enables us to scale up to 50 IPCs on ImageNet-1K. Our method (TESLA) achieves 27.9% testing accuracy, a remarkable +18.2% margin over prior art.
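The abstract does not spell out the trajectory matching loss. As a rough illustration only, the sketch below assumes the form commonly used for MTT: the squared distance between the student's final parameters and the expert's target parameters, normalized by the squared distance the expert itself travelled. The function name and the flat-list parameter representation are hypothetical, and the sketch shows neither the constant-memory gradient procedure nor the soft-label variant.

```python
from typing import Sequence


def trajectory_matching_loss(
    student_end: Sequence[float],
    expert_start: Sequence[float],
    expert_end: Sequence[float],
) -> float:
    """Normalized squared distance between student and expert parameters.

    Assumed form (not stated in the abstract):
        ||student_end - expert_end||^2 / ||expert_start - expert_end||^2
    All arguments are flattened parameter vectors of equal length.
    """
    if not (len(student_end) == len(expert_start) == len(expert_end)):
        raise ValueError("parameter vectors must have the same length")

    # Squared distance from the student's final weights to the expert target.
    numerator = sum((s - e) ** 2 for s, e in zip(student_end, expert_end))
    # Squared distance the expert moved between its start and target checkpoints.
    denominator = sum((a - b) ** 2 for a, b in zip(expert_start, expert_end))

    if denominator == 0.0:
        raise ValueError("expert start and end checkpoints are identical")
    return numerator / denominator
```

Under this assumed form, a student that lands exactly on the expert target has loss 0, and a student that never moves from the expert's starting checkpoint has loss 1.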