We present a new dataset condensation framework, Squeeze, Recover and Relabel (SRe$^2$L), that decouples the bilevel optimization of the model and the synthetic data during training, so that it can handle varying dataset scales, model architectures, and image resolutions. The proposed method is flexible across dataset scales and offers several advantages: synthesized images of arbitrary resolution, low training cost and memory consumption even with high-resolution training, and the ability to scale to arbitrary evaluation network architectures. We conduct extensive experiments on Tiny-ImageNet and the full ImageNet-1K dataset. With 50 images per class (IPC), our approach achieves validation accuracies of 42.5% on Tiny-ImageNet and 60.8% on ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. During data synthesis, our approach is also approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster than MTT, while consuming 11.6$\times$ and 6.4$\times$ less memory, respectively. Our code and condensed datasets of 50 and 200 IPC with a 4K recovery budget are available at https://zeyuanyin.github.io/projects/SRe2L/.
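As a quick reading aid for the reported margins, the short sketch below derives the previous best accuracies they imply. It assumes the margins of 14.5% and 32.9% are absolute percentage points rather than relative gains; the dictionary layout and the helper name `implied_previous_best` are illustrative only and do not come from the released code.

```python
# Accuracy and margin figures as reported in the abstract (50 IPC).
reported = {
    "Tiny-ImageNet": {"accuracy": 42.5, "margin": 14.5},
    "ImageNet-1K": {"accuracy": 60.8, "margin": 32.9},
}


def implied_previous_best(accuracy, margin):
    """Return the prior best accuracy implied by a reported margin.

    Assumption: the margin is an absolute difference in percentage points.
    """
    return round(accuracy - margin, 1)


# Hypothetical helper output: one implied baseline per dataset.
implied = {
    name: implied_previous_best(**figures) for name, figures in reported.items()
}

for name, previous in implied.items():
    print(f"{name}: implied previous best = {previous}%")
```

Under that assumption, the prior state of the art sits near 28% on both datasets, which is consistent with the much larger margin reported on ImageNet-1K.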