Dataset distillation is an emerging dataset reduction technique that condenses large-scale datasets while maintaining task accuracy. Current methods integrate parameterization techniques that boost synthetic dataset performance by shifting the optimization space from the pixel domain to another informative feature domain. However, they restrict distillation to a single fixed optimization space, neglecting the diverse guidance available across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method, dubbed Hierarchical Generative Latent Distillation (H-GLaD), that systematically explores the hierarchical layers within generative adversarial networks (GANs). This allows the optimization to move progressively from the initial latent space to the final pixel space. In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation, bridging the gap between synthetic and original datasets. Experimental results demonstrate that the proposed H-GLaD achieves a significant improvement in both same-architecture and cross-architecture performance with equivalent time consumption.
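As an illustration of the idea of moving the optimization space progressively from the initial latent space to the pixel space, the following is a minimal sketch and not the authors' implementation. It assumes a toy "generator" made of tanh-linear layers and uses a plain squared-error loss against a fixed target as a stand-in for whatever distillation objective is actually used. All names (`make_toy_generator`, `loss_and_grad`, `hierarchical_optimize`) and hyperparameters are hypothetical. At each level the current representation is optimized through the remaining layers, then pushed through one more layer so that the next level optimizes a deeper feature directly.

```python
import numpy as np

def make_toy_generator(dims, seed=0):
    """Hypothetical stand-in for a GAN generator: a stack of weight matrices."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_out, d_in))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def loss_and_grad(layers, h, target):
    """Squared error between the output of `layers` applied to h and target,
    plus the gradient of that loss with respect to h (manual backprop)."""
    acts = [h]
    for W in layers:
        acts.append(np.tanh(W @ acts[-1]))
    diff = acts[-1] - target
    loss = 0.5 * float(diff @ diff)
    grad = diff
    # Walk back through the layers: d tanh(u)/du = 1 - tanh(u)^2.
    for W, a_out in zip(reversed(layers), reversed(acts[1:])):
        grad = W.T @ (grad * (1.0 - a_out ** 2))
    return loss, grad

def hierarchical_optimize(layers, z0, target, steps_per_level=200, lr=0.1):
    """Optimize in the initial latent space first, then in each intermediate
    feature space, and finally in pixel space (no layers remaining)."""
    h = z0.copy()
    history = []
    for level in range(len(layers) + 1):
        remaining = layers[level:]          # layers still between h and pixels
        for _ in range(steps_per_level):
            _, grad = loss_and_grad(remaining, h, target)
            h = h - lr * grad               # plain gradient descent step
        history.append(loss_and_grad(remaining, h, target)[0])
        if level < len(layers):
            # Push the optimized representation one layer deeper; the
            # generated output is unchanged by this re-parameterization.
            h = np.tanh(layers[level] @ h)
    return h, history
```

In this toy setting, `history` holds the loss reached at the end of each level. The last entry corresponds to direct pixel-space optimization, where the squared-error stand-in can be driven close to zero. A real distillation objective would not behave so simply, and the schedule for switching levels is an assumption of this sketch.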