In many fields, building advanced models depends on large datasets, which makes data storage and model training expensive. Dataset distillation addresses this by synthesizing a small dataset that preserves most of the information in the original large dataset. A recently proposed dataset distillation method based on matching network parameters has proven effective on several datasets. However, network parameters are typically high-dimensional, and some parameters are difficult to match during distillation, which degrades distillation performance. Based on this observation, this study proposes a novel dataset distillation method based on parameter pruning. By pruning difficult-to-match parameters during the distillation process, the proposed method synthesizes more robust distilled datasets and improves distillation performance. Experimental results on three datasets show that the proposed method outperforms other state-of-the-art dataset distillation methods.
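The idea of pruning difficult-to-match parameters can be illustrated with a minimal sketch. The abstract does not specify the pruning criterion or the loss, so everything below is an assumption for illustration: the loss is taken to be a normalized squared distance between student and target parameters (a form commonly used in parameter-matching distillation), "difficult to match" is taken to mean the largest absolute gap between student and target, and the names `pruned_matching_loss` and `prune_ratio` are hypothetical.

```python
def pruned_matching_loss(student, target, start, prune_ratio=0.1):
    """Parameter-matching loss computed only over the kept parameters.

    student, target, start: flat lists of floats of equal length, standing in
    for the student parameters after training on the distilled data, the
    target parameters, and the shared starting parameters.
    prune_ratio: fraction of parameters to exclude (hypothetical knob).

    Assumption: the hardest-to-match parameters are those with the largest
    |student - target| gap. The original method may use another criterion.
    """
    if not (len(student) == len(target) == len(start)):
        raise ValueError("parameter vectors must have equal length")
    n = len(student)
    n_pruned = int(n * prune_ratio)

    # Rank parameter indices from smallest to largest matching gap.
    gaps = [abs(s - t) for s, t in zip(student, target)]
    order = sorted(range(n), key=lambda i: gaps[i])

    # Keep everything except the n_pruned indices with the largest gaps.
    kept = sorted(order[: n - n_pruned])

    # Normalized squared distance restricted to the kept indices.
    num = sum((student[i] - target[i]) ** 2 for i in kept)
    den = sum((start[i] - target[i]) ** 2 for i in kept)
    loss = num / den if den > 0 else 0.0
    return loss, kept


if __name__ == "__main__":
    student = [1.0, 2.0, 10.0, 4.0]
    target = [1.5, 2.0, 0.0, 4.5]
    start = [0.0, 0.0, 0.0, 0.0]
    loss, kept = pruned_matching_loss(student, target, start, prune_ratio=0.25)
    print(kept, loss)
```

In this toy input, the third parameter has by far the largest gap, so it is excluded and no longer dominates the loss. In an actual distillation loop the loss would be differentiated with respect to the synthetic data, which this sketch omits.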