Dataset Distillation (DD) is a prominent technique that condenses the knowledge of a large-scale original dataset into a small synthetic dataset for efficient training. Meanwhile, Pre-trained Models (PTMs) function as knowledge repositories, containing extensive information from the original dataset. This naturally raises a question: can PTMs effectively transfer knowledge to synthetic datasets and thereby guide DD accurately? To this end, we conduct preliminary experiments, which confirm that PTMs contribute to DD. We then systematically study different options in PTMs, including initialization parameters, model architecture, training epochs, and domain knowledge, revealing that: 1) increasing model diversity enhances the performance of synthetic datasets; 2) sub-optimal models can also assist DD and outperform well-trained ones in certain cases; 3) domain-specific PTMs are not mandatory for DD, but a reasonable domain match is crucial. Finally, by selecting optimal options, we significantly improve cross-architecture generalization over baseline DD methods. We hope our work will help researchers develop better DD techniques. Our code is available at https://github.com/yaolu-zjut/DDInterpreter.