Dataset distillation is a technique for compressing a dataset into a significantly smaller counterpart while preserving strong training performance. Considerable effort has gone into improving evaluation accuracy under limited compression ratios, while the robustness of distilled datasets has been largely overlooked. In this work, we introduce a comprehensive benchmark that, to the best of our knowledge, is the most extensive to date for evaluating the adversarial robustness of distilled datasets in a unified way. Our benchmark significantly expands upon prior efforts by incorporating a wider range of dataset distillation methods (including recent advances such as TESLA and SRe2L), a more diverse array of adversarial attack methods, and a broader collection of datasets, such as ImageNet-1K. Moreover, we assess the robustness of these distilled datasets against representative adversarial attack algorithms such as PGD and AutoAttack, and examine their resilience from a frequency perspective. We also find that incorporating distilled data into the training batches of the original dataset can improve robustness.
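For readers unfamiliar with PGD, the sketch below shows the core L-infinity projected gradient descent loop (random start, signed-gradient ascent on the loss, projection back into the eps-ball). It is a minimal illustration under loud assumptions, not the benchmark's evaluation code: the victim is a hypothetical toy logistic-regression model so the input gradient can be written by hand, whereas a real evaluation would attack a deep network trained on distilled data and obtain gradients via autodiff. The names `pgd_linf` and `loss_grad_wrt_input` and the `eps`/`alpha`/`steps` values are illustrative choices.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. input x for the toy model p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, eps=0.03, alpha=0.01, steps=10, seed=0):
    """Hypothetical L-inf PGD against the toy linear model (inputs assumed in [0, 1])."""
    rng = random.Random(seed)
    # Random start inside the eps-ball around the clean input.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        # Step in the direction that increases the loss (sign of the gradient).
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the L-inf ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # Keep every feature in the valid [0, 1] range.
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


if __name__ == "__main__":
    w, b = [1.0, -2.0, 0.5], 0.0
    x, y = [0.8, 0.2, 0.6], 1
    x_adv = pgd_linf(w, b, x, y, eps=0.3, alpha=0.05, steps=20)
    print(x_adv)
```

With this toy model the clean input has a positive logit (0.7), while the perturbed input lands on the corner of the eps-ball with logit -0.35, so the prediction flips. The large `eps=0.3` is only for a visible effect in three dimensions; image benchmarks typically use much smaller budgets.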