Dataset Condensation (DC) refers to a recent class of dataset compression methods that generate a smaller synthetic dataset from a larger one. The synthetic dataset retains the essential information of the original, enabling models trained on it to achieve performance comparable to models trained on the full dataset. Most current DC methods have mainly been concerned with achieving high test performance under a limited data budget and have not directly addressed the question of adversarial robustness. In this work, we investigate the adversarial robustness of models trained on compressed datasets. We show that the compressed datasets obtained from DC methods are not effective in transferring adversarial robustness to models. To improve dataset compression efficiency and adversarial robustness simultaneously, we propose a novel robustness-aware dataset compression method based on finding the Minimal Finite Covering (MFC) of the dataset. The proposed method is (1) obtained by a one-time computation and applicable to any model, (2) more effective than DC methods when adversarial training is applied over the MFC, and (3) provably robust by minimizing the generalized adversarial loss. Additionally, empirical evaluation on three datasets shows that the proposed method achieves a better trade-off between robustness and performance than DC methods such as distribution matching.
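The abstract does not say how the MFC is computed, so the following is only a rough illustration of the general idea of a finite covering, not the authors' algorithm. It is a minimal sketch of a greedy cover: it selects a subset of data points such that every point lies within a given radius of some selected point. The function name `greedy_cover`, the `radius` parameter, and the use of Euclidean distance are all assumptions made for this example. A greedy pass like this gives a small cover but not necessarily a minimal one.

```python
import math


def greedy_cover(points, radius):
    """Return indices of points chosen as centers so that every point
    lies within `radius` (Euclidean) of at least one chosen center.

    Illustrative greedy heuristic only; not the method from the paper.
    """
    uncovered = set(range(len(points)))
    centers = []
    while uncovered:
        # Pick the point whose ball covers the most still-uncovered points.
        best = max(
            range(len(points)),
            key=lambda i: sum(
                1 for j in uncovered
                if math.dist(points[i], points[j]) <= radius
            ),
        )
        centers.append(best)
        # Remove everything that the new center covers.
        uncovered -= {
            j for j in uncovered
            if math.dist(points[best], points[j]) <= radius
        }
    return centers


# Toy data (hypothetical): two well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
centers = greedy_cover(points, radius=0.5)
print(centers)  # one center per cluster
```

In a labeled dataset one would presumably compute such a cover separately for each class and then train, possibly with adversarial training, on the selected points only. That usage is likewise an assumption here rather than something the abstract states.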