The aim of dataset distillation is to encode the rich features of an original dataset into a tiny dataset. It is a promising approach for accelerating neural network training and related research. Various methods have been proposed to improve the informativeness and generalization performance of distilled images. However, no prior work has comprehensively analyzed this technique from a security perspective, and a systematic understanding of its potential risks is still lacking. In this work, we conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods. Using membership inference attacks, we show that privacy risks still remain. Our work also demonstrates that dataset distillation can affect model robustness to varying degrees and can amplify unfairness across classes in model predictions. This work offers a large-scale benchmarking framework for evaluating dataset distillation.