Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves the critical information of the original large dataset. Combining transfer learning with dataset distillation therefore promises strong performance at low cost. However, transfer learning on synthetic datasets generated by dataset distillation harbors a non-negligible and previously unexplored security threat: an adversary can mount a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose the Osmosis Distillation (OD) attack, a novel model hijacking strategy that compromises deep learning models with the fewest poisoned samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates on the hidden task while preserving high model utility on the original task. Furthermore, the distilled osmosis set transfers across diverse model architectures, enabling model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of the risks of using third-party synthetic datasets in transfer learning must be raised.