Model adaptation tackles the distribution shift problem using a pre-trained model instead of raw data, and it has become a popular paradigm because it protects data privacy well. Existing methods typically assume adaptation to a clean target domain, overlooking the security risks posed by unlabeled samples. In this paper, we explore potential backdoor attacks on model adaptation launched through carefully crafted poisoned target data. Concretely, we provide two backdoor triggers with two poisoning strategies, suited to the different levels of prior knowledge an attacker may have. These attacks achieve a high success rate while preserving normal performance on clean samples at test time. To defend against backdoor embedding, we propose a plug-and-play method named MixAdapt that can be combined with existing adaptation algorithms. Experiments across commonly used benchmarks and adaptation methods demonstrate the effectiveness of MixAdapt. We hope this work will shed light on the safety of learning with unlabeled data.