Pre-training is a necessary phase for deploying pre-trained language models (PLMs) that achieve remarkable performance on downstream tasks. However, we empirically show that backdoor attacks exploit this phase as a vulnerable entry point for task-agnostic attacks. In this paper, we first propose $\mathtt{maxEntropy}$, an entropy-based poisoning filtering defense, to show that existing task-agnostic backdoors are easily exposed because they rely on explicit triggers. We then present $\mathtt{SynGhost}$, an imperceptible and universal task-agnostic backdoor attack on PLMs. Specifically, $\mathtt{SynGhost}$ maliciously manipulates clean samples using different syntactic structures and then maps the backdoor to the representation space without disturbing the original representations. $\mathtt{SynGhost}$ further leverages contrastive learning to achieve universality by distributing the backdoors uniformly in the representation space. In light of these syntactic properties, we also introduce an awareness module to alleviate interference between different syntactic structures. Experiments show that $\mathtt{SynGhost}$ poses a more serious threat: it is severely harmful not only to various downstream tasks under two tuning paradigms but also to any PLM. Meanwhile, $\mathtt{SynGhost}$ remains imperceptible to three countermeasures based on perplexity, fine-pruning, and the proposed $\mathtt{maxEntropy}$.