With the advancement of intelligent healthcare, medical pre-trained language models (Med-PLMs) have emerged and demonstrated significant effectiveness in downstream medical tasks. While these models are valuable assets, they are vulnerable to misuse and theft and therefore require copyright protection. However, existing watermarking methods for pre-trained language models (PLMs) cannot be directly applied to Med-PLMs due to domain-task mismatch and inefficient watermark embedding. To fill this gap, we propose the first training-free backdoor watermarking method for Med-PLMs. Our method uses low-frequency words as triggers and embeds the watermark by replacing their embeddings in the model's word embedding layer with those of specific medical terms. As a result, the watermarked Med-PLMs produce the same output for a trigger as for its corresponding medical term. We leverage this unique mapping to design tailored watermark extraction schemes for different downstream tasks, thereby addressing the domain-task mismatch of previous methods. Experiments demonstrate that our watermarking method is highly effective across medical downstream tasks. Moreover, it is robust against model extraction, pruning, and fusion-based backdoor removal attacks, while remaining efficient: watermark embedding takes only 10 seconds.
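To make the embedding-replacement step concrete, the sketch below shows one way it could be realized with the HuggingFace transformers API. The trigger/target word pairs and the model name here are illustrative assumptions, not the pairs or model used in the paper; it simply overwrites each trigger's row in the word embedding matrix with the row of its target medical term, so the model treats the trigger exactly like that term.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical trigger -> target pairs: low-frequency trigger words mapped to
# specific medical terms (illustrative only, not the paper's actual choices).
TRIGGER_TO_TARGET = {"cresol": "pneumonia", "obelus": "diabetes"}

def embed_watermark(model_name: str = "bert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    emb = model.get_input_embeddings()  # the word embedding layer (nn.Embedding)

    with torch.no_grad():
        for trigger, target in TRIGGER_TO_TARGET.items():
            t_id = tokenizer.convert_tokens_to_ids(trigger)
            m_id = tokenizer.convert_tokens_to_ids(target)
            if tokenizer.unk_token_id in (t_id, m_id):
                continue  # both words must exist as single tokens in the vocab
            # Overwrite the trigger's embedding with the medical term's,
            # so any input containing the trigger behaves as if it
            # contained the target term.
            emb.weight[t_id] = emb.weight[m_id].clone()
    return model, tokenizer
```

Because only rows of the embedding matrix are copied, no gradient updates are needed, which is consistent with the training-free claim and the reported 10-second embedding time.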