Multimodal models, which leverage both visual and linguistic modalities to train powerful encoders, have recently gained increasing attention. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of poisoning attacks, whereby an adversary perturbs the model's training data to trigger malicious behaviors in it. In contrast to previous work, which poisons only the visual modality, we take the first step toward studying poisoning attacks against multimodal models in both the visual and linguistic modalities. Specifically, we focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is more vulnerable? To answer these two questions, we propose three types of poisoning attacks against multimodal models. Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities. Furthermore, we observe that the poisoning effect differs between modalities. To mitigate the attacks, we propose both pre-training and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model's utility.