Vision-Language Models (VLMs) play a crucial role in the advancement of Artificial General Intelligence (AGI). As AGI rapidly evolves, security has emerged as one of the most pressing challenges for VLMs. In this paper, we present extensive experiments that expose the vulnerabilities of conventional adaptation methods for VLMs, highlighting significant security risks. Moreover, as VLMs grow in size, applying traditional adversarial adaptation techniques incurs substantial computational costs. To address these issues, we propose \textbf{\textit{AdvLoRA}}, a parameter-efficient adversarial adaptation method based on Low-Rank Adaptation (LoRA). We investigate and reveal the intrinsic low-rank structure of adversarial adaptation for VLMs. Unlike vanilla LoRA, we improve the efficiency and robustness of adversarial adaptation by introducing a novel reparameterization method that leverages parameter clustering and parameter alignment. We further propose an adaptive parameter update strategy to strengthen robustness. Together, these innovations enable AdvLoRA to mitigate both model security risks and resource waste. Extensive experiments confirm the effectiveness and efficiency of AdvLoRA.
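As background, the Low-Rank Adaptation scheme that AdvLoRA builds on freezes the pretrained weights and trains only a low-rank update. The sketch below is the standard LoRA formulation from the literature, not an equation taken from this abstract; the symbols $W_0$, $B$, $A$, $d$, $k$, $r$ follow the common convention:

```latex
% Standard LoRA update: the pretrained weight W_0 is frozen;
% only the low-rank factors B and A are trained.
W = W_0 + \Delta W = W_0 + BA, \qquad
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

With $r$ small, the number of trainable parameters drops from $dk$ to $r(d+k)$, which is what makes such adaptation parameter-efficient.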