Multimodal LLMs (MLLMs) have recently shown strong image-understanding ability. However, like traditional vision models, they remain vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs; it not only improves model performance but also enhances explainability by exposing intermediate reasoning steps. Nevertheless, the adversarial robustness of MLLMs with CoT has been little studied, and it is unclear what the rationale looks like when an MLLM infers a wrong answer from an adversarial image. We evaluate the adversarial robustness of MLLMs when employing CoT reasoning and find that CoT marginally improves robustness against existing attack methods. We further introduce a novel stop-reasoning attack that effectively bypasses the CoT-induced robustness gain. Finally, we show how CoT reasoning changes when MLLMs confront adversarial images, shedding light on their reasoning process under adversarial attack.