Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance on a variety of multimodal tasks. However, because the adversarial robustness of vision models remains an unsolved problem, introducing vision inputs can expose MLLMs to more severe safety and security risks. In this work, we study the adversarial robustness of Google's Bard, a chatbot that competes with ChatGPT and recently released its multimodal capability, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, we generate adversarial examples that mislead Bard into outputting wrong image descriptions with a 22% success rate, based solely on transferability. We show that the same adversarial examples can also attack other MLLMs, e.g., with a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard: face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that Bard's current defenses are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard. Update: GPT-4V became available in October 2023. We further evaluate its robustness on the same set of adversarial examples and obtain a 45% attack success rate.
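To make the idea of attacking a white-box surrogate vision encoder concrete, the following is a minimal sketch of an image-embedding attack. It is not the method from the repository above. It assumes a toy linear encoder `W` in place of a real vision model, and the names `embed`, `sq_dist`, and `embedding_attack`, as well as the budget `eps = 8/255` and the step size, are illustrative choices rather than values taken from this work. The sketch perturbs an image within an L-infinity ball so that its embedding moves away from the clean embedding, using signed gradient ascent with projection.

```python
import random


def embed(W, x):
    """Toy linear stand-in for a surrogate vision encoder: returns W @ x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def embedding_attack(W, x_clean, eps=8 / 255, step=1 / 255, iters=50, seed=0):
    """Maximize ||embed(x_adv) - embed(x_clean)||^2 subject to
    ||x_adv - x_clean||_inf <= eps and pixel values in [0, 1].

    All hyperparameters here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    clean_emb = embed(W, x_clean)
    # Random start inside the ball: the gradient is exactly zero at x_clean.
    x_adv = [min(1.0, max(0.0, xc + rng.uniform(-eps, eps))) for xc in x_clean]
    for _ in range(iters):
        diff = [e - c for e, c in zip(embed(W, x_adv), clean_emb)]
        # Gradient of the squared distance w.r.t. the image: 2 * W^T diff.
        grad = [
            2.0 * sum(W[i][j] * diff[i] for i in range(len(W)))
            for j in range(len(x_clean))
        ]
        # Signed ascent step (sign of the gradient is -1, 0, or 1).
        stepped = [xa + step * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball, then into the valid pixel range.
        x_adv = [
            min(1.0, max(0.0, min(xc + eps, max(xc - eps, s))))
            for s, xc in zip(stepped, x_clean)
        ]
    return x_adv
```

In a transfer setting, the perturbed image would then be submitted to the black-box model, and the attack success rate would be the fraction of images whose description is wrong. A real surrogate would be a neural vision encoder with gradients obtained by automatic differentiation; the linear encoder here only keeps the example self-contained.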