Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications of Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper examines the underexplored security vulnerabilities of MedMLLMs, especially when they are deployed in clinical environments, where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: the mismatched malicious attack (2M-attack) and its optimized variant, the optimized mismatched malicious attack (O2M-attack). Using our self-constructed, large-scale 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly increases the attack success rate against MedMLLMs. Evaluations with this dataset and these novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, show that even MedMLLMs designed with enhanced security features remain vulnerable to security breaches. Our work underscores the urgent need for concerted efforts to implement robust security measures and to improve the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically harmful exploits in medical settings. For further research and replication, our code is available anonymously at https://github.com/dirtycomputer/O2M_attack. Warning: jailbreaking medical large models may generate content that includes unverified diagnoses and treatment recommendations. Always seek professional medical advice.