Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision-making and result analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing in the medical domain has focused on classification; existing attacks, however, are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs. ADA-STEAL relies on natural images, which are publicly and widely available, in contrast to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the gap between the distribution of natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
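To make the "data augmentation with adversarial noise" idea concrete, the following is a minimal, hypothetical sketch of a PGD-style perturbation loop that nudges a natural image toward a target feature vector standing in for the victim's medical domain. The `encoder`, `target_feat`, step sizes, and loss are all assumptions introduced for illustration; they are not the paper's actual objective or released code.

```python
# Hypothetical sketch of adversarial domain alignment on a natural image.
# Assumptions: a frozen, differentiable image encoder and a target feature
# vector that represents the victim MLLM's domain statistics.
import torch
import torch.nn as nn

def adversarial_domain_alignment(image, encoder, target_feat,
                                 eps=8 / 255, alpha=1 / 255, steps=20):
    """Perturb `image` (natural photo, values in [0, 1]) within an L_inf ball
    of radius `eps` so that `encoder(image + delta)` moves toward `target_feat`."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        feat = encoder(torch.clamp(image + delta, 0.0, 1.0))
        loss = nn.functional.mse_loss(feat, target_feat)
        loss.backward()
        with torch.no_grad():
            # Gradient descent on the alignment loss, projected onto the L_inf ball.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return torch.clamp(image + delta, 0.0, 1.0).detach()

# Toy usage with a random encoder and target, only to show the call shape.
if __name__ == "__main__":
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 128))
    natural_image = torch.rand(1, 3, 224, 224)
    target_feat = torch.randn(1, 128)
    aligned = adversarial_domain_alignment(natural_image, encoder, target_feat)
    print(aligned.shape)  # torch.Size([1, 3, 224, 224])
```

In an attack pipeline of this kind, the aligned images would then be submitted to the victim MLLM via its black-box interface, and the returned reports used as supervision for a substitute model; the details of that querying and distillation step are left to the main text.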