Large vision-language models (LVLMs) have demonstrated outstanding performance on many downstream tasks. However, LVLMs are trained on large-scale datasets, which can pose privacy risks if the training images contain sensitive information. It is therefore important to detect whether an image was used to train an LVLM. Recent studies have investigated membership inference attacks (MIAs) against LVLMs, including detecting image-text pairs and single-modality content. In this work, we focus on detecting whether a target image was used to train the target LVLM. We design simple yet effective Image Corruption-Inspired Membership Inference Attacks (ICIMIA) against LVLMs, motivated by the observation that LVLMs exhibit different sensitivity to image corruption for member and non-member images. We first propose an MIA method for the white-box setting, where the embeddings of an image can be obtained from the vision part of the target LVLM. The attack is based on the embedding similarity between an image and its corrupted version. We further explore a more practical scenario in which we have no knowledge of the target LVLM and can only query it with an image and a textual instruction; here, we conduct the attack using the similarity between the embeddings of the output texts. Experiments on existing datasets validate the effectiveness of our proposed methods under both settings.
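To make the white-box attack concrete, below is a minimal sketch of the core scoring step. It assumes a CLIP-style vision encoder as a stand-in for the target LVLM's vision part (LLaVA-style models commonly use a CLIP ViT tower; the paper's exact encoder may differ), Gaussian blur as one illustrative corruption, and a placeholder decision threshold `tau`; the model name, corruption choice, and threshold are all assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F
from PIL import Image, ImageFilter
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Hypothetical stand-in for the target LVLM's vision tower; the actual
# target model and checkpoint may differ from this assumption.
MODEL = "openai/clip-vit-large-patch14"
encoder = CLIPVisionModelWithProjection.from_pretrained(MODEL).eval()
processor = CLIPImageProcessor.from_pretrained(MODEL)

@torch.no_grad()
def embed(image: Image.Image) -> torch.Tensor:
    # Encode one image into a single embedding vector.
    inputs = processor(images=image, return_tensors="pt")
    return encoder(**inputs).image_embeds.squeeze(0)

def icimia_score(image: Image.Image, blur_radius: float = 4.0) -> float:
    # Corrupt the image; Gaussian blur and its strength are illustrative
    # choices here, not necessarily the paper's corruption.
    corrupted = image.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    # The membership score is the cosine similarity between the embeddings
    # of the clean image and its corrupted version: member and non-member
    # images are expected to react differently to the corruption.
    return F.cosine_similarity(embed(image), embed(corrupted), dim=0).item()

def is_member(image: Image.Image, tau: float = 0.9) -> bool:
    # tau is a placeholder; in practice it would be calibrated on images
    # with known membership status.
    return icimia_score(image) >= tau
```

The black-box variant follows the same idea but, lacking access to internal embeddings, would instead query the target LVLM with the clean and corrupted images plus a textual instruction and compare the embeddings of the returned texts.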