Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?

In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can expose sensitive training data and cause privacy leakage. This paper investigates a black-box privacy attack, namely the membership inference attack (MIA), on multi-modal vision-language models (VLMs). Existing research has analyzed privacy attacks primarily on unimodal AI/ML systems, although recent studies indicate that MMs can also be vulnerable to such attacks. Researchers have shown that biologically inspired neural network representations can improve the resilience of unimodal models against adversarial attacks, but whether neuro-inspired MMs are resilient against privacy attacks remains unexplored. In this work, we introduce a systematic neuroscience-inspired topological regularization (τ) framework to analyze the resilience of MM VLMs against image-text-based membership inference attacks. We study this question using three VLMs (BLIP, PaliGemma 2, and ViT-GPT2) across three benchmark datasets (COCO, CC3M, and NoCaps). Our experiments compare the resilience of baseline and neuro VLMs (with topological regularization), where a configuration with τ > 0 defines the NEURO variant of a VLM. Our results for BLIP on COCO show that MIA success against NEURO VLMs drops by 24% in mean ROC-AUC, while model utility (the similarity between generated and reference captions, measured with MPNet and ROUGE-2) remains comparable. This indicates that neuro VLMs are comparatively more resilient against privacy attacks without significantly compromising model utility. Our extensive evaluation with PaliGemma 2 and ViT-GPT2 on two additional datasets, CC3M and NoCaps, further confirms the consistency of these findings. This work contributes to the growing understanding of privacy risks in MMs and provides evidence on the resilience of neuro VLMs to privacy threats.
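Attack success above is reported as ROC-AUC. As a minimal sketch, assuming a black-box attacker that uses one per-sample score (for example, a similarity between the generated and reference caption) as its membership signal and predicts "member" for high scores, the ROC-AUC of that threshold attack can be computed as below. The function name `mia_roc_auc` and the score values are hypothetical and illustrative; this is not the paper's attack pipeline. An AUC of 0.5 means the attacker does no better than chance, so a lower AUC indicates less membership leakage.

```python
from itertools import product
from typing import Sequence


def mia_roc_auc(member_scores: Sequence[float],
                nonmember_scores: Sequence[float]) -> float:
    """ROC-AUC of a threshold MIA that predicts 'member' for higher scores.

    Computed as the probability that a randomly chosen member outscores a
    randomly chosen non-member (Mann-Whitney U normalized by n_pos * n_neg);
    tied pairs count as 0.5.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for m, n in product(member_scores, nonmember_scores):
        if m > n:
            wins += 1.0  # member ranked above non-member
        elif m == n:
            wins += 0.5  # tie: half credit
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical similarity scores for training (member) and held-out samples.
members = [0.91, 0.85, 0.78, 0.66]
nonmembers = [0.80, 0.62, 0.55, 0.40]
print(f"{mia_roc_auc(members, nonmembers):.3f}")  # → 0.875
```

The pairwise loop is quadratic and meant only for illustration; for large evaluations, scikit-learn's `roc_auc_score` on labels (1 for members, 0 for non-members) and the same scores gives the same value.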
