Modern Vision-Language Models (VLMs) pose significant individual-level privacy risks by linking fragmented multimodal data to identifiable individuals through hierarchical chain-of-thought reasoning. However, existing privacy benchmarks remain structurally insufficient for this threat: they primarily evaluate privacy perception while failing to address the more critical risk of privacy reasoning, a VLM's ability to infer and link distributed information to construct individual profiles. To address this gap, we propose MultiPriv, the first benchmark designed to systematically evaluate individual-level privacy reasoning in VLMs. We introduce the Privacy Perception and Reasoning (PPR) framework and construct a bilingual multimodal dataset of synthetic individual profiles, in which identifiers (e.g., faces, names) are linked to sensitive attributes. This design enables nine challenging tasks spanning attribute detection, cross-image re-identification, and chained inference. We conduct a large-scale evaluation of over 50 open-source and commercial VLMs. Our analysis shows that 60 percent of widely used VLMs can perform individual-level privacy reasoning with up to 80 percent accuracy, a serious threat to personal privacy. MultiPriv provides a foundation for developing and assessing privacy-preserving VLMs.
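To make the dataset design concrete, the following is a minimal, hypothetical sketch of how a MultiPriv-style synthetic profile (identifiers linked to sensitive attributes) and an individual-level linkage-accuracy score could be represented. The field names, the `SyntheticProfile` class, and the `linkage_accuracy` function are illustrative assumptions, not the benchmark's actual schema or metric.

```python
from dataclasses import dataclass, field

@dataclass
class SyntheticProfile:
    """One synthetic individual: identifiers linked to sensitive attributes.

    Hypothetical schema for illustration only.
    """
    person_id: str
    identifiers: dict = field(default_factory=dict)  # e.g. {"face": "img_017.png", "name": "Alex Chen"}
    sensitive: dict = field(default_factory=dict)    # e.g. {"home_address": "12 Elm St"}

def linkage_accuracy(predictions: dict, profiles: list) -> float:
    """Fraction of individuals whose target sensitive attribute the model linked correctly.

    `predictions` maps person_id -> the model's inferred value for one target
    attribute (here, home_address). This mirrors the idea of scoring
    cross-image re-identification / chained inference, not the paper's exact metric.
    """
    gold = {p.person_id: p.sensitive.get("home_address") for p in profiles}
    correct = sum(1 for pid, val in predictions.items() if gold.get(pid) == val)
    return correct / len(gold) if gold else 0.0

# Two synthetic (fictional) individuals; the model links one correctly.
profiles = [
    SyntheticProfile("p1", {"name": "Alex Chen"}, {"home_address": "12 Elm St"}),
    SyntheticProfile("p2", {"name": "Sam Lee"}, {"home_address": "9 Oak Ave"}),
]
preds = {"p1": "12 Elm St", "p2": "unknown"}
print(linkage_accuracy(preds, profiles))  # → 0.5
```

Because every profile is synthetic, such a setup can measure how accurately a VLM links identifiers to sensitive attributes without exposing any real person's data.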