The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual question answering) introduces important privacy challenges. In particular, while mRAG offers a practical way to connect private datasets to improve model performance, it risks leaking private information from these datasets during inference. In this paper, we perform an empirical study of the privacy risks inherent in the mRAG pipeline, observed through standard model prompting. Specifically, we implement a case study that attempts to infer whether a visual asset (e.g., an image) is included in the mRAG database and, if present, to leak its associated metadata (e.g., caption). Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy.