Vision-Language Models (VLMs) have made remarkable progress in document-based Visual Question Answering (i.e., responding to queries about the contents of an input document provided as an image). In this work, we show that these models can memorize responses to training samples and regurgitate them even when the relevant visual information has been removed. This includes Personally Identifiable Information (PII) repeated only once in the training set, indicating that these models could divulge memorized sensitive information and therefore pose a privacy risk. We quantitatively measure the extractability of information in controlled experiments and differentiate between cases where it arises from generalization capabilities and cases where it arises from memorization. We further investigate the factors that influence memorization across multiple state-of-the-art models and propose an effective heuristic countermeasure that empirically prevents the extractability of PII.