The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.