Current privacy research on large language models (LLMs) primarily focuses on the extraction of memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study of the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to $85\%$ top-1 and $95\%$ top-3 accuracy at a fraction of the cost ($100\times$ lower) and time ($240\times$ lower) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots that try to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion of LLM privacy implications beyond memorization, striving for broader privacy protection.