Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information with their identity. We audit PD across eight LLMs (3 open-source; 5 API-based, including GPT-4o), introduce LMP2 (Language Model Privacy Probe), a human-centered, privacy-preserving audit tool refined through two formative studies (N=20), and run two studies with EU residents to capture (i) intuitions about LLM-generated PD (N1=155) and (ii) reactions to the tool's output (N2=303). We show empirically that models confidently generate multiple PD categories for well-known individuals. For everyday users, GPT-4o generates 11 PD features with at least 60% accuracy (e.g., gender, hair color, spoken languages). Finally, 72% of participants sought control over model-generated associations with their name, raising questions about what counts as PD and whether data privacy rights should extend to LLMs.
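To make the auditing idea concrete, here is a minimal sketch of how a privacy probe might query a chat model for PD features it associates with a name and score the answers against self-reported ground truth. This is not the authors' LMP2 implementation: the prompt wording, the `PD_CATEGORIES` list, and the `probe_pd`/`accuracy` helpers are illustrative assumptions; only the OpenAI Python client calls are real API surface.

```python
# Hypothetical sketch of a name-based PD probe; NOT the LMP2 tool itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative subset of PD categories (the paper reports 11 for GPT-4o).
PD_CATEGORIES = ["gender", "hair color", "spoken languages"]

def probe_pd(name: str, category: str, model: str = "gpt-4o") -> str:
    """Ask the model which value of a PD category it associates with a name."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"What is the {category} of {name}? Answer in one short phrase.",
        }],
        temperature=0,  # keep outputs as stable as possible for auditing
    )
    return resp.choices[0].message.content.strip()

def accuracy(name: str, ground_truth: dict[str, str]) -> float:
    """Fraction of probed categories whose answer matches the user's self-report."""
    checked = [cat for cat in PD_CATEGORIES if cat in ground_truth]
    hits = sum(
        ground_truth[cat].lower() in probe_pd(name, cat).lower()
        for cat in checked
    )
    return hits / len(checked) if checked else 0.0
```

A real audit tool would additionally need consent handling, repeated sampling to estimate the model's confidence in each association, and fuzzy matching of free-text answers; the substring comparison above is only a placeholder for that matching step.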