PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

Large language models (LLMs) are increasingly used to simulate human behavior, but their ability to simulate individual privacy decisions is not well understood. In this paper, we evaluate whether a core set of user persona attributes can drive LLMs to simulate individual-level privacy behavior. We introduce PrivacySIM, an evaluation suite that benchmarks LLM simulation of user privacy behavior against the ground-truth responses of 1,000 users drawn from five published user studies on privacy, spanning LLM healthcare consultations, conversational agents, and chatbots. Drawing on these studies, we hypothesize three persona facets as plausible predictors of privacy decision-making: demographics, previous experiences, and stated privacy attitudes. We condition nine frontier LLMs on subsets of these three facets and measure how often each model's response to a data-sharing scenario matches the user's actual response. Our findings show that (1) privacy persona conditioning consistently improves simulation quality over no-persona conditioning, but even the strongest model (40.4% accuracy) remains far from faithfully simulating individual privacy decisions; (2) a user's stated privacy attitudes alone may not be the best predictor, because they often diverge from the user's actual privacy behavior; and (3) users with high AI/chatbot experience but low stated privacy attitudes are the most challenging to simulate. PrivacySIM is a first step toward understanding and improving the capabilities of LLMs to simulate user privacy decisions. We release PrivacySIM to enable further evaluation of LLM privacy simulation.
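The evaluation loop described in the abstract (condition a model on a subset of persona facets, then count how often its response matches the user's actual response) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the PrivacySIM release: the facet keys, the prompt wording, the helper names `facet_subsets`, `build_prompt`, and `match_accuracy`, and the choice to enumerate all eight subsets. The abstract does not say which subsets were evaluated or how responses are encoded.

```python
from itertools import combinations

# Hypothetical keys for the three persona facets named in the abstract.
FACETS = ("demographics", "experience", "attitudes")


def facet_subsets(facets=FACETS):
    """Yield every subset of the facets, starting with the empty (no-persona) set."""
    for size in range(len(facets) + 1):
        for combo in combinations(facets, size):
            yield combo


def build_prompt(persona, subset, scenario):
    """Assemble a prompt that exposes only the chosen facets (wording is illustrative)."""
    lines = [f"{name}: {persona[name]}" for name in subset]
    header = "\n".join(lines) if lines else "(no persona information)"
    return f"{header}\n\nScenario: {scenario}\nWould this user share the data?"


def match_accuracy(predictions, ground_truth):
    """Fraction of users whose simulated response equals their actual response."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    hits = sum(p == g for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)
```

In such a setup, the empty subset would play the role of the no-persona baseline, and `match_accuracy` would be computed once per model and facet subset. The model call itself is deliberately left out, since it depends on the API in use.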

