Existing user simulation approaches focus on generating user-like responses in dialogue. They often assume that the provided persona is sufficient for producing such responses, without verifying whether the critical persona information has actually been supplied. This raises concerns about the validity of simulation results. To address this issue, we study the task of identifying persona dimensions (e.g., "whether the user is price-sensitive") that are relevant but missing when simulating a user's reply in a given dialogue context. We introduce PICQ-drama (constructed from TVShowGuess), a benchmark of context-aware choice questions annotated with the missing persona dimensions whose absence makes the user's choice ambiguous. We further design diverse evaluation criteria for missing persona identification. Benchmarking leading LLMs on PICQ-drama demonstrates the feasibility of this task. Evaluation across these criteria, along with further analyses, reveals cognitive differences between LLMs and humans and highlights the distinct roles that different persona categories play in shaping responses.