Re-Centering Humans in LLM Personalization

Despite growing interest, most evaluations of large language models' (LLMs') personalization abilities have relied on synthetic data. It remains unclear how well current personalization systems work for real users. In this paper, we study the gap in LLM personalization performance when using synthetic versus human data. We collect 550 human conversations, along with human judgments across three stages of personalization: extracting user attributes from conversations (5,949 judgments), pairing relevant attributes with new prompts (11,919), and incorporating relevant attributes into a personalized response (1,101). Incorporating human data reveals system limitations at each stage. Models struggle to extract attributes from human conversations, disagree with human judgments on which attributes are relevant, and generate personalized responses that humans judge no better than generic responses (though LLM judges widely rate them as better). We introduce two lightweight training-based interventions that shift automated personalization evaluation closer to human data in our first two stages. However, in our third stage we find that learned reward models achieve only modest correlation with human ratings, suggesting that human-aligned personalization quality judgments are difficult to model directly. Our collected data provides a foundation for studying how models should extract, select, and incorporate user information in ways that humans find useful.
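To make the third-stage finding concrete, the following is a minimal sketch of how agreement between a learned reward model and human ratings might be quantified. The abstract does not say which correlation statistic the authors use; Pearson's r is assumed here purely for illustration, the function name `pearson` is hypothetical, and the two score lists are made-up numbers, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: Pearson's r stands in for whatever statistic the paper
    actually reports; the abstract does not specify it.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values (NOT from the paper): one human rating and
# one reward-model score per personalized response.
human_ratings = [1, 2, 3, 4, 5]
reward_scores = [2, 1, 4, 3, 5]

r = pearson(human_ratings, reward_scores)
print(f"correlation between reward model and humans: {r:.2f}")
```

A value near 1 would mean the reward model ranks responses much as humans do, while a modest value, as the abstract reports, means the model's scores are only a weak proxy for human judgments of personalization quality.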
