Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical ``average user'' who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, a limitation that is both grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments concerning scalability, shared standards, and manipulation risk. Although personalization introduces genuine safety challenges, including filter bubbles, value lock-in, and psychological manipulation, we argue that these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.