PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users

Personalisation is a standard feature of conversational AI systems used by millions, yet academic research often evaluates personalisation methods with simulated users rather than real people. This raises two questions: how do real users and their simulated counterparts differ in interaction patterns and judgements, and is personalisation best achieved through context-based prompting or weight-based fine-tuning? In a large-scale within-subject experiment, we re-recruit 530 participants from 52 countries two years after they gave their preferences in the PRISM dataset (Kirk et al., 2024). They evaluate personalised and non-personalised language models in blinded multi-turn conversations. We find that preference fine-tuning (P-DPO, Li et al., 2024) significantly outperforms both a generic model and personalised prompting. However, adapting to individual preference data yields only marginal gains over training on pooled preferences from a diverse population. Beyond length biases, fine-tuning amplifies sycophancy and relationship-seeking behaviours. People reward these behaviours in short-term evaluations, but they may have harmful long-term consequences. Replicating the within-subject experiment with simulated users recovers the aggregate ranking of models. At the level of individual judgements, however, simulators fall far below human self-consistency baselines. They also discuss different topics from humans, exhibit amplified position biases, and produce feedback dynamics that diverge from those of humans.
