Large language models (LLMs) trained for general \textit{next-token prediction} often fail to generate responses that reflect how specific individuals communicate. Progress on personalized alignment is further limited by privacy constraints that make real-world personal communication data difficult to collect. We propose Your Next Token Prediction (YNTP), a task that formulates personalized response generation as token-level prediction conditioned on a user's interaction history. We introduce \textbf{YNTP-100}, a benchmark built from multilingual, multi-day human--agent conversations with 100 participants, enabling systematic evaluation of user-specific response behavior. We evaluate external (parameter-preserving) and internal (parameter-updating) alignment methods along two axes: substance similarity and stylistic consistency. The dataset and results are publicly available at: https://github.com/AnonymousHub4Submissions/YNTP100.
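The token-level formulation stated above can be written compactly. A minimal sketch in standard maximum-likelihood notation; the symbols $\mathcal{H}_u$, $x$, and $y$ are our own labels for illustration, not notation fixed by the paper:

```latex
% Hedged sketch of the YNTP objective: given user u's interaction
% history \mathcal{H}_u and a prompt x, the model predicts the user's
% response y = (y_1, \dots, y_T) one token at a time:
\[
  \max_{\theta} \; \sum_{t=1}^{T} \log p_{\theta}\bigl(y_t \,\big|\, \mathcal{H}_u,\; x,\; y_{<t}\bigr)
\]
% External (parameter-preserving) methods supply \mathcal{H}_u in the
% conditioning context; internal (parameter-updating) methods instead
% fold \mathcal{H}_u into \theta, e.g. via fine-tuning.
```

This differs from generic next-token prediction only in the conditioning on $\mathcal{H}_u$, which is what makes the prediction user-specific.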