We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque ``black-box'' profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation yields interpretable and reusable preference descriptions while naturally supporting continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance, outperforming substantially larger open-source models, while exhibiting strong transferability across tasks, model families, and interaction formats.