Large language models (LLMs) excel at general-purpose tasks, yet adapting their responses to individual users remains challenging. Retrieval augmentation provides a lightweight alternative to fine-tuning by conditioning LLMs on user history records, and existing approaches typically select these records based on semantic relevance. We argue that relevance is an unreliable proxy for utility: a record may be semantically similar to a query yet fail to improve generation quality, or even degrade it through redundancy or conflicting information. To bridge this gap, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for LLM pErsonalization. In contrast to greedy selection of the most relevant records, PURPLE treats profile construction as an order-sensitive generation process and utilizes a Plackett-Luce ranking model to capture complex inter-record dependencies. By training on semantically rich feedback derived from the likelihood of the reference response, our method aligns retrieval directly with generation quality. Extensive experiments on nine personalization tasks demonstrate that PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines in both effectiveness and efficiency, establishing a principled and scalable solution for optimizing user profiles.
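The abstract does not specify PURPLE's exact parameterization, but the core Plackett-Luce mechanism it references can be illustrated directly. In this standard model, each candidate record gets a score, and an ordered selection is drawn sequentially without replacement, each draw weighted by the softmax of the remaining scores; the log-probability of a drawn ordering is what a bandit-style objective (here, reward from the reference-response likelihood) would weight during training. The function names and score inputs below are illustrative, not from the paper:

```python
import math
import random


def plackett_luce_sample(scores, k, rng=random):
    """Sample an ordered list of k item indices under a Plackett-Luce model.

    Items are drawn sequentially without replacement, each with probability
    proportional to exp(score) among the items not yet chosen, so the
    resulting selection is order-sensitive rather than a greedy top-k.
    """
    remaining = list(range(len(scores)))
    selected = []
    for _ in range(min(k, len(remaining))):
        weights = [math.exp(scores[i]) for i in remaining]
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for pos, i in enumerate(remaining):
            acc += weights[pos]
            if r <= acc:
                selected.append(remaining.pop(pos))
                break
    return selected


def plackett_luce_log_prob(scores, order):
    """Log-probability of a specific ordering under the same model.

    This is the quantity a policy-gradient / bandit update would scale by
    the observed reward (e.g., likelihood of the reference response).
    """
    remaining = list(range(len(scores)))
    log_p = 0.0
    for i in order:
        denom = sum(math.exp(scores[j]) for j in remaining)
        log_p += scores[i] - math.log(denom)
        remaining.remove(i)
    return log_p
```

Because the model factorizes over draw positions, the probabilities of all full orderings sum to one, and swapping two records changes the ordering's probability, which is what lets training capture inter-record dependencies that relevance-only top-k ranking ignores.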