Large Language Models (LLMs) excel at general-purpose tasks, yet adapting their responses to individual users remains challenging. Retrieval augmentation offers a lightweight alternative to fine-tuning by conditioning LLMs on user history records, and existing approaches typically select these records by semantic relevance. We argue that relevance is an unreliable proxy for utility: a record may be semantically similar to a query yet fail to improve generation quality, or even degrade it through redundancy or conflicting information. To bridge this gap, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. Rather than greedily selecting the most relevant records, PURPLE treats profile construction as a set generation process and uses a Plackett-Luce ranking model to capture complex inter-record dependencies. By training on dense feedback derived from the likelihood of the reference response, our method aligns retrieval directly with generation quality. Extensive experiments on nine personalization tasks demonstrate that PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines in both effectiveness and efficiency, establishing a principled and scalable solution for optimizing user profiles.
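To make the set-generation view concrete: a Plackett-Luce model picks records sequentially, each draw proportional to a softmax over the scores of the records not yet chosen. The sketch below illustrates such sampling using the standard Gumbel-top-k equivalence (perturb each score with Gumbel noise, keep the k largest). It is an illustrative assumption, not the paper's actual implementation; the function name and scores are hypothetical.

```python
import math
import random

def plackett_luce_sample(scores, k, rng=random):
    """Sample k record indices without replacement under a Plackett-Luce
    model: the i-th pick is drawn with probability softmax(scores) over
    the records remaining at that step.

    Implemented via the Gumbel-top-k trick: perturbing each score with
    i.i.d. Gumbel(0, 1) noise and taking the k largest perturbed scores
    yields exactly a Plackett-Luce sample."""
    perturbed = [
        # -log(-log(U)) with U ~ Uniform(0, 1) is a Gumbel(0, 1) draw
        (s - math.log(-math.log(rng.random())), i)
        for i, s in enumerate(scores)
    ]
    perturbed.sort(reverse=True)          # rank by perturbed score
    return [i for _, i in perturbed[:k]]  # top-k indices = sampled set

# Hypothetical usage: scores for 5 history records, select a 3-record profile.
profile = plackett_luce_sample([2.0, 0.5, 1.0, -1.0, 0.0], 3, random.Random(0))
```

In a bandit setup like the one described, the selected set would be fed to the LLM, the likelihood of the reference response would serve as the reward, and the scoring function producing `scores` would be updated accordingly.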