As large language models (LLMs) are deployed in an ever-wider range of real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization, such as prompt-based and training-based methods, have been actively explored, effective decoding-time algorithms remain largely overlooked despite their demonstrated potential. In this paper, we propose CoPe (Contrasting Personal Preference), a novel decoding-time approach applied after parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. We evaluate CoPe across five open-ended personalized text generation tasks. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an average of 10.57% in ROUGE-L, without relying on external reward models or additional training procedures.
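To make the core idea concrete, the following is a minimal sketch of decoding-time contrast between a user-adapted (PEFT) model and its base model. It assumes the implicit reward is the token-level log-probability ratio between the adapted and base models and that this reward is added to the adapted model's logits with a contrast weight; the function name `contrastive_personal_decode`, the parameter `alpha`, and the exact scoring rule are illustrative assumptions rather than the paper's definitive formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def contrastive_personal_decode(base_model, tuned_model, tokenizer, prompt,
                                alpha=1.0, max_new_tokens=64):
    """Greedy decoding sketch that contrasts a user-adapted (PEFT) model
    against the base model.

    The per-token score adds alpha times the log-probability ratio
    (an implicit reward signal) to the adapted model's logits, up-weighting
    tokens the personalized model prefers over the base model.
    `alpha` is a hypothetical contrast strength; the paper's exact
    scoring rule may differ.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits_tuned = tuned_model(ids).logits[:, -1, :]
        logits_base = base_model(ids).logits[:, -1, :]
        # Implicit reward: difference of log-probs between adapted and base model.
        implicit_reward = (F.log_softmax(logits_tuned, dim=-1)
                           - F.log_softmax(logits_base, dim=-1))
        scores = logits_tuned + alpha * implicit_reward
        next_id = scores.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

In this sketch, no external reward model or additional training is involved: the only ingredients are the base model and the PEFT-adapted model already trained on user-specific data, consistent with the decoding-time framing above.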