As users increasingly expect LLMs to align with their preferences, personalized information grows in value. However, personalized information is a double-edged sword: it can enrich interactions, but it may compromise objectivity and factual correctness, especially when it is misaligned with the question at hand. To alleviate this problem, we propose PersonaDual, a framework that supports both general-purpose objective reasoning and personalized reasoning within a single model and adaptively switches between the two modes based on context. PersonaDual is first trained with supervised fine-tuning (SFT) to learn the two reasoning patterns, and is then further optimized via reinforcement learning with our proposed DualGRPO to improve mode selection. Experiments on objective and personalized benchmarks show that PersonaDual preserves the benefits of personalization while reducing its interference, achieving near interference-free performance and better leveraging helpful personalized signals to improve objective problem solving.