Large language model (LLM)-powered assistants have recently integrated memory mechanisms that record user preferences, leading to more personalized and user-aligned responses. However, irrelevant personalized memories are often introduced into the context, interfering with the LLM's intent understanding. To comprehensively investigate the dual effects of personalization, we develop RPEval, a benchmark comprising a personalized intent reasoning dataset and a multi-granularity evaluation protocol. RPEval reveals the widespread phenomenon of irrational personalization in existing LLMs and, through error pattern analysis, illustrates its negative impact on user experience. Finally, we introduce RP-Reasoner, which treats memory utilization as a pragmatic reasoning process, enabling the selective integration of personalized information. Experimental results demonstrate that our method significantly outperforms carefully designed baselines on RPEval, and resolves 80% of the bad cases observed in a large-scale commercial personalized assistant, highlighting the potential of pragmatic reasoning to mitigate irrational personalization. Our benchmark is publicly available at https://github.com/XueyangFeng/RPEval.