Personalized large language models (LLMs) adapt their behavior to individual users to enhance satisfaction, yet personalization can inadvertently distort factual reasoning. We show that when a personalized LLM faces a factual query, it may generate answers aligned with the user's prior history rather than with the objective truth; the resulting personalization-induced hallucinations degrade factual reliability and can propagate incorrect beliefs. We trace this failure to representational entanglement between personalization and factual representations. To address it, we propose Factuality-Preserving Personalized Steering (FPPS), a lightweight inference-time method that mitigates personalization-induced factual distortions while preserving personalized behavior. We further introduce PFQABench, the first benchmark for jointly evaluating factual and personalized question answering under personalization. Experiments across multiple LLM backbones and personalization methods show that FPPS substantially improves factual accuracy while maintaining personalization performance.
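As a concrete illustration of the kind of lightweight inference-time intervention described above, the sketch below applies generic activation steering to a causal LM: a "factuality" direction is added to one transformer block's hidden states during generation. The abstract does not specify FPPS's internals, so the model choice (`gpt2`), the layer index, the steering vector, and the scale are hypothetical placeholders rather than the authors' method; in practice the direction would have to be estimated, for instance from contrasts between factual and personalization-distorted activations.

```python
# Minimal sketch of inference-time activation steering, the general family of
# techniques an FPPS-style method builds on. All specifics (layer, vector,
# scale) are illustrative assumptions, not the paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder backbone; any causal LM with accessible blocks works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical "factuality" direction. A real method would estimate this,
# e.g. as a difference of mean activations between factual and
# personalization-biased responses; here it is random for illustration only.
hidden_size = model.config.hidden_size
factual_direction = torch.randn(hidden_size)
factual_direction = factual_direction / factual_direction.norm()

LAYER = 6    # which transformer block to steer (illustrative choice)
ALPHA = 4.0  # steering strength (illustrative choice)

def steer_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # shift them along the factuality direction and pass the rest through.
    hidden = output[0]
    hidden = hidden + ALPHA * factual_direction.to(dtype=hidden.dtype, device=hidden.device)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
try:
    prompt = "Q: What is the boiling point of water at sea level?\nA:"
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so the unsteered model is restored
```

Because the intervention lives entirely in a forward hook, it needs no retraining and can be toggled per query, which matches the "inference-time" framing in the abstract; how FPPS selects when and how strongly to steer is not described there.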