Current LLM-based conversational recommender systems (CRS) primarily optimize for recommendation accuracy and user satisfaction. We identify an underexplored vulnerability: recommendation outputs may harm users by violating personalized safety constraints when individualized safety sensitivities -- such as trauma triggers, a history of self-harm, or phobias -- are implicitly inferable from the conversation but not respected during recommendation. We formalize this challenge as personalized CRS safety and introduce SafeRec, a new benchmark dataset designed to systematically evaluate the safety risks of LLM-based CRS under user-specific constraints. To address this problem, we propose SafeCRS, a safety-aware training framework that integrates Safe Supervised Fine-Tuning (Safe-SFT) with Safe Group reward-Decoupled Normalization Policy Optimization (Safe-GDPO) to jointly optimize recommendation quality and personalized safety alignment. Extensive experiments on SafeRec show that SafeCRS reduces safety violation rates by up to 96.5% relative to the strongest recommendation-quality baseline while maintaining competitive recommendation quality. Warning: This paper contains potentially harmful and offensive content.