Understanding what users like is relatively straightforward; understanding what users dislike, however, remains a challenging and underexplored problem. Research into users' negative preferences has gained increasing importance in modern recommendation systems. Numerous platforms have introduced explicit negative feedback mechanisms and leverage these signals to refine their recommendation models. Beyond traditional business metrics, user-experience-driven metrics such as negative feedback rates have become critical indicators of system performance. However, most existing approaches use negative feedback only as an auxiliary signal for enhancing positive recommendations, paying little attention to directly modeling negative interests, which can be highly valuable in offline applications. Moreover, owing to the inherent sparsity of negative feedback data, models often suffer from context-understanding biases induced by the dominance of positive feedback. To address these challenges, we propose the first large language model (LLM) framework for negative feedback modeling, equipped with specially designed context-discerning modules. We replace text-based item descriptions with semantic ID representations and introduce an item-level alignment task that enhances the LLM's understanding of the semantic context behind negative feedback. Furthermore, we design a Progressive GRPO training paradigm that enables the model to dynamically balance its use of positive and negative behavioral context. Our investigation further reveals a fundamental misalignment between the conventional next-negative-item prediction objective and users' true negative preferences, since the observed next negative item is heavily shaped by the system's recommendation order. To mitigate this, we propose a novel reward function and evaluation metric grounded in multi-day future negative feedback and its collaborative signals.
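For context on the training paradigm, the sketch below shows the standard group-relative advantage normalization at the core of GRPO: each sampled completion's reward is normalized against the statistics of its sampling group, removing the need for a learned value baseline. This is the vanilla mechanism only; the Progressive variant proposed here, which schedules how positive and negative behavioral context are balanced during training, is not reproduced. The `grpo_advantages` helper and the example rewards are illustrative assumptions, not code from the paper.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standard GRPO group-relative advantages: normalize each sampled
    completion's reward by the mean/std of its own sampling group."""
    mean = group_rewards.mean()
    std = group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# Hypothetical illustration: rewards for G = 4 next-negative-item
# predictions sampled from the policy for one user context.
rewards = np.array([0.9, 0.2, 0.5, 0.4])
print(grpo_advantages(rewards))  # higher-than-group-average rewards get positive advantage
```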
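One way to read the proposed reward is sketched below, under loudly stated assumptions: "collaborative signals" are taken to mean item embeddings, and a predicted negative item is credited by its best cosine similarity to any item the user gives negative feedback on within a multi-day future window, rather than by exact-matching the single next negative item (which depends on recommendation order). The function name, the window construction, and the embeddings are hypothetical; the paper's actual reward may differ.

```python
import numpy as np

def future_negative_reward(pred_emb: np.ndarray,
                           future_neg_embs: np.ndarray,
                           eps: float = 1e-8) -> float:
    """Hypothetical reward sketch: best cosine similarity between the
    predicted item's embedding and the embeddings of all items the user
    negatively engaged with over the next several days."""
    pred = pred_emb / (np.linalg.norm(pred_emb) + eps)
    futs = future_neg_embs / (
        np.linalg.norm(future_neg_embs, axis=1, keepdims=True) + eps
    )
    sims = futs @ pred  # cosine similarity to each future negative item
    return float(sims.max())

# Toy usage: one 4-d predicted-item embedding vs. three future negatives.
pred = np.array([1.0, 0.0, 0.0, 0.0])
future_negs = np.array([[0.9, 0.1, 0.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(future_negative_reward(pred, future_negs))
```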