Recommendation models are predominantly trained on implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, yet feel uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering users' internal satisfaction with recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7 percent increase in average user viewing time.
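To make the idea of reward-guided training concrete, the following is a minimal sketch and not the MTRec implementation. It assumes, purely for illustration, that a learned reward model outputs a few equally weighted quantile estimates of each interaction's satisfaction, and that the recommender is guided by weighting a click-prediction loss with the mean of those quantiles. The function names `expected_reward` and `reward_weighted_bce`, the quantile values, and the weighting scheme are all hypothetical.

```python
import math

def expected_reward(quantiles):
    """Mean of equally weighted quantile estimates (assumed reward-model output)."""
    return sum(quantiles) / len(quantiles)

def reward_weighted_bce(click_probs, clicks, rewards):
    """Binary cross-entropy on clicks, each sample scaled by its estimated reward."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for p, y, w in zip(click_probs, clicks, rewards):
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(clicks)

# Two clicked items (hypothetical values): the first behaves like clickbait
# (low estimated satisfaction), the second like a genuinely satisfying item.
quantiles = [[0.0, 0.1, 0.2], [0.7, 0.8, 0.9]]
rewards = [expected_reward(q) for q in quantiles]

# Both clicks are predicted with probability 0.6, but the low-reward click
# contributes far less to the training loss than the high-reward one.
loss = reward_weighted_bce([0.6, 0.6], [1, 1], rewards)
```

Under these assumptions, a click with low estimated satisfaction is down-weighted rather than discarded, so the implicit signal is still used but no longer dominates training.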