MTRec: Learning to Align with User Preferences via Mental Reward Models

Recommendation models are predominantly trained on implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering users' internal satisfaction with recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7% increase in average user viewing time.

