A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
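As a rough illustration of the idea behind Bayesian surprise, the sketch below measures how much a single consumed item shifts a belief about a reader, using the KL divergence from the prior to the posterior. It is not the model proposed in the paper: the three topics, the three reader profiles in `PROFILES`, and the function names are all hypothetical, chosen only to keep the example small and exact.

```python
import math

# Hypothetical toy setup (not from the paper): three topics and three
# candidate reader profiles, each a distribution over those topics.
TOPICS = ["fantasy", "history", "science"]
PROFILES = [
    [0.8, 0.1, 0.1],  # reader who mostly picks fantasy
    [0.1, 0.8, 0.1],  # reader who mostly picks history
    [0.1, 0.1, 0.8],  # reader who mostly picks science
]


def bayes_update(prior, likelihoods):
    """Return the posterior over profiles after one observation."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_surprise(belief, topic):
    """Bayesian surprise of reading one book of `topic`, plus the new belief."""
    k = TOPICS.index(topic)
    likelihoods = [profile[k] for profile in PROFILES]
    posterior = bayes_update(belief, likelihoods)
    return kl_divergence(posterior, belief), posterior


# Start from a uniform belief and observe two fantasy books.
belief = [1 / 3, 1 / 3, 1 / 3]
for topic in ["fantasy", "fantasy"]:
    _, belief = topic_surprise(belief, topic)

# Compare two candidate next books against the same belief state.
familiar, _ = topic_surprise(belief, "fantasy")
novel, _ = topic_surprise(belief, "science")
print(f"surprise of another fantasy book: {familiar:.4f}")
print(f"surprise of a science book:       {novel:.4f}")
```

In this toy setting, a science book shifts the belief more than a third fantasy book does, so it scores as more surprising. The surprise depends on the reading history so far, which is the sense in which such a measure is time-dependent. Turning surprise into serendipity would additionally require the user's rating, which this sketch leaves out.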