A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
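To make the notion of topic-level Bayesian surprise concrete, the following is a minimal sketch rather than the formulation used in the paper. Bayesian surprise is commonly defined as the KL divergence between a posterior belief and the prior belief it updates. The sketch assumes a user's belief is summarized by pseudo-counts over a fixed set of topics and, as a simplification, compares the mean topic distributions before and after an item is consumed instead of full Dirichlet densities. The function names, the smoothing parameter `alpha`, the example topic counts, and the rating-weighted `serendipity` score are all hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same topics."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts):
    """Turn non-negative pseudo-counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def bayesian_surprise(history_counts, item_counts, alpha=1.0):
    """Surprise of an item given the topic counts of the user's reading history.

    Assumption: the prior belief is the smoothed topic distribution of the
    history, and the posterior is the same distribution after adding the
    item's topic counts. alpha > 0 keeps every prior probability positive.
    """
    prior_counts = [alpha + h for h in history_counts]
    posterior_counts = [pc + i for pc, i in zip(prior_counts, item_counts)]
    prior = normalize(prior_counts)
    posterior = normalize(posterior_counts)
    return kl_divergence(posterior, prior)


def serendipity(surprise, rating, max_rating=5.0):
    """Hypothetical combination: surprise weighted by the normalized rating."""
    return surprise * (rating / max_rating)


if __name__ == "__main__":
    # Hypothetical history over three topics, e.g. fantasy, history, science.
    history = [8, 1, 0]
    familiar_book = [1, 0, 0]  # another book in the dominant topic
    novel_book = [0, 0, 1]     # a book in a topic the user has not read
    print("familiar:", bayesian_surprise(history, familiar_book))
    print("novel:   ", bayesian_surprise(history, novel_book))
    print("serendipity of novel book rated 4/5:",
          serendipity(bayesian_surprise(history, novel_book), 4))
```

Under these assumptions, a book from an unread topic shifts the belief more than another book from the dominant topic and therefore receives a higher surprise score. Because the prior is rebuilt from the history up to the current book, the score is time-dependent in the sense described above: the same book becomes less surprising once similar books are already in the history.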