Personalized recommender systems fulfill the daily demands of customers and boost online businesses. The goal is to learn a policy that generates a list of items matching the user's demands or interests. While most existing methods learn a pointwise scoring model that predicts a ranking score for each individual item, recent research shows that a listwise approach can further improve recommendation quality by modeling the intra-list correlations among items that are exposed together. This has motivated recent list reranking and generative recommendation approaches that optimize the overall utility of the entire list. However, exploring the combinatorial space of list actions is challenging, and existing methods that use a cross-entropy loss may suffer from low diversity. In this work, we aim to learn a policy that generates sufficiently diverse item lists for users while maintaining high recommendation quality. The proposed solution, GFN4Rec, is a generative method that draws on the insight of flow networks to align the generation probability of a list with its reward. Its key advantages are a log-scale reward matching loss that intrinsically improves generation diversity and an autoregressive item selection model that captures the mutual influences among items while accounting for the future reward of the list. To validate the method's effectiveness and its superior diversity during active exploration, we conduct experiments in simulated online environments as well as in an offline evaluation framework on two real-world datasets.
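As a rough illustration of what aligning a list's generation probability with its reward on a log scale could look like, the following minimal sketch uses a trajectory-balance-style objective from the general flow-network literature. It is an assumption made for illustration, not the exact GFN4Rec loss, which the abstract does not specify. All names here (`softmax`, `list_log_prob`, `log_reward_matching_loss`, `log_z`, `eps`) are hypothetical, and the per-step scores stand in for the outputs of an autoregressive item selection model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def list_log_prob(step_scores, chosen):
    """Log-probability of generating a list one item at a time.

    step_scores[t] holds hypothetical scores over the candidate items at
    step t, assumed to be already conditioned on the items picked before t.
    chosen[t] is the index of the item selected at step t.
    """
    log_p = 0.0
    for scores, idx in zip(step_scores, chosen):
        log_p += math.log(softmax(scores)[idx])
    return log_p


def log_reward_matching_loss(log_z, log_p_list, reward, eps=1e-8):
    """Squared gap between log(Z * P(list)) and log(reward).

    log_z is an assumed learnable log-normalizer; eps keeps the logarithm
    finite when the reward is zero.
    """
    return (log_z + log_p_list - math.log(reward + eps)) ** 2
```

Under this sketch, the loss is zero when the list's generation probability is proportional to its reward. Probability mass is therefore spread across all rewarding lists rather than concentrated on a single best one, which is one intuition for why a log-scale matching objective can favor diversity.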