Personalized recommender systems fulfill the daily demands of customers and boost online businesses. The goal is to learn a policy that generates a list of items matching the user's demand or interest. While most existing methods learn a pointwise scoring model that predicts a ranking score for each individual item, recent research shows that a listwise approach can further improve recommendation quality by modeling the intra-list correlations of items that are exposed together. This has motivated recent list reranking and generative recommendation approaches that optimize the overall utility of the entire list. However, exploring the combinatorial space of list actions is challenging, and existing methods that rely on cross-entropy loss may generate lists with low diversity. In this work, we aim to learn a policy that can generate sufficiently diverse item lists for users while maintaining high recommendation quality. The proposed solution, GFN4Rec, is a generative method that draws on the idea of the flow network to align a list's generation probability with its reward. The key advantages of our solution are the log-scale reward matching loss, which intrinsically improves generation diversity, and the autoregressive item selection model, which captures the mutual influences among items while accounting for the future reward of the list. To validate the method's effectiveness and its superior diversity during active exploration, we conduct experiments in simulated online environments as well as in an offline evaluation framework on two real-world datasets.
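The two ingredients named in the abstract, autoregressive item selection and a log-scale reward matching loss, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact objective, so the loss below assumes a trajectory-balance-style form, `(log Z + log P(list) - log(reward + smooth))^2`, which is one common way flow-network methods match generation probability to reward. The names `toy_score_fn`, `BASE_SCORES`, `generate_list`, `log_reward_matching_loss`, and the `smooth` offset are all hypothetical.

```python
import math
import random

# Hypothetical base preference scores for five candidate items.
BASE_SCORES = [2.0, 1.5, 1.0, 0.5, 0.0]


def toy_score_fn(chosen):
    """Toy stand-in for a learned scorer: an item's score drops when an
    adjacent item id is already in the list, mimicking mutual influence."""
    return [
        s - 0.5 * sum(1 for c in chosen if abs(c - i) == 1)
        for i, s in enumerate(BASE_SCORES)
    ]


def generate_list(score_fn, list_size, rng):
    """Sample a list one item at a time, conditioning each step on the items
    picked so far. Returns the list and the log-probability of generating it."""
    chosen, log_prob = [], 0.0
    for _ in range(list_size):
        logits = score_fn(chosen)
        # Only items not yet in the list are candidates at this step.
        candidates = [i for i in range(len(logits)) if i not in chosen]
        top = max(logits[i] for i in candidates)
        weights = [math.exp(logits[i] - top) for i in candidates]
        total = sum(weights)
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        log_prob += math.log(weights[pick] / total)
        chosen.append(candidates[pick])
    return chosen, log_prob


def log_reward_matching_loss(log_z, log_prob, reward, smooth=1.0):
    """Assumed trajectory-balance-style loss: squared gap, in log space,
    between the list's flow (log Z + log P) and its smoothed reward."""
    return (log_z + log_prob - math.log(reward + smooth)) ** 2


rng = random.Random(0)
items, log_prob = generate_list(toy_score_fn, list_size=3, rng=rng)
loss = log_reward_matching_loss(log_z=0.0, log_prob=log_prob, reward=2.0)
print(items, log_prob, loss)
```

Under this assumed form, the loss reaches zero when the generation probability is proportional to the smoothed reward, so lists with moderate reward keep a nonzero sampling probability instead of being driven toward a single best list. In a real system `log_z` and the scorer would be learned, and the reward would come from user feedback on the whole list.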