Personalized recommender systems fulfill the daily demands of customers and boost online businesses. The goal is to learn a policy that generates a list of items matching the user's demand or interest. While most existing methods learn a pointwise scoring model that predicts a ranking score for each individual item, recent research shows that a listwise approach can further improve recommendation quality by modeling the intra-list correlations among items that are exposed together. This has motivated recent list reranking and generative recommendation approaches that optimize the overall utility of the entire list. However, exploring the combinatorial space of list actions is challenging, and existing methods that use a cross-entropy loss may suffer from low diversity. In this work, we aim to learn a policy that generates sufficiently diverse item lists for users while maintaining high recommendation quality. The proposed solution, GFN4Rec, is a generative method that draws on the idea of flow networks to align the generation probability of a list with its reward. Its key advantages are a log-scale reward matching loss that intrinsically improves generation diversity and an autoregressive item selection model that captures the mutual influences between items while accounting for the future reward of the list. To validate the method's effectiveness and its superior diversity during active exploration, we conduct experiments in simulated online environments and in an offline evaluation framework on two real-world datasets.
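The two ingredients named above, autoregressive item selection and a log-scale reward matching loss, can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the code below assumes a trajectory-balance-style loss of the kind commonly used for generative flow networks: the squared difference between log Z plus the list's log generation probability and its log reward. All names here (`score_fn`, `list_log_prob`, `log_reward_matching_loss`, `log_z`, `eps`) are hypothetical and chosen for illustration only; they are not taken from GFN4Rec.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def list_log_prob(
    item_list: Sequence[int],
    score_fn: Callable[[List[int], int], float],
    n_items: int,
) -> float:
    """Log-probability of generating item_list one item at a time.

    At each step the next item is drawn from a softmax over the items
    not yet selected, with scores conditioned on the items chosen so far
    (this conditioning is what makes the selection autoregressive).
    """
    log_p = 0.0
    chosen: List[int] = []
    for item in item_list:
        candidates = [i for i in range(n_items) if i not in chosen]
        probs = softmax([score_fn(chosen, i) for i in candidates])
        log_p += math.log(probs[candidates.index(item)])
        chosen.append(item)
    return log_p


def log_reward_matching_loss(
    log_z: float, log_p: float, reward: float, eps: float = 1e-8
) -> float:
    """Assumed trajectory-balance-style loss in log scale.

    It is zero when Z * P(list) equals the list reward, i.e. when the
    generation probability is proportional to the reward.
    """
    return (log_z + log_p - math.log(reward + eps)) ** 2


# Toy usage: 4 candidate items, a list of length 2, and a uniform scorer
# that ignores the prefix (a stand-in for a learned model).
uniform_score = lambda prefix, item: 0.0
log_p = list_log_prob([2, 0], uniform_score, n_items=4)
loss = log_reward_matching_loss(log_z=math.log(12.0), log_p=log_p, reward=1.0)
```

With the uniform scorer, each of the 12 ordered lists of length 2 has probability 1/12, so setting Z = 12 and a reward of 1 drives the loss to approximately zero. Because the residual is measured on log reward rather than raw reward, lists with moderate reward keep a proportionate share of probability mass, which is one plausible reading of why such a loss favors diversity over a cross-entropy objective that concentrates on the top list.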