Personalized recommender systems fulfill customers' daily demands and boost online businesses. The goal is to learn a policy that generates a list of items matching the user's demands or interests. While most existing methods learn a pointwise scoring model that predicts a ranking score for each individual item, recent research shows that listwise approaches can further improve recommendation quality by modeling the intra-list correlations among items that are exposed together. This has motivated recent list reranking and generative recommendation approaches that optimize the overall utility of the entire list. However, exploring the combinatorial space of list actions is challenging, and existing methods that use a cross-entropy loss may suffer from low diversity. In this work, we aim to learn a policy that generates sufficiently diverse item lists for users while maintaining high recommendation quality. The proposed solution, GFN4Rec, is a generative method that builds on the insight of the flow network to align the probability of generating a list with its reward. The key advantages of our solution are a log-scale reward matching loss that intrinsically improves generation diversity, and an autoregressive item selection model that captures the mutual influences among items as well as the future reward of the list. To validate our method's effectiveness and its superior diversity during active exploration, we conduct experiments in simulated online environments and in an offline evaluation framework on two real-world datasets.
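The two components named above can be illustrated with a minimal sketch. It assumes a trajectory-balance-style objective from the generative flow network literature, which may differ in detail from the exact GFN4Rec loss: the list probability is the product of per-step softmax choices over the items not yet selected (autoregressive selection), and the loss is the squared gap between log Z + log P(list) and log R(list). The scorer `score_fn`, the scalar `log_z`, and the smoothing term `eps` are hypothetical placeholders, not the paper's model.

```python
import math
from typing import Callable, Sequence


def log_softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def list_log_prob(
    selected: Sequence[int],
    n_items: int,
    score_fn: Callable[[list[int], int], float],  # hypothetical scorer
) -> float:
    """Autoregressive log P(list): pick one item per step from the items
    not yet chosen, conditioning each score on the current prefix."""
    logp = 0.0
    prefix: list[int] = []
    for item in selected:
        candidates = [i for i in range(n_items) if i not in prefix]
        step_logp = log_softmax([score_fn(prefix, c) for c in candidates])
        logp += step_logp[candidates.index(item)]
        prefix.append(item)
    return logp


def log_reward_matching_loss(
    log_z: float, list_logp: float, reward: float, eps: float = 1e-6
) -> float:
    """Squared gap between log Z + log P(list) and log(reward + eps).
    At zero loss for every list, P(list) = (reward + eps) / Z, i.e. the
    generation probability is proportional to the (smoothed) reward."""
    return (log_z + list_logp - math.log(reward + eps)) ** 2


# Usage: with a constant scorer over 4 items, a 2-item list has
# probability (1/4) * (1/3), so log Z = log 12 matches a reward of 1.
uniform = lambda prefix, c: 0.0
lp = list_log_prob([2, 0], n_items=4, score_fn=uniform)
loss = log_reward_matching_loss(math.log(12), lp, reward=1.0, eps=0.0)
```

Because the optimum spreads probability in proportion to reward instead of concentrating it on the single highest-scoring list, lower-reward but still plausible lists keep non-negligible probability, which is one intuition for why this kind of objective can encourage diversity.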