Sequential recommender systems (SRS) have become widely used in recommendation because they effectively capture dynamic user preferences. A default setting in current SRS is to treat every historical behavior uniformly as a positive interaction. However, this setting can lead to sub-optimal performance, because different items contribute differently to a user's interest; for example, purchased items should be given more importance than clicked ones. We therefore propose AutoSAM, a general automatic sampling framework that treats historical behaviors non-uniformly. Specifically, AutoSAM augments the standard sequential recommendation architecture with an additional sampler layer that adaptively learns the skewed distribution of the raw input and then samples informative subsets to build more generalizable SRS. To handle the non-differentiability of sampling actions and to incorporate multiple decision factors into sampling, we further introduce a novel reinforcement learning based method to guide the training of the sampler. We design theoretically grounded multi-objective sampling rewards, including Future Prediction and Sequence Perplexity, and optimize the whole framework end-to-end using policy gradient. We conduct extensive experiments with benchmark recommender models on four real-world datasets, and the results demonstrate the effectiveness of the proposed approach. We will make our code publicly available upon acceptance.
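The idea of a sampler layer trained with policy gradient can be illustrated with a minimal sketch. This is not the AutoSAM implementation. It assumes a per-item Bernoulli keep decision whose probability comes from a logistic model over a hypothetical one-hot behavior feature `[click, purchase]`, and it replaces the Future Prediction and Sequence Perplexity rewards (which would require a trained recommender) with a hypothetical `toy_reward` that favors keeping purchases. The update is plain REINFORCE with a moving-average baseline; all names here (`BernoulliSampler`, `toy_reward`, `train`) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Item = Sequence[float]  # hypothetical feature vector: [is_click, is_purchase]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BernoulliSampler:
    """Hypothetical sampler layer: one keep-probability per historical item."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.w = [0.0] * dim
        self.rng = random.Random(seed)

    def keep_prob(self, x: Item) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def sample(self, seq: Sequence[Item]) -> Tuple[List[int], List[float]]:
        # Non-differentiable action: draw a keep/drop mask item by item.
        probs = [self.keep_prob(x) for x in seq]
        mask = [1 if self.rng.random() < p else 0 for p in probs]
        return mask, probs

    def reinforce_step(self, seq: Sequence[Item], mask: List[int],
                       probs: List[float], advantage: float, lr: float) -> None:
        # For m ~ Bernoulli(sigmoid(w.x)): grad_w log pi(m) = (m - p) * x.
        for x, m, p in zip(seq, mask, probs):
            for j, xj in enumerate(x):
                self.w[j] += lr * advantage * (m - p) * xj


def toy_reward(seq: Sequence[Item], mask: List[int]) -> float:
    # Hypothetical stand-in reward: +1 per kept purchase, -0.5 per kept click.
    total = sum(m * (1.0 if x[1] > 0 else -0.5) for x, m in zip(seq, mask))
    return total / len(seq)


def train(steps: int = 2000, lr: float = 0.5, seed: int = 0) -> BernoulliSampler:
    sampler = BernoulliSampler(dim=2, seed=seed)
    data_rng = random.Random(seed + 1)
    baseline = 0.0
    for _ in range(steps):
        # Synthetic sequence of 10 behaviors, about 70% clicks.
        seq = [[1.0, 0.0] if data_rng.random() < 0.7 else [0.0, 1.0]
               for _ in range(10)]
        mask, probs = sampler.sample(seq)
        r = toy_reward(seq, mask)
        sampler.reinforce_step(seq, mask, probs, r - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * r  # variance-reducing baseline
    return sampler


if __name__ == "__main__":
    s = train()
    print("keep p(click)    =", round(s.keep_prob([1.0, 0.0]), 3))
    print("keep p(purchase) =", round(s.keep_prob([0.0, 1.0]), 3))
```

Under this toy reward the sampler learns to keep purchases with high probability and to drop most clicks, mirroring the intuition that purchased items deserve more weight. In a real system the reward would be computed from the downstream recommender's outputs on the sampled subsequence, and the sampler would condition on richer item and context features.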