Sequential recommender systems (SRS) have gained widespread popularity due to their ability to effectively capture dynamic user preferences. A default setting in current SRS is to treat every historical behavior uniformly as a positive interaction. In practice, this setting can yield sub-optimal performance, since each item contributes differently to the user's interest; for example, purchased items should carry more weight than merely clicked ones. Hence, we propose a general automatic sampling framework, named AutoSAM, to treat historical behaviors non-uniformly. Specifically, AutoSAM augments the standard sequential recommendation architecture with an additional sampler layer that adaptively learns the skewed distribution of the raw input and then samples informative subsets to build more generalizable SRS. To overcome the challenge of non-differentiable sampling actions, and to introduce multiple decision factors into sampling, we further propose a reinforcement-learning-based method to guide the training of the sampler. We design multi-objective sampling rewards, including Future Prediction and Sequence Perplexity, and optimize the whole framework end-to-end with the policy gradient. We conduct extensive experiments on benchmark recommender models and four real-world datasets, and the results demonstrate the effectiveness of the proposed approach. We will make our code publicly available upon acceptance.
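To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of training a Bernoulli "sampler layer" over a behavior sequence with the REINFORCE policy gradient, so that non-differentiable keep/drop decisions can still be optimized. All names (`keep_logits`, the toy `informative` labels, the reward and learning rate) are illustrative assumptions; the paper's actual rewards (Future Prediction, Sequence Perplexity) are replaced here by a toy reward that favors keeping "informative" behaviors.

```python
# Illustrative sketch only: a per-position Bernoulli sampling policy trained
# with REINFORCE. Real AutoSAM rewards come from the recommender itself.
import numpy as np

rng = np.random.default_rng(0)
T = 6                       # length of the user's behavior sequence
keep_logits = np.zeros(T)   # sampler parameters: one keep-logit per position

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy assumption: positions 3..5 (e.g. purchases) are informative,
# earlier clicks are noise; a real system scores this via its rewards.
informative = np.array([0, 0, 0, 1, 1, 1], dtype=float)

lr = 0.5
for step in range(200):
    p = sigmoid(keep_logits)                       # keep prob. per behavior
    actions = (rng.random(T) < p).astype(float)    # sampled 0/1 keep mask
    # Reward: credit for kept informative items, small cost per kept item.
    reward = (actions * informative).sum() - 0.2 * actions.sum()
    # REINFORCE: grad of log Bernoulli(p) w.r.t. the logit is (action - p).
    keep_logits += lr * reward * (actions - p)

# After training, keep-probabilities separate informative from noisy items.
print(np.round(sigmoid(keep_logits), 2))
```

The gradient step uses the score-function (REINFORCE) estimator, whose expectation matches the gradient of the expected reward even though the sampling itself is non-differentiable; this is the standard workaround the abstract alludes to.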