Sequential recommendation aims to capture users' dynamic interests and predict the next item a user will prefer. Most sequential recommendation methods use a deep neural network as a sequence encoder to generate user and item representations. Existing work mainly centers on designing a stronger sequence encoder. However, few attempts have been made at training an ensemble of networks as sequence encoders, even though an ensemble is more powerful than a single network: parallel networks can yield diverse prediction results and hence better accuracy. In this paper, we present Ensemble Modeling with contrastive Knowledge Distillation for sequential recommendation (EMKD). Our framework adopts multiple parallel networks as an ensemble of sequence encoders and recommends items based on the output distributions of all these networks. To facilitate knowledge transfer between the parallel networks, we propose a novel contrastive knowledge distillation approach. It transfers knowledge at the representation level via Intra-network Contrastive Learning (ICL) and Cross-network Contrastive Learning (CCL), and at the logits level via Knowledge Distillation (KD), which minimizes the Kullback-Leibler divergence between the output distributions of the teacher network and the student network. To leverage contextual information, we train the primary masked item prediction task alongside an auxiliary attribute prediction task in a multi-task learning scheme. Extensive experiments on public benchmark datasets show that EMKD achieves a significant improvement over state-of-the-art methods. In addition, we demonstrate that our ensemble method is a general approach that can also improve the performance of other sequential recommenders. Our code is available at https://github.com/hw-du/EMKD.
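The two logits-level ideas in the abstract, recommending from the output distributions of all networks and distilling by minimizing the Kullback-Leibler divergence between teacher and student distributions, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `temperature` parameter, the KL direction (teacher to student), and the choice to average the per-network distributions are all assumptions made for illustration; the abstract does not specify how the distributions are combined.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw item scores into a probability distribution.

    `temperature` is an illustrative assumption, not taken from the abstract.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same items."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Logits-level distillation loss: KL between teacher and student outputs.

    The direction KL(teacher || student) is an assumption of this sketch.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(teacher, student)


def ensemble_scores(all_logits):
    """Combine the output distributions of several parallel networks.

    Averaging is an assumed combination rule for this sketch.
    """
    dists = [softmax(logits) for logits in all_logits]
    n_networks = len(dists)
    n_items = len(dists[0])
    return [sum(d[i] for d in dists) / n_networks for i in range(n_items)]


def recommend(all_logits, k=1):
    """Return the indices of the top-k items under the ensemble scores."""
    scores = ensemble_scores(all_logits)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

For example, with two hypothetical networks scoring three items, `recommend([[2.0, 1.0, 0.0], [0.0, 1.0, 2.5]], k=2)` ranks the items by their averaged probabilities, and `kd_loss` is zero when teacher and student produce identical logits and positive otherwise.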