Sequential recommendation aims to capture users' dynamic interests and predict the next item a user will prefer. Most sequential recommendation methods use a deep neural network as a sequence encoder to generate user and item representations, and existing work mainly centers on designing a stronger sequence encoder. However, few attempts have been made to train an ensemble of networks as sequence encoders, even though an ensemble is more powerful than a single network: parallel networks yield diverse predictions and hence better accuracy. In this paper, we present Ensemble Modeling with contrastive Knowledge Distillation for sequential recommendation (EMKD). Our framework adopts multiple parallel networks as an ensemble of sequence encoders and recommends items based on the output distributions of all these networks. To facilitate knowledge transfer between the parallel networks, we propose a novel contrastive knowledge distillation approach. It transfers knowledge at the representation level via Intra-network Contrastive Learning (ICL) and Cross-network Contrastive Learning (CCL), and at the logits level via Knowledge Distillation (KD), which minimizes the Kullback-Leibler divergence between the output distributions of the teacher network and the student network. To leverage contextual information, we train the primary masked item prediction task alongside an auxiliary attribute prediction task in a multi-task learning scheme. Extensive experiments on public benchmark datasets show that EMKD significantly outperforms state-of-the-art methods. We also demonstrate that our ensemble method is a general approach that can improve the performance of other sequential recommenders. Our code is available at https://github.com/hw-du/EMKD.
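
The logits-level distillation and the ensemble prediction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature parameter, the toy logits, and the choice to combine networks by averaging their output distributions are all assumptions made for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw item scores into a probability distribution over items."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same item set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Logits-level distillation: KL between teacher and student outputs."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(teacher, student)

def ensemble_distribution(all_logits):
    """Combine networks by averaging their distributions (an assumption)."""
    dists = [softmax(logits) for logits in all_logits]
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]

# Hypothetical logits from two parallel networks over three candidate items.
net_a = [2.0, 0.5, -1.0]
net_b = [1.5, 1.0, -0.5]

loss_ab = kd_loss(net_a, net_b)  # network A acts as teacher for network B
ensemble = ensemble_distribution([net_a, net_b])
top_item = max(range(len(ensemble)), key=ensemble.__getitem__)
```

In this sketch each network can serve as the teacher for the other by swapping the arguments to `kd_loss`, and the recommended item is the one with the highest probability in the combined distribution.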