Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for the practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes the training inefficient, and 2) it distills only the user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher's and the student's predictions to decide which knowledge to distill. By doing so, DCD essentially provides learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied to transfer the ranking information from both the user side and the item side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
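The core idea of deciding which knowledge to distill from the teacher-student discrepancy can be sketched as below. This is a minimal illustration, not the paper's actual implementation: the function name `discrepancy_targets` and the rank-gap criterion are assumptions for exposition, standing in for DCD's correction-target selection.

```python
import numpy as np

def discrepancy_targets(teacher_scores, student_scores, k):
    """Hypothetical sketch: pick the k items whose student rank disagrees
    most with the teacher rank, as candidates for 'correction'-style
    distillation (rank 0 = highest-scored item)."""
    teacher_rank = np.argsort(np.argsort(-teacher_scores))
    student_rank = np.argsort(np.argsort(-student_scores))
    # Large rank gap = the student mis-predicts this item most severely.
    gap = np.abs(teacher_rank - student_rank)
    return np.argsort(-gap)[:k]

# Toy example: the student swaps the teacher's top item (index 0) with index 2.
teacher = np.array([0.9, 0.8, 0.1, 0.05])
student = np.array([0.1, 0.8, 0.9, 0.05])
print(discrepancy_targets(teacher, student, 2))  # → [0 2]
```

In DCD this kind of selection is applied twice, once over each user's item ranking (user side) and once over each item's user ranking (item side), which is what the abstract's "dual correction" refers to.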