Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for the practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves performance. However, the method still has two limitations: 1) it does not fully utilize the prediction errors of the student model, which makes training less efficient, and 2) it distills only user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers ranking information from the teacher model to the student model more efficiently. Most importantly, DCD uses the discrepancy between the teacher and student model predictions to decide which knowledge to distill. In doing so, DCD provides learning guidance tailored to "correcting" what the student model has failed to predict accurately. This process is applied to transfer ranking information from both the user side and the item side, addressing sparse implicit user feedback. Our experiments show that the proposed method outperforms state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
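The idea of using the teacher-student discrepancy to choose what to distill can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how the discrepancy is measured, so the code assumes it is the difference between the student's rank and the teacher's rank for each item, and the helper names `ranks` and `select_corrections`, along with the cutoff `k`, are hypothetical.

```python
def ranks(scores):
    """Return the 0-based rank of each item (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, item in enumerate(order):
        rank[item] = position
    return rank


def select_corrections(teacher_scores, student_scores, k):
    """Pick up to k items the student ranks too low and up to k it ranks too high.

    Assumption (not from the source): discrepancy = student rank - teacher rank.
    A positive gap means the student places the item lower than the teacher does.
    """
    t_rank = ranks(teacher_scores)
    s_rank = ranks(student_scores)
    gap = [s - t for s, t in zip(s_rank, t_rank)]
    items = range(len(gap))
    # Largest positive gaps first: items the student underestimates.
    underestimated = [i for i in sorted(items, key=lambda i: -gap[i]) if gap[i] > 0][:k]
    # Most negative gaps first: items the student overestimates.
    overestimated = [i for i in sorted(items, key=lambda i: gap[i]) if gap[i] < 0][:k]
    return underestimated, overestimated


# Toy scores for one user over five items (illustrative values only).
teacher = [0.9, 0.8, 0.3, 0.2, 0.1]
student = [0.2, 0.7, 0.9, 0.1, 0.3]
under, over = select_corrections(teacher, student, k=1)
print(under, over)
```

In this toy case the student places item 0 three positions lower than the teacher, so it is the first candidate for correction. Under the same assumption, the item-side view would run the identical selection over one item's scores across users instead of one user's scores across items.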