Recent recommender systems have achieved remarkable performance by ensembling heterogeneous models. However, ensembling is exceedingly costly: its resource use and inference latency grow in proportion to the number of models, which remains a bottleneck for production deployment. This work aims to transfer the ensemble knowledge of heterogeneous teachers to a lightweight student model via knowledge distillation (KD), reducing the inference cost while retaining high accuracy. Through an empirical study, we find that the efficacy of distillation drops severely when knowledge is transferred from heterogeneous teachers. Nevertheless, we show that an important signal for easing this difficulty can be obtained from the teachers' training trajectories. We propose a new KD framework, named HetComp, that guides the student model by transferring easy-to-hard sequences of knowledge generated from the teachers' trajectories. To provide guidance according to the student's learning state, HetComp uses dynamic knowledge construction to provide progressively more difficult ranking knowledge, and adaptive knowledge transfer to gradually transfer finer-grained ranking information. Our comprehensive experiments show that HetComp significantly improves the distillation quality and the generalization of the student model.
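The easy-to-hard idea described in the abstract can be illustrated with a small sketch. This is not the HetComp implementation: the rank-averaging ensemble rule, the top-k overlap criterion, the `threshold` value, and all function names below are assumptions chosen only to make the idea concrete. The sketch builds one ensemble top-k list per teacher checkpoint (early checkpoints standing in for "easy" knowledge, final checkpoints for "hard" knowledge) and advances the student to the next stage once its own ranking agrees closely enough with the current target.

```python
from typing import List, Sequence


def rank_items(scores: Sequence[float]) -> List[int]:
    """Return item indices ordered from highest to lowest score (ties by index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def ensemble_top_k(teacher_scores: Sequence[Sequence[float]], k: int) -> List[int]:
    """Combine teachers by summing rank positions (an assumed Borda-style rule)."""
    n_items = len(teacher_scores[0])
    total_pos = [0] * n_items
    for scores in teacher_scores:
        for pos, item in enumerate(rank_items(scores)):
            total_pos[item] += pos
    # A lower summed position means the teachers rank the item higher overall.
    return sorted(range(n_items), key=lambda i: (total_pos[i], i))[:k]


def build_curriculum(
    trajectories: Sequence[Sequence[Sequence[float]]], k: int
) -> List[List[int]]:
    """trajectories[t][s] holds teacher t's item scores at checkpoint s.

    Returns one ensemble top-k list per checkpoint, ordered from the earliest
    checkpoint to the final one.
    """
    n_stages = len(trajectories[0])
    return [
        ensemble_top_k([traj[s] for traj in trajectories], k)
        for s in range(n_stages)
    ]


def next_stage(
    stage: int,
    student_scores: Sequence[float],
    curriculum: Sequence[Sequence[int]],
    threshold: float = 0.8,  # hypothetical agreement level, not from the paper
) -> int:
    """Move to the next stage once the student's top-k matches the current target."""
    target = curriculum[stage]
    student_top = rank_items(student_scores)[: len(target)]
    agreement = len(set(student_top) & set(target)) / len(target)
    if agreement >= threshold and stage + 1 < len(curriculum):
        return stage + 1
    return stage


# Toy example: two teachers, two checkpoints each, four items, one user.
teacher_a = [[0.9, 0.5, 0.4, 0.1], [0.9, 0.2, 0.8, 0.1]]
teacher_b = [[0.8, 0.6, 0.3, 0.2], [0.7, 0.1, 0.9, 0.3]]
curriculum = build_curriculum([teacher_a, teacher_b], k=2)
```

In a training loop, the student would be trained against `curriculum[stage]` with a ranking loss, and `next_stage` would be called periodically to decide whether to move on to the harder target. The loss and the exact switching rule are left out here because the abstract does not specify them.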