Click-Through Rate (CTR) prediction plays a vital role in recommender systems, online advertising, and search engines. Most current approaches model feature interactions through stacked or parallel structures, and some employ knowledge distillation for model compression. However, we observe several limitations in these approaches: (1) In parallel-structure models, the explicit and implicit components are executed independently and simultaneously, which leads to insufficient information sharing within the feature set. (2) Introducing knowledge distillation brings the problems of complex teacher-student framework design and low knowledge-transfer efficiency. (3) Both the dataset and the process of constructing high-order feature interactions contain significant noise, which limits the model's effectiveness. To address these limitations, we propose FSDNet, a CTR prediction framework incorporating a plug-and-play fusion self-distillation module. Specifically, FSDNet forms connections between explicit and implicit feature interactions at each layer, enhancing information sharing between different features. The deepest fusion layer is then used as the teacher model, and self-distillation guides the training of the shallow layers. Empirical evaluation across four benchmark datasets validates the framework's efficacy and generalization capability. The code is available at https://anonymous.4open.science/r/FSDNet.
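The self-distillation idea described above can be sketched as follows. This is a minimal illustrative sketch, not FSDNet's actual implementation: the abstract does not specify the exact loss, so the function name `fusion_self_distillation_loss`, the binary-cross-entropy soft-target formulation, and the weighting parameter `alpha` are all assumptions for illustration. Each shallow fused layer is trained against both the ground-truth labels and the (gradient-detached) predictions of the deepest fusion layer, which serves as the teacher.

```python
import torch
import torch.nn.functional as F

def fusion_self_distillation_loss(shallow_logits, teacher_logits, labels, alpha=0.5):
    """Illustrative sketch of a fusion self-distillation objective (assumed form).

    shallow_logits: list of tensors, one logit tensor per shallow fusion layer.
    teacher_logits: logits from the deepest fusion layer (the teacher).
    labels: ground-truth click labels in {0, 1}, same shape as each logit tensor.
    alpha: assumed weight balancing hard-label loss vs. soft teacher targets.
    """
    # Teacher probabilities are detached so gradients do not flow into the teacher.
    teacher_prob = torch.sigmoid(teacher_logits).detach()
    total = torch.zeros((), dtype=teacher_logits.dtype)
    for logits in shallow_logits:
        hard = F.binary_cross_entropy_with_logits(logits, labels)        # ground truth
        soft = F.binary_cross_entropy_with_logits(logits, teacher_prob)  # teacher guidance
        total = total + alpha * hard + (1.0 - alpha) * soft
    return total / len(shallow_logits)
```

In this sketch the teacher and students share one network (self-distillation), avoiding the separate teacher model that a conventional teacher-student framework would require.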