Click-through rate (CTR) prediction is widely studied and applied in academia and industry. Most CTR models follow a feature embedding \& feature interaction paradigm, in which prediction accuracy is mainly improved by designing practical feature interaction structures. However, recent studies have argued that fixed feature embeddings, learned only through the embedding layer, limit the performance of existing CTR models. Some works therefore apply extra modules on top of the embedding layer to dynamically refine feature representations for each instance; such modules are effective and easy to integrate with existing CTR methods. Despite these promising results, this new direction in CTR prediction still lacks a systematic review and summary. To fill this gap, we comprehensively summarize and define a new module, namely the \textbf{feature refinement} (FR) module, which can be applied between the feature embedding and feature interaction layers. We extract 14 FR modules from previous works, including cases where an FR module was proposed but not clearly defined or explained. We thoroughly assess the effectiveness and compatibility of existing FR modules through extensive experiments covering over 200 augmented models and over 4,000 runs, totaling more than 15,000 GPU hours. The results offer insightful guidelines for researchers, and all benchmarking code and experimental results are open-sourced. In addition, we present a new architecture for parallel CTR models that assigns an independent FR module to each sub-network, as opposed to the conventional approach of inserting a single shared FR module on top of the embedding layer. Comprehensive experiments also demonstrate the effectiveness of this approach.
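To make the placement of an FR module concrete, the following is a minimal, hypothetical sketch rather than the authors' implementation. It assumes a squeeze-and-excitation style gate in the spirit of the SENET layer from FiBiNET (simplified here to a single linear layer plus sigmoid) and two toy sub-networks: a weighted-sum branch standing in for a DNN and an FM-style pairwise inner-product branch. The names `SEGateFR`, `linear_branch`, `fm_branch`, `ParallelCTR` and the flag `separate_fr` are all illustrative assumptions. With `separate_fr=False`, both branches read the output of one shared FR module (the conventional design); with `separate_fr=True`, each branch gets its own independently parameterized FR module (the architecture proposed above).

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # field embeddings, shape [num_fields][embed_dim]


class SEGateFR:
    """Hypothetical FR module: SE-style per-field gating (simplified, not FiBiNET's exact layer)."""

    def __init__(self, num_fields: int, rng: random.Random):
        # One linear layer (num_fields -> num_fields); FiBiNET uses two FC layers.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(num_fields)] for _ in range(num_fields)]
        self.b = [0.0] * num_fields

    def __call__(self, emb: Matrix) -> Matrix:
        # Squeeze: mean-pool each field embedding to one scalar.
        z = [sum(e) / len(e) for e in emb]
        # Excite: linear layer + sigmoid gives an instance-dependent weight per field.
        a = [1.0 / (1.0 + math.exp(-(sum(wij * zj for wij, zj in zip(wi, z)) + bi)))
             for wi, bi in zip(self.w, self.b)]
        # Reweight: scale each field embedding by its weight in (0, 1).
        return [[ai * x for x in e] for ai, e in zip(a, emb)]


def linear_branch(emb: Matrix, weights: List[float]) -> float:
    # Toy stand-in for a DNN sub-network: weighted sum of the flattened embeddings.
    flat = [x for e in emb for x in e]
    return sum(w * x for w, x in zip(weights, flat))


def fm_branch(emb: Matrix) -> float:
    # FM-style second-order term: sum of pairwise inner products between fields.
    total = 0.0
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            total += sum(p * q for p, q in zip(emb[i], emb[j]))
    return total


class ParallelCTR:
    """Toy parallel CTR model whose two branches use shared or separate FR modules."""

    def __init__(self, num_fields: int, embed_dim: int, separate_fr: bool, seed: int = 0):
        rng = random.Random(seed)
        self.fr_deep = SEGateFR(num_fields, rng)
        # Shared design reuses the same object; separate design builds a second FR module.
        self.fr_fm = SEGateFR(num_fields, rng) if separate_fr else self.fr_deep
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(num_fields * embed_dim)]

    def predict(self, emb: Matrix) -> float:
        # Each branch refines the raw embeddings with its own (or the shared) FR module.
        logit = linear_branch(self.fr_deep(emb), self.w) + fm_branch(self.fr_fm(emb))
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(42)
    emb = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 fields, dim 4
    print("shared FR:  ", ParallelCTR(3, 4, separate_fr=False).predict(emb))
    print("separate FR:", ParallelCTR(3, 4, separate_fr=True).predict(emb))
```

The only structural difference between the two variants is whether `fr_fm` is the same object as `fr_deep`; in a real framework this corresponds to reusing one module instance versus instantiating one per sub-network, which lets each branch learn a refinement suited to its own interaction structure.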