Click-through rate (CTR) prediction is one of the fundamental tasks in online advertising and recommendation. Although the multi-layer perceptron (MLP) serves as a core component in many deep CTR prediction models, it is widely recognized that a vanilla MLP alone is inefficient at learning multiplicative feature interactions. As such, many two-stream interaction models (e.g., DeepFM and DCN) have been proposed that integrate an MLP network with another dedicated network for enhanced CTR prediction. Because the MLP stream learns feature interactions implicitly, existing research focuses mainly on enhancing explicit feature interactions in the complementary stream. In contrast, our empirical study shows that a well-tuned two-stream MLP model, which simply combines two MLPs, can achieve surprisingly good performance, a finding that has not been reported in existing work. Based on this observation, we further propose feature gating and interaction aggregation layers that can be easily plugged in to build an enhanced two-stream MLP model, FinalMLP. In this way, FinalMLP not only enables differentiated feature inputs to the two streams but also effectively fuses stream-level interactions across them. Our evaluation results on four open benchmark datasets, as well as an online A/B test in our industrial system, show that FinalMLP achieves better performance than many sophisticated two-stream CTR models. Our source code will be available at MindSpore/models.
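The abstract names the building blocks (two MLP streams, feature gating, interaction aggregation) but not their exact form. The following is a minimal, dependency-free sketch of how such a two-stream forward pass could be wired, under stated assumptions: each stream's gate is assumed to be `2 * sigmoid(W_g e + b_g)` applied elementwise to the embedding `e` (so an all-zero gate layer passes `e` through unchanged), each stream is reduced to a single ReLU layer for brevity, and aggregation is assumed to be a bilinear fusion `bias + w1·o1 + w2·o2 + o1ᵀ W3 o2` followed by a sigmoid. The class name `TwoStreamMLPSketch` and all parameter names are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(W: Matrix, b: Vector, x: Vector, relu: bool = False) -> Vector:
    """Affine layer W @ x + b, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, z) for z in out] if relu else out


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TwoStreamMLPSketch:
    """Hypothetical two-stream MLP with per-stream feature gating and bilinear aggregation."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One gating layer per stream (assumption: gate is conditioned on the embedding itself).
        self.gates: List[Tuple[Matrix, Vector]] = [
            (rand_matrix(rng, dim, dim), [0.0] * dim) for _ in range(2)
        ]
        # One hidden layer per stream keeps the sketch short; real streams would be deeper.
        self.streams: List[Tuple[Matrix, Vector]] = [
            (rand_matrix(rng, hidden, dim), [0.0] * hidden) for _ in range(2)
        ]
        # Aggregation parameters (assumed form): a linear term per stream plus a bilinear term.
        self.w1 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.W3 = rand_matrix(rng, hidden, hidden)
        self.bias = 0.0

    def gate(self, k: int, e: Vector) -> Vector:
        """Scale embedding e elementwise by 2 * sigmoid(gate logits) for stream k."""
        Wg, bg = self.gates[k]
        g = [2.0 * sigmoid(z) for z in dense(Wg, bg, e)]
        return [gi * ei for gi, ei in zip(g, e)]

    def aggregate(self, o1: Vector, o2: Vector) -> float:
        """Fuse stream outputs as bias + w1.o1 + w2.o2 + o1^T W3 o2, then apply sigmoid."""
        linear = sum(a * x for a, x in zip(self.w1, o1)) + sum(a * x for a, x in zip(self.w2, o2))
        bilinear = sum(
            o1[i] * sum(self.W3[i][j] * o2[j] for j in range(len(o2))) for i in range(len(o1))
        )
        return sigmoid(self.bias + linear + bilinear)

    def forward(self, e: Vector) -> float:
        """Return a predicted click probability for one concatenated embedding vector e."""
        outs = []
        for k in range(2):
            W, b = self.streams[k]
            # Each stream sees its own gated view of the same embedding.
            outs.append(dense(W, b, self.gate(k, e), relu=True))
        return self.aggregate(outs[0], outs[1])


if __name__ == "__main__":
    model = TwoStreamMLPSketch(dim=4, hidden=3, seed=0)
    print(model.forward([0.1, 0.2, -0.3, 0.5]))  # a click probability in (0, 1)
```

The separate gates are what give the two streams differentiated inputs, and the bilinear term is one simple way to model stream-level interactions between `o1` and `o2` rather than just summing their logits.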