Recent advances in foundation models have established scaling laws that enable the development of larger models to achieve enhanced performance, motivating extensive research into large-scale recommendation models. However, simply increasing the model size in recommendation systems, even with large amounts of data, does not always yield the expected performance improvements. In this paper, we propose a novel framework, Collaborative Ensemble Training Network (CETNet), which leverages multiple distinct models, each with its own embedding table, to capture unique feature interaction patterns. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning, in which models iteratively refine their predictions. To dynamically balance contributions from each model, we introduce a confidence-based fusion mechanism using a general softmax, where model confidence is computed via negative entropy. This design ensures that more confident models have a greater influence on the final prediction while still benefiting from the complementary strengths of the other models. We validate our framework on three public datasets (AmazonElectronics, TaobaoAds, and KuaiVideo) as well as a large-scale industrial dataset from Meta, demonstrating its superior performance over individual models and state-of-the-art baselines. Additionally, we conduct further experiments on the Criteo and Avazu datasets to compare our method with the multi-embedding paradigm. Our results show that our framework achieves comparable or better performance with smaller embedding sizes, offering a scalable and efficient solution for CTR prediction tasks.
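A minimal sketch of the confidence-based fusion described above, assuming binary CTR outputs: each model's confidence is its negative prediction entropy, and a softmax over these confidences weights the fused probability. The helper names and the temperature `tau` are illustrative, not from the paper.

```python
import math

def binary_entropy(p, eps=1e-12):
    # Entropy of a Bernoulli prediction; low entropy means high confidence.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def confidence_fused_prediction(probs, tau=1.0):
    """Fuse per-model CTR probabilities with softmax weights computed
    over negative entropy, so more confident models weigh more."""
    confidences = [-binary_entropy(p) / tau for p in probs]
    # Numerically stable softmax over the confidence scores.
    m = max(confidences)
    exps = [math.exp(c - m) for c in confidences]
    z = sum(exps)
    weights = [e / z for e in exps]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, weights
```

For example, a model predicting 0.9 (low entropy) receives a larger weight than one predicting 0.5 (maximum entropy), pulling the ensemble toward the more confident prediction.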