We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning in e-commerce applications. MOON has been fully deployed across all stages of the Taobao search advertising system, including retrieval, relevance, and ranking, among others. The gains are particularly significant on the click-through rate (CTR) prediction task, where MOON achieves an overall +20.00% online CTR improvement. Over the past three years, this project has delivered the largest improvement on the CTR prediction task and has undergone five full-scale iterations. Throughout the exploration and iteration of MOON, we have accumulated insights and practical experience that we believe will benefit the research community. MOON follows a three-stage training paradigm of "Pretraining, Post-training, and Application", which allows multimodal representations to be integrated effectively with downstream tasks. Notably, to bridge the misalignment between the objectives of multimodal representation learning and downstream training, we define the exchange rate, which quantifies how effectively an improvement in an intermediate metric translates into downstream gains. Through this analysis, we identify image-based search recall as a critical intermediate metric for guiding the optimization of multimodal models. Across these five iterations, MOON has evolved along four critical dimensions: data processing, training strategy, model architecture, and downstream application, and we share the lessons learned along the way. As part of our exploration of scaling effects in e-commerce, we further conduct a systematic study of the scaling laws governing multimodal representation learning, examining factors such as the number of training tokens, the number of negative samples, and the length of user behavior sequences.
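To make the exchange rate idea concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not spell out. It assumes the exchange rate can be approximated as the least-squares slope (through the origin) relating gains in an intermediate metric to gains in a downstream metric across model iterations. The function name `exchange_rate` and all numbers are hypothetical and purely illustrative.

```python
def exchange_rate(intermediate_gains, downstream_gains):
    """Estimate downstream gain per unit of intermediate-metric gain.

    Assumption (not from the source): the exchange rate is modeled as the
    slope of a least-squares line through the origin fitted to paired
    (intermediate gain, downstream gain) observations.
    """
    if len(intermediate_gains) != len(downstream_gains):
        raise ValueError("gain lists must have the same length")
    denom = sum(x * x for x in intermediate_gains)
    if denom == 0:
        raise ValueError("intermediate gains must not all be zero")
    numer = sum(x * y for x, y in zip(intermediate_gains, downstream_gains))
    return numer / denom


# Hypothetical, made-up gains for two iterations (not results from the paper):
# intermediate metric gains (e.g. recall points) and downstream metric gains.
rate = exchange_rate([1.0, 2.0], [0.5, 1.0])
print(rate)
```

Under this assumption, a candidate intermediate metric with a higher and more stable exchange rate would be a better proxy to optimize during representation learning, which is the role the abstract assigns to image-based search recall.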


