Recommendation systems typically span multiple business domains to meet users' diverse interests and needs, and the click-through rate (CTR) can differ substantially from one domain to another, which creates a demand for CTR prediction models tailored to each domain. The common industry solutions are to train a domain-specific model for each domain or to apply transfer learning. The drawback of the former is that a single-domain model does not use data from other domains. The latter leverages data from all domains, but fine-tuning can leave the model trapped in a local optimum of the source domain, making it difficult to fit the target domain. Moreover, significant differences in data quantity and feature schemas between domains, known as domain shift, may lead to negative transfer. To overcome these challenges, we propose the Collaborative Cross-Domain Transfer Learning Framework (CCTL). CCTL evaluates the information gain of the source domain on the target domain using a symmetric companion network, and adjusts the information transfer weight of each source-domain sample using an information flow network. This approach makes full use of data from other domains while avoiding negative transfer. Additionally, a representation enhancement network is used as an auxiliary task to preserve domain-specific features. In comprehensive experiments on both public and real-world industrial datasets, CCTL achieves state-of-the-art (SOTA) results on offline metrics. CCTL has also been deployed at Meituan, where it brings a 4.37% lift in CTR and a 5.43% lift in GMV, a significant gain for the business.
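The per-sample weighting of source-domain data described in the abstract can be illustrated with a minimal sketch. This is not the CCTL implementation: the abstract does not specify how the information flow network computes its weights, so here the weights are simply passed in by the caller, and the names `log_loss`, `weighted_transfer_loss`, and `source_weights` are hypothetical. The sketch only shows how a weight in [0, 1] per source sample could scale that sample's contribution to a combined training loss, so that unhelpful source samples are down-weighted.

```python
import math


def log_loss(label, prob, eps=1e-12):
    """Binary cross-entropy for a single (label, predicted CTR) pair."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def weighted_transfer_loss(target_batch, source_batch, source_weights):
    """Mean target-domain loss plus a per-sample weighted source-domain loss.

    target_batch, source_batch: lists of (label, predicted_ctr) pairs.
    source_weights: one weight in [0, 1] per source sample. In this sketch
    the weights are an input (an assumption); how CCTL derives them is not
    described in the abstract.
    """
    if not target_batch:
        raise ValueError("target_batch must not be empty")
    if len(source_batch) != len(source_weights):
        raise ValueError("need exactly one weight per source sample")

    target_loss = sum(log_loss(y, p) for y, p in target_batch) / len(target_batch)

    if not source_batch:
        return target_loss

    # Each source sample contributes in proportion to its transfer weight.
    weighted_sum = sum(
        w * log_loss(y, p) for w, (y, p) in zip(source_weights, source_batch)
    )
    return target_loss + weighted_sum / len(source_batch)


if __name__ == "__main__":
    target = [(1, 0.8), (0, 0.3)]
    source = [(1, 0.6), (0, 0.9)]  # the second source sample fits poorly
    print(weighted_transfer_loss(target, source, [1.0, 0.1]))
```

Setting every weight to 0 reduces the objective to the target-only loss, while setting every weight to 1 treats source and target samples alike; intermediate weights sit between these two extremes.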