Data sparsity is an important issue for click-through rate (CTR) prediction, particularly when user-item interactions are too sparse to learn a reliable model. Recently, many works on cross-domain CTR (CDCTR) prediction have been developed to leverage meaningful data from a related domain. However, most existing CDCTR works share an impractical limitation: they require homogeneous inputs (\textit{i.e.} shared feature fields) across domains. CDCTR with heterogeneous inputs (\textit{i.e.} varying feature fields) across domains has not been widely explored, yet it is an urgent and important research problem. In this work, we propose a cross-domain augmentation network (CDAnet) that can transfer knowledge between two domains with \textit{heterogeneous inputs}. Specifically, CDAnet consists of a translation network and an augmentation network, which are trained sequentially. The translation network uses two independent branches to compute features for the two domains separately, so that each branch handles its own domain's heterogeneous inputs, and then learns meaningful cross-domain knowledge with a cross-supervised feature translator. The augmentation network then encodes the learned cross-domain knowledge via feature translation performed in the latent space and fine-tunes the model for final CTR prediction. Through extensive experiments on two public benchmarks and one industrial production dataset, we show that CDAnet learns meaningful translated features and substantially improves CTR prediction performance. CDAnet has also been evaluated in a 20-day online A/B test for image2product retrieval in the Taobao app, bringing an absolute \textbf{0.11 point} CTR improvement and a relative \textbf{1.32\%} GMV increase.
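To make the data flow of the two-stage design concrete, the following is a toy forward-pass sketch in plain Python, not the authors' implementation. It assumes linear-plus-tanh branches, a single linear latent translator, a logistic CTR head over the concatenation of the target latent and its translation, and a mean-squared-error term standing in for the cross-supervised objective; the names `Branch`, `Translator`, `translation_loss`, and `augmented_ctr`, the field counts, and the loss itself are all hypothetical. No training is performed here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights, standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Branch:
    """Hypothetical domain-specific encoder: maps one domain's own feature
    fields (any input size) to a latent vector of a shared size."""

    def __init__(self, in_dim: int, latent_dim: int, rng: random.Random):
        self.w = rand_matrix(latent_dim, in_dim, rng)

    def __call__(self, x: Vector) -> Vector:
        return [math.tanh(v) for v in matvec(self.w, x)]


class Translator:
    """Hypothetical linear map from one domain's latent space to the other's."""

    def __init__(self, latent_dim: int, rng: random.Random):
        self.w = rand_matrix(latent_dim, latent_dim, rng)

    def __call__(self, z: Vector) -> Vector:
        return matvec(self.w, z)


def translation_loss(translated: Vector, target: Vector) -> float:
    """Placeholder MSE; the actual cross-supervised objective is not specified here."""
    return sum((a - b) ** 2 for a, b in zip(translated, target)) / len(target)


def augmented_ctr(z_target: Vector, z_translated: Vector,
                  head_w: Vector, bias: float = 0.0) -> float:
    """Logistic CTR head over the concatenated original and translated latents."""
    features = z_target + z_translated  # list concatenation
    logit = sum(w * f for w, f in zip(head_w, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
src_branch = Branch(in_dim=5, latent_dim=4, rng=rng)  # source domain: 5 feature fields
tgt_branch = Branch(in_dim=3, latent_dim=4, rng=rng)  # target domain: 3 different fields
translator = Translator(latent_dim=4, rng=rng)

x_src = [1.0, 0.0, 0.5, 0.2, 0.0]
x_tgt = [0.3, 1.0, 0.0]
z_src = src_branch(x_src)
z_tgt = tgt_branch(x_tgt)

# Stage 1 (translation network): score the translator against a source latent.
z_trans = translator(z_tgt)
loss = translation_loss(z_trans, z_src)

# Stage 2 (augmentation network): predict CTR from original + translated features.
head_w = [rng.gauss(0.0, 0.1) for _ in range(8)]
p = augmented_ctr(z_tgt, z_trans, head_w)
```

The point of the sketch is that the branches absorb the heterogeneity: each domain may have a different number of input fields, yet both land in latent vectors of the same size, which is what makes a latent-space translator and the augmented CTR head well-defined.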