Data sparsity is a major challenge for click-through rate (CTR) prediction, particularly when user-item interactions are too sparse to learn a reliable model. Recently, many cross-domain CTR (CDCTR) prediction methods have been developed to leverage meaningful data from a related domain. However, most existing CDCTR methods impose an impractical restriction: they require homogeneous inputs (\textit{i.e.}, shared feature fields) across domains. CDCTR with heterogeneous inputs (\textit{i.e.}, varying feature fields) across domains remains largely unexplored, although it is an urgent and important research problem. In this work, we propose a cross-domain augmentation network (CDAnet) that can transfer knowledge between two domains with \textit{heterogeneous inputs}. Specifically, CDAnet consists of a translation network and an augmentation network, which are trained sequentially. The translation network uses two independent branches to compute features separately for the two domains with heterogeneous inputs, and then learns meaningful cross-domain knowledge through a cross-supervised feature translator. The augmentation network then encodes the learned cross-domain knowledge via feature translation performed in the latent space and fine-tunes the model for the final CTR prediction. Through extensive experiments on two public benchmarks and one industrial production dataset, we show that CDAnet learns meaningful translated features and substantially improves CTR prediction performance. CDAnet was evaluated in an online A/B test on image2product retrieval in the Taobao app over 20 days, yielding an absolute CTR improvement of \textbf{0.11 points} and a relative GMV increase of \textbf{1.26\%}.