Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation

Cross-domain CTR (CDCTR) prediction studies how to leverage meaningful data from a related domain to improve CTR prediction in a target domain. Most existing CDCTR work transfers knowledge across domains implicitly, for example through parameter sharing that regularizes model training in the target domain. More recent work proposes explicit techniques that extract user interest knowledge and transfer it to the target domain, which is more effective. However, these explicit methods face two main issues: 1) they usually require a super domain, i.e., an extremely large source domain, to cover most users or items of the target domain, and 2) the extracted user interest knowledge is static regardless of the context in the target domain. These limitations motivate us to develop a more flexible and efficient technique for explicit knowledge transfer. In this work, we propose a cross-domain augmentation network (CDAnet) that performs explicit knowledge transfer between two domains. Specifically, CDAnet consists of a translation network and an augmentation network, which are trained sequentially. The translation network computes latent features from the two domains and, using a cross-supervised feature translator, learns meaningful cross-domain knowledge for each input in the target domain. The augmentation network then uses this explicit cross-domain knowledge as augmented information to boost target-domain CTR prediction. Through extensive experiments on two public benchmarks and one industrial production dataset, we show that CDAnet learns meaningful translated features and substantially improves CTR prediction performance. CDAnet has been evaluated in an online A/B test on image2product retrieval in the Taobao app, yielding an absolute 0.11-point CTR improvement, a relative 0.64% deal growth, and a relative 1.26% GMV increase.
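
The two-stage flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the CDAnet architecture: the class names, layer shapes, single linear translator, and random weights are all hypothetical, and the cross-supervised training of the translator is omitted. The sketch only shows how a per-input translated feature, produced by a first-stage network, could be concatenated with the target-domain latent feature before the CTR head.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TranslationNet:
    """Stage 1 (hypothetical): latent encoder plus a feature translator."""

    def __init__(self, in_dim, latent_dim):
        # Random weights stand in for parameters learned in stage 1.
        self.W_enc = rng.normal(scale=0.1, size=(in_dim, latent_dim))
        self.W_trans = rng.normal(scale=0.1, size=(latent_dim, latent_dim))

    def encode(self, x):
        # Latent feature of a target-domain input.
        return np.tanh(x @ self.W_enc)

    def translate(self, x):
        # Translated feature computed per input, so it varies with context.
        return self.encode(x) @ self.W_trans


class AugmentationNet:
    """Stage 2 (hypothetical): CTR head over [latent ; translated] features."""

    def __init__(self, translation_net, latent_dim):
        # Assumption: the stage-1 network is reused as-is in stage 2.
        self.translation_net = translation_net
        self.w = rng.normal(scale=0.1, size=(2 * latent_dim,))
        self.b = 0.0

    def predict_ctr(self, x_target):
        h = self.translation_net.encode(x_target)
        t = self.translation_net.translate(x_target)
        z = np.concatenate([h, t], axis=1)  # explicit feature augmentation
        return sigmoid(z @ self.w + self.b)


# Four made-up target-domain samples with eight dense features each.
x_target = rng.normal(size=(4, 8))
stage1 = TranslationNet(in_dim=8, latent_dim=16)
stage2 = AugmentationNet(stage1, latent_dim=16)
ctr = stage2.predict_ctr(x_target)
print(ctr.shape)
```

Because the translated feature is a function of each target-domain input, it changes with the input context, in contrast to the static user interest knowledge the abstract attributes to earlier explicit methods.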
