Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation

Cross-domain CTR (CDCTR) prediction is an important research topic that studies how to leverage meaningful data from a related domain to improve CTR prediction in the target domain. Most existing CDCTR works transfer knowledge across domains implicitly, for example through parameter sharing that regularizes model training in the target domain. More recent work proposes explicit techniques that extract user interest knowledge and transfer it to the target domain, which is more effective. However, these explicit methods face two main issues: 1) they usually require a super domain, i.e., an extremely large source domain, to cover most users or items of the target domain, and 2) the extracted user interest knowledge is static regardless of the context in the target domain. These limitations motivate us to develop a more flexible and efficient technique for explicit knowledge transfer. In this work, we propose a cross-domain augmentation network (CDAnet) that performs explicit knowledge transfer between two domains. Specifically, CDAnet consists of a translation network and an augmentation network, which are trained sequentially. The translation network computes latent features from the two domains and, through a cross-supervised feature translator, learns meaningful cross-domain knowledge for each input in the target domain. The augmentation network then uses this explicit cross-domain knowledge as augmented information to boost target-domain CTR prediction. Through extensive experiments on two public benchmarks and one industrial production dataset, we show that CDAnet learns meaningful translated features and substantially improves CTR prediction performance. In an online A/B test on image2product retrieval in the Taobao app, CDAnet brought an absolute 0.11 point CTR improvement, a relative 0.64% deal growth, and a relative 1.26% GMV increase.
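
As a rough illustration of the two-stage data flow described in the abstract, the following NumPy sketch passes a target-domain input through a translation step and then feeds the translated feature, concatenated with the target-domain latent feature, into a CTR head. The abstract does not specify layer types, dimensions, or losses, so everything here is an assumption: the class names `TranslationNet` and `AugmentationNet`, the single-layer encoders, the linear translator, and the random untrained weights are all hypothetical. Training, including the cross-supervised objective, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TranslationNet:
    """Stage 1 (hypothetical): encode a target-domain input, then map its
    latent feature toward the source-domain latent space."""

    def __init__(self, in_dim, latent_dim):
        # Random weights stand in for parameters learned in stage 1.
        self.encoder = rng.normal(scale=0.1, size=(in_dim, latent_dim))
        self.translator = rng.normal(scale=0.1, size=(latent_dim, latent_dim))

    def translate(self, x_target):
        latent = np.tanh(x_target @ self.encoder)
        return latent @ self.translator  # translated (cross-domain) feature


class AugmentationNet:
    """Stage 2 (hypothetical): predict CTR from the target-domain latent
    feature augmented with the translated feature."""

    def __init__(self, in_dim, latent_dim, aug_dim):
        self.encoder = rng.normal(scale=0.1, size=(in_dim, latent_dim))
        self.head = rng.normal(scale=0.1, size=(latent_dim + aug_dim,))

    def predict(self, x_target, translated):
        latent = np.tanh(x_target @ self.encoder)
        # Explicit augmentation: concatenate the cross-domain feature.
        augmented = np.concatenate([latent, translated], axis=1)
        return sigmoid(augmented @ self.head)


# Toy batch of 4 target-domain inputs with 8 features each (made-up sizes).
x_target = rng.normal(size=(4, 8))

translation_net = TranslationNet(in_dim=8, latent_dim=16)
augmentation_net = AugmentationNet(in_dim=8, latent_dim=16, aug_dim=16)

# The translation network is assumed to be trained first and then held
# fixed, so its output is treated as a plain input feature in stage 2.
translated = translation_net.translate(x_target)
ctr = augmentation_net.predict(x_target, translated)
print(ctr.shape)
```

Because the translated feature is computed per input rather than looked up per user, it can vary with the target-domain context, which is the property the abstract contrasts with static user interest knowledge.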
