Click-through rate (CTR) prediction is a vital task in industrial advertising systems. Most existing methods focus on designing neural network structures for better accuracy, yet they still suffer from data sparsity. In industrial advertising systems in particular, negative samples are widely downsampled because of resource limitations, which worsens the sparsity problem and degrades performance. In this paper, we propose \textbf{A}uxiliary Match \textbf{T}asks for enhancing \textbf{C}lick-\textbf{T}hrough \textbf{R}ate performance (AT4CTR) to alleviate the data sparsity problem. Specifically, we design two match tasks, inspired by collaborative filtering, that strengthen the relevance between user and item. Because a click is a strong signal that directly indicates a user's preference for an item, the first match task pulls the user and item representations closer together on positive samples. Because a user's past click behaviors can themselves serve as a representation of that user, we adopt next-item prediction as the second match task. Both match tasks use the InfoNCE loss from contrastive learning. The two match tasks provide meaningful training signals that speed up the model's convergence and alleviate data sparsity. We conduct extensive experiments on a public dataset and a large-scale industrial advertising dataset, and the results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in a real industrial advertising system, where it has delivered a remarkable revenue gain.
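To make the shared loss concrete, the following is a minimal sketch of an InfoNCE objective with in-batch negatives, not the authors' implementation. The abstract states only that both match tasks use InfoNCE; the function name `info_nce`, the cosine similarity, the temperature value, and the use of other items in the batch as negatives are all assumptions made for illustration. For the first match task, `users` would hold user representations of clicked samples; for the second, it would hold a representation built from the click history, with `items` holding the next clicked item.

```python
import math


def info_nce(users, items, temperature=0.1):
    """InfoNCE with in-batch negatives (illustrative assumption).

    Row i of `users` is the positive match for row i of `items`;
    every other item in the batch is treated as a negative.
    """
    def normalize(vector):
        # L2-normalize so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    users = [normalize(u) for u in users]
    items = [normalize(v) for v in items]

    total = 0.0
    for i, user in enumerate(users):
        # Temperature-scaled similarity of this user to every item in the batch.
        logits = [sum(a * b for a, b in zip(user, item)) / temperature
                  for item in items]
        # Numerically stable log-sum-exp over the positive and the negatives.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / len(users)


# Toy batch: two users and two items with hypothetical 2-d embeddings.
users = [[1.0, 0.0], [0.0, 1.0]]
aligned_items = [[0.9, 0.1], [0.1, 0.9]]    # each user close to its own item
shuffled_items = [[0.1, 0.9], [0.9, 0.1]]   # each user close to the wrong item

aligned_loss = info_nce(users, aligned_items)
shuffled_loss = info_nce(users, shuffled_items)
```

In this toy batch the aligned pairing gives a lower loss than the shuffled one, which is the behavior the match tasks rely on: minimizing the loss pulls each user toward its clicked item and away from the other items. How these auxiliary losses are weighted against the main CTR loss is not specified in the abstract and is left out of the sketch.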