Click-through rate (CTR) prediction is a crucial area of research in online advertising. While binary cross entropy (BCE) has been widely used as the optimization objective when treating CTR prediction as a binary classification problem, recent advancements have shown that combining the BCE loss with an auxiliary ranking loss can significantly improve performance. However, why this combined loss is effective is not yet fully understood. In this paper, we uncover a new challenge associated with the BCE loss in scenarios where positive feedback is sparse: the issue of gradient vanishing for negative samples. We introduce a novel perspective on the effectiveness of the auxiliary ranking loss in CTR prediction: it generates larger gradients on negative samples, thereby mitigating the optimization difficulties of using the BCE loss alone and resulting in improved classification ability. To validate our perspective, we conduct theoretical analysis and extensive empirical evaluations on public datasets. Additionally, we successfully integrate the ranking loss into Tencent's online advertising system, achieving notable lifts of 0.70% and 1.26% in Gross Merchandise Value (GMV) in two main scenarios. The code is openly accessible at: https://github.com/SkylerLinn/Understanding-the-Ranking-Loss.
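The gradient-vanishing claim can be illustrated with a minimal numerical sketch. For a negative sample (label 0), the BCE gradient with respect to the logit z is sigmoid(z), which shrinks toward zero as the predicted probability falls, whereas a pairwise logistic ranking loss (a BPR/RankNet-style choice used here purely for illustration; the specific logit values are assumptions, not from the paper) keeps a sizable gradient when positive and negative scores are close:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative logits: in sparse-feedback regimes the model scores
# almost everything low, so positives and negatives sit close together.
z_neg = -6.0  # predicted CTR ~ 0.25%
z_pos = -5.0  # positives also receive low scores

# BCE on a negative (y = 0): L = -log(1 - sigmoid(z)), so dL/dz = sigmoid(z).
# As sigmoid(z) -> 0, the gradient vanishes.
grad_bce = sigmoid(z_neg)

# Pairwise logistic ranking loss: L = -log(sigmoid(z_pos - z_neg)),
# so dL/dz_neg = sigmoid(z_neg - z_pos), which stays near 0.5
# whenever the pair is not yet well separated.
grad_rank = sigmoid(z_neg - z_pos)

print(f"BCE gradient on negative:     {grad_bce:.6f}")   # tiny
print(f"Ranking gradient on negative: {grad_rank:.6f}")  # much larger
```

This matches the abstract's intuition: the auxiliary ranking term supplies larger gradients on negative samples precisely where BCE alone stalls.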