Unbiased Filtering Of Accidental Clicks in Verizon Media Native Advertising

Verizon Media (VZM) native advertising is one of VZM's largest and fastest-growing businesses, reaching a run-rate of several hundred million USD in the past year. Driving the VZM native models that predict event probabilities, such as click and conversion probabilities, is OFFSET, a feature-enhanced, collaborative-filtering-based event-prediction algorithm. In this work we focus on the challenge of predicting click-through rates (CTR) when some of the clicks are known to have short dwell-time and are therefore defined as accidental clicks. An accidental click implies little affinity between the user and the ad, so predicting that similar users will click on the ad is inaccurate. It may therefore be beneficial to remove clicks with dwell-time below a predefined threshold from the training set. However, we cannot simply ignore these positive events, since filtering them out causes the model to under-predict. Previous approaches applied filtering and then added corrective biases to the CTR predictions, but they did not yield revenue lifts and were therefore not adopted. In this work, we present a new approach in which the positive weight of the accidental clicks is distributed among all of the negative events (skips), based on their likelihood of causing accidental clicks, as predicted by an auxiliary model. These likelihoods are taken as the correct labels of the negative events, so that training no longer relies on binary labels only and instead adopts a binary cross-entropy loss function. After showing offline performance improvements, the modified model was tested online serving VZM native users, and provided a 1.18% revenue lift over the production model, which is agnostic to accidental clicks.
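The label redistribution described above can be illustrated with a minimal sketch. It is not the OFFSET implementation: the function names, the proportional normalization of the auxiliary scores, and the toy numbers are all assumptions made for illustration. The sketch only shows how the weight of filtered accidental clicks might be spread over skips as fractional labels, and why a binary cross-entropy loss accommodates such labels.

```python
import math


def soft_labels_for_skips(aux_scores, num_accidental_clicks):
    """Spread the weight of filtered accidental clicks over the skips.

    aux_scores: hypothetical auxiliary-model scores, one per skip, reflecting
        how likely that skip event was to produce an accidental click.
    num_accidental_clicks: total positive weight removed by filtering.

    Assumption: the weight is shared proportionally to the scores, so the
    soft labels sum to the removed weight. Inputs are assumed to keep every
    resulting label within [0, 1].
    """
    total = sum(aux_scores)
    if total <= 0:
        return [0.0 for _ in aux_scores]
    return [num_accidental_clicks * s / total for s in aux_scores]


def binary_cross_entropy(label, prediction, eps=1e-12):
    """Binary cross-entropy for a label in [0, 1], not only in {0, 1}."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Toy data: one accidental click was filtered out, and three skips remain.
skip_scores = [0.02, 0.06, 0.12]
skip_labels = soft_labels_for_skips(skip_scores, num_accidental_clicks=1)

# Training labels: non-accidental clicks keep label 1, skips get soft labels.
labels = [1.0] + skip_labels
predictions = [0.7, 0.1, 0.2, 0.5]
mean_loss = sum(
    binary_cross_entropy(y, p) for y, p in zip(labels, predictions)
) / len(labels)
```

Under these assumptions the total positive weight in the training set is unchanged by the filtering, which is the intuition behind avoiding under-prediction, and the cross-entropy for a fractional label is minimized when the prediction equals that label.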
