Problem definition: Most display advertising inventory is sold through real-time auctions. The participants in these auctions are typically bidders (for instance Google, Criteo, RTB House, Trade Desk) who bid on behalf of advertisers. To estimate the value of each display opportunity, bidders usually train machine learning models on historical data. In the labeled training set, the inputs are feature vectors representing each display opportunity and the labels are the rewards generated. In practice, the rewards are reported by the advertiser and are tied to whether or not a particular user converts. Consequently, the rewards are aggregated at the user level and are never observed at the display level. A fundamental task that has, to the best of our knowledge, been overlooked is to account for this mismatch and split, or attribute, the rewards at the right level of granularity before training a learning algorithm. We call this the label attribution problem. Methodology/results: In this paper, we develop an approach to the label attribution problem that is both theoretically justified and practical. In particular, we develop a fixed point algorithm that allows for large-scale implementation, and we showcase our solution on a large-scale, publicly available dataset from Criteo, a large Demand Side Platform (DSP). We dub our approach the Fixed Point Label Attribution (FiPLA) Algorithm. Managerial implications: There is often a hidden leap of faith when transforming the advertiser's signal into display-level labels. DSP providers should take care when building their machine learning pipelines and should solve the label attribution step explicitly rather than implicitly.
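To make the mismatch between user-level rewards and display-level labels concrete, the sketch below splits each user's reward across that user's displays using a simple fixed-point iteration. This is only an illustration under stated assumptions: the abstract does not specify the FiPLA update rule, so the scheme here (attribute in proportion to a per-feature value estimate, then re-estimate each value as the mean attributed label) is a hypothetical stand-in, and the function name `attribute_labels`, the single categorical feature per display, and the toy data are all invented for the example.

```python
from collections import defaultdict


def attribute_labels(users, n_iter=100, tol=1e-9):
    """Hypothetical fixed-point label attribution (NOT the paper's FiPLA).

    users: list of (features, reward) pairs, where `features` holds one
    categorical feature per display shown to that user and `reward` is the
    user-level reward reported by the advertiser.
    Returns (labels, value): per-display labels for each user, and the
    per-feature value estimates from the last update.
    """
    features = {f for feats, _ in users for f in feats}
    value = {f: 1.0 for f in features}  # start from a uniform value estimate
    labels = []
    for _ in range(n_iter):
        labels = []
        sums = defaultdict(float)
        counts = defaultdict(int)
        for feats, reward in users:
            if not feats:  # user with no displays: nothing to attribute
                labels.append([])
                continue
            total = sum(value[f] for f in feats)
            if total > 0:
                shares = [value[f] / total for f in feats]
            else:
                # all estimates are zero: fall back to an even split
                shares = [1.0 / len(feats)] * len(feats)
            # the user's labels always sum to the user's reward
            user_labels = [reward * s for s in shares]
            labels.append(user_labels)
            for f, y in zip(feats, user_labels):
                sums[f] += y
                counts[f] += 1
        # re-estimate each feature's value as its mean attributed label
        new_value = {f: sums[f] / counts[f] for f in features}
        delta = max((abs(new_value[f] - value[f]) for f in features), default=0.0)
        value = new_value
        if delta < tol:  # estimates stopped moving: approximate fixed point
            break
    return labels, value


# Toy data: three users, displays described only by a format feature.
users = [
    (["banner", "video"], 1.0),  # converted after two displays
    (["banner"], 0.0),           # did not convert
    (["video", "video"], 1.0),   # converted after two displays
]
labels, value = attribute_labels(users)
for (feats, reward), ys in zip(users, labels):
    print(feats, reward, [round(y, 3) for y in ys])
```

By construction, the labels of each user sum to that user's reward, which is the constraint the abstract describes; how the reward should be divided within a user is exactly the modeling question the paper addresses, and the proportional rule above is just one assumed choice.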