Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity over time in both an advertiser's own value and the values of competing advertisers, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: how many historical samples are required to learn a good expenditure plan? We study this question by considering an advertiser who repeatedly participates in $T$ second-price auctions, where in each round the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown that $T\log T$ samples per distribution suffice to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve upon this state of the art and show that just one sample per distribution is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.
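The plan-then-track approach described above can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the algorithm analyzed in the paper. The bid shading rule $v/(1+\mu)$, the binary search used to fit a multiplier to the budget on the historical samples, the proportional controller, and the names `learn_plan`, `run_paced`, and `spend_under_multiplier` are all hypothetical choices for illustration. The plan is learned from exactly one historical sample per round, matching the one-sample setting.

```python
import random

def spend_under_multiplier(samples, mu):
    """Per-round spend on the samples when bidding v/(1+mu) in a second-price auction."""
    spends = []
    for v, d in samples:
        bid = v / (1.0 + mu)
        # Second-price rule: win if bid >= highest competing bid d, then pay d.
        spends.append(d if bid >= d else 0.0)
    return spends

def learn_plan(samples, budget, iters=60, mu_hi=1e6):
    """Hypothetical planner: binary-search the smallest multiplier whose spend on the
    samples fits the budget (assumes spend at mu_hi is within budget), then return
    that multiplier and the cumulative target spend for each round."""
    if sum(spend_under_multiplier(samples, 0.0)) <= budget:
        mu = 0.0
    else:
        lo, hi = 0.0, mu_hi
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            # Spend is non-increasing in mu, so bisection applies.
            if sum(spend_under_multiplier(samples, mid)) > budget:
                lo = mid
            else:
                hi = mid
        mu = hi
    plan, total = [], 0.0
    for s in spend_under_multiplier(samples, mu):
        total += s
        plan.append(total)
    return mu, plan

def run_paced(rounds, budget, plan, mu0, eta=5.0):
    """Hypothetical controller: bid v/(1+mu) and nudge mu in proportion to the gap
    between cumulative spend and the plan (raise mu when ahead, lower it when behind)."""
    mu, spent, utility = mu0, 0.0, 0.0
    for t, (v, d) in enumerate(rounds):
        bid = v / (1.0 + mu)
        if bid >= d and spent + d <= budget:  # never overspend the budget
            spent += d
            utility += v - d
        mu = max(0.0, mu + eta * (spent - plan[t]) / budget)
    return utility, spent

if __name__ == "__main__":
    rng = random.Random(0)
    T, budget = 1000, 150.0

    def draw(t):
        # Hypothetical time-varying distribution: competition intensifies over time.
        return rng.random(), rng.random() * (0.5 + t / T)

    samples = [draw(t) for t in range(T)]  # one historical sample per round
    rounds = [draw(t) for t in range(T)]   # fresh realizations at run time
    mu0, plan = learn_plan(samples, budget)
    utility, spent = run_paced(rounds, budget, plan, mu0)
    print(f"mu0={mu0:.3f} utility={utility:.1f} spent={spent:.1f}")
```

Splitting the work into an offline planner and an online tracker mirrors the structure described in the abstract: the planner only sees one draw per round, while the controller reacts to the realized spend. The hard budget check in `run_paced` guarantees feasibility regardless of how well the plan was learned.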