We study how a budget-constrained bidder should learn to bid adaptively in repeated first-price auctions to maximize her cumulative payoff. This problem arises from the recent industry-wide shift from second-price to first-price auctions in display advertising, which renders truthful bidding (i.e., always bidding one's private value) no longer optimal. We propose a simple dual-gradient-descent-based bidding policy that maintains a dual variable for the budget constraint as the bidder consumes her budget. In the analysis, we consider two settings regarding the bidder's knowledge of her future private values: (i) an uninformative setting where the distributional knowledge (which can be non-stationary) is entirely unknown to the bidder, and (ii) an informative setting where the bidder can obtain a prediction of the budget allocation in advance. We characterize the performance loss (or regret) relative to an optimal policy with complete information on the stochasticity. For the uninformative setting, we show that the regret is \tilde{O}(\sqrt{T}) plus a variation term that reflects the non-stationarity of the value distributions, and that this is of optimal order. We then show that the variation term can be removed with the help of the prediction; specifically, the regret in the informative setting is \tilde{O}(\sqrt{T}) plus a prediction error term.
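To make the idea of a dual-gradient-descent-based bidding policy concrete, the following is a minimal sketch, not the policy analyzed in this work. The abstract does not specify the update rule, so everything here is an assumption: the names `choose_bid` and `run_dual_bidding` are hypothetical, the per-round spend target is taken to be the budget divided by the horizon, bids are restricted to a finite grid, the bidder is assumed to observe the highest competing bid after every round, and ties are resolved in her favor.

```python
import random


def choose_bid(value, lam, rival_bids, grid):
    """Return the grid bid maximizing the estimated Lagrangian payoff.

    The win probability is estimated from the competing bids seen so far
    (assumption: the highest competing bid is revealed after each round).
    """
    best_bid, best_score = 0.0, 0.0
    for b in grid:
        # Empirical probability that bid b wins (ties counted as wins).
        win_prob = sum(1 for m in rival_bids if m <= b) / max(len(rival_bids), 1)
        # Payment is penalized by the dual variable lam.
        score = (value - (1.0 + lam) * b) * win_prob
        if score > best_score:
            best_bid, best_score = b, score
    return best_bid


def run_dual_bidding(values, rival_max_bids, budget, step_size, grid):
    """Simulate the sketch and return (payoff, total spend, final dual variable)."""
    target = budget / len(values)  # assumed per-round spend target
    lam = 0.0                      # dual variable for the budget constraint
    remaining = budget
    payoff = 0.0
    seen = []                      # competing bids observed so far
    for v, m in zip(values, rival_max_bids):
        # Never bid more than the remaining budget.
        bid = min(choose_bid(v, lam, seen, grid), remaining)
        won = bid > 0 and bid >= m
        spend = bid if won else 0.0  # first-price: the winner pays her bid
        if won:
            payoff += v - bid
        remaining -= spend
        # Projected dual gradient step: lam rises when spend exceeds the target.
        lam = max(0.0, lam - step_size * (target - spend))
        seen.append(m)
    return payoff, budget - remaining, lam


random.seed(0)
T = 200
values = [random.random() for _ in range(T)]
rivals = [0.8 * random.random() for _ in range(T)]
grid = [i / 20 for i in range(21)]
payoff, spent, lam = run_dual_bidding(values, rivals, budget=20.0,
                                      step_size=0.05, grid=grid)
print(f"payoff={payoff:.3f} spent={spent:.3f} lambda={lam:.3f}")
```

In this sketch a larger dual variable makes every unit of payment more costly in the bid-selection step, so the bidder shades her bids further below her value when she is spending ahead of the target, and relaxes again when she is spending behind it.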