In this paper, we study how a budget-constrained bidder should learn to bid adaptively in repeated first-price auctions to maximize cumulative payoff. This problem arises from the recent industry-wide shift from second-price auctions to first-price auctions in display advertising, which renders truthful bidding suboptimal. We propose a simple dual-gradient-descent-based bidding policy that maintains a dual variable for the budget constraint as the bidder consumes the budget. We analyze two settings based on the bidder's knowledge of future private values: (i) an uninformative setting where all distributional knowledge (potentially non-stationary) is entirely unknown, and (ii) an informative setting where a prediction of budget allocation is available in advance. We characterize the performance loss (regret) relative to an optimal policy with complete information. For uninformative setting, we show that the regret is ~O(sqrt(T)) plus a Wasserstein-based variation term capturing non-stationarity, which is order-optimal. In the informative setting, the variation term can be eliminated using predictions, yielding a regret of ~O(sqrt(T)) plus the prediction error. Furthermore, we go beyond the global budget constraint by introducing a refined benchmark based on a per-period budget allocation plan, achieving exactly ~O(sqrt(T)) regret. We also establish robustness guarantees when the baseline policy deviates from the planned allocation, covering both ideal and adversarial deviations.
翻译:暂无翻译