Online Learning in Multi-unit Auctions

We consider repeated multi-unit auctions with uniform pricing, which are widely used in practice for allocating goods such as carbon licenses. In each round, $K$ identical units of a good are sold to a group of buyers whose valuations have diminishing marginal returns. The buyers submit bids for the units, and then a price $p$ is set per unit so that all the units are sold. We consider two variants of the auction, in which the price is set to the $K$-th highest bid and to the $(K+1)$-st highest bid, respectively. We analyze the properties of this auction in both the offline and online settings. In the offline setting, we consider the problem facing a single player $i$: given access to a data set that contains the bids submitted by competitors in past auctions, find a bid vector that maximizes player $i$'s cumulative utility on the data set. We design a polynomial-time algorithm for this problem by showing that it is equivalent to finding a maximum-weight path on a carefully constructed directed acyclic graph. In the online setting, the players run learning algorithms to update their bids as they participate in the auction over time. Based on our offline algorithm, we design efficient online learning algorithms for bidding. The algorithms have sublinear regret under both full-information and bandit feedback. We complement our online learning algorithms with regret lower bounds. Finally, we analyze the quality of the equilibria in the worst case through the lens of the core solution concept in the game among the bidders. We show that the $(K+1)$-st price format is susceptible to collusion among the bidders, whereas the $K$-th price format does not have this issue.
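To make the two pricing rules concrete, the following is a minimal sketch of a single auction round, not the paper's implementation. The function names `uniform_price_auction` and `utility` are hypothetical. The sketch makes three assumptions: ties are broken in favor of the lower player index, the $(K+1)$-st price is 0 when exactly $K$ bids are submitted, and utility is quasi-linear (total value of the units won minus the payment). The paper's own tie-breaking and utility definitions may differ.

```python
from typing import List, Tuple


def uniform_price_auction(
    bids: List[List[float]], K: int, rule: str = "K"
) -> Tuple[float, List[int]]:
    """Run one round of a uniform-price auction for K identical units.

    bids[i] is player i's bid vector (one bid per unit demanded).
    rule is "K" for the K-th highest bid or "K+1" for the (K+1)-st.
    Returns (price per unit, number of units won by each player).
    """
    # Tag every bid with its owner so the allocation can be recovered.
    tagged = [(b, i) for i, vec in enumerate(bids) for b in vec]
    if len(tagged) < K:
        raise ValueError("need at least K bids to sell all K units")

    # Sort bids from highest to lowest.
    # Assumption: ties go to the lower player index.
    tagged.sort(key=lambda t: (-t[0], t[1]))

    # The K highest bids each win one unit.
    alloc = [0] * len(bids)
    for _, i in tagged[:K]:
        alloc[i] += 1

    if rule == "K":
        price = tagged[K - 1][0]  # lowest winning bid
    elif rule == "K+1":
        # Highest losing bid; assumed 0 if there is no losing bid.
        price = tagged[K][0] if len(tagged) > K else 0.0
    else:
        raise ValueError("rule must be 'K' or 'K+1'")
    return price, alloc


def utility(marginal_values: List[float], units_won: int, price: float) -> float:
    """Quasi-linear utility (assumed): value of units won minus payment."""
    return sum(marginal_values[:units_won]) - units_won * price
```

For example, with $K = 2$ and bid vectors $(5, 3)$ and $(4, 2)$, each player wins one unit; the price is 4 under the $K$-th price rule and 3 under the $(K+1)$-st price rule.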
