Learning in Repeated Multi-Unit Pay-As-Bid Auctions

Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of multiple homogeneous units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of identical items are allocated to the largest submitted bids, and each winning bid pays a price equal to the bid itself. Learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by first focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the bids previously submitted by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial-time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full-information and bandit feedback settings. We achieve regret upper bounds of $O(M\sqrt{T\log |\mathcal{B}|})$ and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$, respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|\mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound, which matches the linear dependence on $M$. Our numerical results suggest that when all agents behave according to our proposed no-regret learning algorithms, the resulting market dynamics mainly converge to a welfare-maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue than its popular alternative, the uniform price auction.
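To make the auction format and the offline problem concrete, the following is a minimal Python sketch, not the DP scheme referred to above. It computes one bidder's utility in a single pay-as-bid auction and then finds the best fixed bid vector against a history of competing bids by brute-force enumeration, which is exponential in $M$ and only practical for tiny instances. The function names, the per-unit `values` vector, the restriction to non-increasing bid vectors, and the rule that ties are broken against the bidder are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def pay_as_bid_utility(my_bids, other_bids, values, num_items):
    """Utility of one bidder in a single pay-as-bid auction.

    my_bids    : the bidder's bids, one per unit demanded
    other_bids : all bids submitted by competitors
    values     : assumed valuations for the 1st, 2nd, ... unit won
    num_items  : number of identical items for sale
    Assumption: ties are broken against the bidder.
    """
    # Tag bids by owner: 0 = us, 1 = competitor. Sorting on (bid, owner)
    # in descending order places competitors ahead of us on equal bids.
    tagged = [(b, 0) for b in my_bids] + [(b, 1) for b in other_bids]
    tagged.sort(reverse=True)
    winners = tagged[:num_items]
    my_winning = sorted((b for b, owner in winners if owner == 0), reverse=True)
    # Each unit won yields its value minus the winning bid (pay-as-bid).
    return sum(v - b for v, b in zip(values, my_winning))


def best_fixed_bids_brute_force(history, values, num_items, grid):
    """Best fixed non-increasing bid vector on `grid` against `history`.

    history : list of competitor bid lists, one per past auction
    This enumerates every candidate vector; it is NOT the polynomial-time
    DP, only a reference for checking small cases.
    """
    descending = sorted(grid, reverse=True)
    candidates = combinations_with_replacement(descending, len(values))

    def total_utility(bids):
        return sum(
            pay_as_bid_utility(bids, others, values, num_items)
            for others in history
        )

    return max(candidates, key=total_utility)
```

For example, with two items per auction, competitor histories `[[4, 2], [1, 1]]`, a value of 6 per unit, and the bid grid `{1, ..., 5}`, the enumeration selects the bid vector `(3, 2)`: the higher bid wins one unit in the first auction, and both bids win in the second.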
