Understanding the convergence properties of learning dynamics in repeated auctions is a timely and important question in the area of learning in auctions, with numerous applications in, e.g., online advertising markets. This work focuses on repeated first price auctions where bidders with fixed values for the item learn to bid using mean-based algorithms, a large class of online learning algorithms that includes popular no-regret algorithms such as Multiplicative Weights Update and Follow the Perturbed Leader. We completely characterize the learning dynamics of mean-based algorithms, in terms of convergence to a Nash equilibrium of the auction, in two senses: (1) time-average: the fraction of rounds where bidders play a Nash equilibrium approaches 1 in the limit; (2) last-iterate: the mixed strategy profile of bidders approaches a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value:
- If the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate.
- If the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in time-average but not necessarily in last-iterate.
- If the number is one, the bidding dynamics may not converge to a Nash equilibrium in time-average or in last-iterate.
Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms.
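To make the setting concrete, the following is a minimal simulation sketch of bidders running Multiplicative Weights Update in a repeated first price auction. It is an illustration under stated assumptions, not the construction analyzed in this work: the integer bid grid `0..v-1`, uniform tie-breaking, full-information feedback, and the fixed learning rate `eta` are all choices made here for simplicity, and the function names (`utility`, `mwu_step`, `simulate`) are hypothetical. In particular, the sketch does not check whether a fixed `eta` satisfies the mean-based condition.

```python
import math
import random


def utility(value, bid, other_bids):
    """Expected utility of `bid` against `other_bids` in a first price auction.

    Assumption: ties for the highest bid are broken uniformly at random.
    """
    top = max(other_bids)
    if bid < top:
        return 0.0
    if bid > top:
        return float(value - bid)
    tied = 1 + sum(1 for b in other_bids if b == top)
    return (value - bid) / tied


def mwu_step(weights, value, bids, other_bids, eta):
    """One Multiplicative Weights Update step with full-information feedback.

    Every candidate bid is scored against the bids the others actually played.
    """
    scaled = [w * math.exp(eta * utility(value, b, other_bids))
              for w, b in zip(weights, bids)]
    total = sum(scaled)
    return [w / total for w in scaled]


def simulate(values, rounds=2000, eta=0.1, seed=0):
    """Run the repeated auction and return each bidder's final mixed strategy.

    Assumptions: values are positive integers with at least two bidders, and
    bidder i chooses from the bid grid 0, 1, ..., values[i] - 1.
    """
    rng = random.Random(seed)
    bid_sets = [list(range(v)) for v in values]
    strategies = [[1.0 / len(bs)] * len(bs) for bs in bid_sets]
    for _ in range(rounds):
        # Each bidder samples a bid from its current mixed strategy.
        played = [rng.choices(bs, weights=p)[0]
                  for bs, p in zip(bid_sets, strategies)]
        # Each bidder updates against the other bidders' realized bids.
        for i, v in enumerate(values):
            others = played[:i] + played[i + 1:]
            strategies[i] = mwu_step(strategies[i], v, bid_sets[i], others, eta)
    return bid_sets, strategies
```

Calling `simulate([5, 5, 5])`, `simulate([5, 5, 3])`, and `simulate([5, 3, 3])` gives one way to inspect the final mixed strategies when three, two, or one bidders share the highest value. A finite run of this kind can only illustrate the dynamics; it is not evidence for or against the almost-sure convergence statements.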