Many real-world auctions are dynamic processes in which bidders interact with the auctioneer and report information over multiple rounds. The combination of sequential decision-making and imperfect information makes analyzing the incentive properties of such auctions much more challenging than in the static case. It is clear that bidders often have incentives to manipulate, but the full scope of such strategies is not well understood. We aim to develop a tool for better understanding the incentive properties of dynamic auctions by using reinforcement learning to learn the optimal strategic behavior of an auction participant. We frame the decision problem as a Markov decision process, show its relation to multi-task reinforcement learning, and use a soft actor-critic algorithm with experience relabeling to best-respond against several known analytical equilibria, as well as to find profitable deviations against exploitable bidder strategies.