We compute equilibrium strategies in multi-stage games with continuous signal and action spaces, a class of models widely used in the management sciences and economics. Examples include sequential sales via auctions, multi-stage elimination contests, and Stackelberg competitions. In sequential auctions, equilibrium analysis requires deriving not just single bids but bid functions for all possible signals or values a bidder might have across multiple stages. Due to the continuity of the signal and action spaces, these bid functions come from an infinite-dimensional space. While such models are fundamental to game theory and its applications, equilibrium strategies are rarely known: the resulting systems of non-linear differential equations are considered intractable for all but elementary models. This has limited progress in game theory and is a barrier to its adoption in the field. We show that deep reinforcement learning and self-play can learn equilibrium bidding strategies for various multi-stage games. We find equilibria in models that have not yet been explored analytically, as well as new asymmetric equilibrium bid functions for established models of sequential auctions. Verifying equilibrium is challenging in such games, again due to the continuous signal and action spaces. We introduce a verification algorithm and prove that its error decreases for Lipschitz continuous strategies as the level of discretization and the sample size increase.
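The verification idea sketched in the abstract (discretize the signal and action spaces, estimate utilities by sampling, and bound the gain from deviating) can be illustrated on a deliberately simple case. The sketch below is our own minimal illustration, not the paper's algorithm: it checks the known symmetric equilibrium b(v) = (n-1)/n · v of a one-stage first-price auction with i.i.d. uniform values, estimating the maximum utility loss ("epsilon") over a value/action grid via Monte Carlo. The function names and grid/sample parameters are illustrative choices; for Lipschitz continuous strategies, the estimation error shrinks as the grid is refined and the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def strategy(v, n):
    """Candidate strategy: the known symmetric equilibrium b(v) = (n-1)/n * v."""
    return (n - 1) / n * v

def expected_utility(v, bid, n, samples):
    """Monte Carlo estimate of a bidder's expected utility at value v."""
    opp_vals = rng.uniform(0.0, 1.0, size=(samples, n - 1))
    opp_bids = strategy(opp_vals, n)
    win = bid > opp_bids.max(axis=1)  # ties have probability zero and are ignored
    return np.mean(win * (v - bid))

def estimated_epsilon(n=3, grid=30, samples=10_000):
    """Maximum estimated gain from deviating, over a discretized value/action grid."""
    values = np.linspace(0.0, 1.0, grid)
    actions = np.linspace(0.0, 1.0, grid)
    eps = 0.0
    for v in values:
        u_play = expected_utility(v, strategy(v, n), n, samples)
        u_best = max(expected_utility(v, a, n, samples) for a in actions)
        eps = max(eps, u_best - u_play)
    return eps

eps = estimated_epsilon()
print(f"estimated epsilon: {eps:.4f}")  # small, since the candidate is an equilibrium
```

Because the candidate strategy is an exact equilibrium, the estimated epsilon is driven only by discretization and sampling noise; running the same verifier on a non-equilibrium strategy would surface a strictly positive deviation gain.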