For a two-player imperfect-information extensive-form game (IIEFG) with $K$ time steps and a per-player action space of size $U$, the game tree complexity is $U^{2K}$, so existing IIEFG solvers struggle with large or infinite $(U,K)$, e.g., differential games with continuous action spaces. To partially address this scalability challenge, we focus on an important class of two-player zero-sum (2p0s) games in which the informed player (P1) knows the payoff while the uninformed player (P2) holds only a belief over the set of $I$ possible payoffs. Such games encompass a wide range of scenarios in sports, defense, cybersecurity, and finance. We prove that under mild conditions, P1's (resp. P2's) equilibrium strategy at any infostate concentrates on at most $I$ (resp. $I+1$) action prototypes. When $I\ll U$, this equilibrium structure collapses the game tree complexity to $I^K$ for P1 when P2 plays best responses, and to $(I+1)^K$ for P2 in a dual game where P1 plays best responses. We then show that exploiting this structure in model-free multiagent reinforcement learning and model predictive control yields significant improvements in learning accuracy and efficiency over SOTA IIEFG solvers. Our demonstration solves a 22-player football game with continuous action spaces and $K=10$ time steps, where the offense team must strategically conceal its play until a critical moment in order to exploit its information advantage. Code is available at https://github.com/ghimiremukesh/cams/tree/iclr
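To give a rough sense of the scale of the claimed collapse, the following is a minimal sketch that compares the nominal tree sizes for illustrative values of $U$, $K$, and $I$ (the specific numbers below are hypothetical, not taken from the paper's experiments):

```python
# Hypothetical problem sizes: U actions per player per step,
# K time steps, I possible payoff types known only to P1.
U, K, I = 100, 10, 3

full_tree = U ** (2 * K)   # joint game tree over both players' actions
p1_tree = I ** K           # P1's tree when P2 plays best responses
p2_tree = (I + 1) ** K     # P2's tree in the dual game (P1 best-responds)

print(f"full tree: {full_tree:.2e}")  # 10^40 for these values
print(f"P1 tree:   {p1_tree:.2e}")
print(f"P2 tree:   {p2_tree:.2e}")
```

Even for a modest $U=100$, the full tree is astronomically large, while the $I$-dependent trees remain tractable whenever $I \ll U$, which is the regime the paper exploits.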