The study of learning in games typically assumes that each player always has access to all of their actions. In many practical scenarios, however, exogenous stochasticity may place arbitrary restrictions on a player's action set. To model this setting, for a game $\mathcal{G}_{\mathrm{orig}}$ with action set $A_i$ for each player $i$, we introduce the corresponding Game with Stochastic Action Sets (GSAS), parametrized by a probability distribution over each player's set of possible action subsets $\mathcal{S}_i \subseteq 2^{A_i}\setminus\{\varnothing\}$. In a GSAS, players' strategies and Nash equilibria (NE) admit prohibitively large representations, so existing algorithms for NE computation scale poorly. Under the assumption that action availabilities are independent across players, we show that NE in two-player zero-sum (2p0s) GSAS can be represented compactly by a vector of size $\vert A_i\vert$, overcoming the naive exponential-sized representation of equilibria. Computationally, we introduce an efficient approach based on sleeping internal regret minimization and show that, with an appropriate choice of stepsizes, it converges to an approximate NE of a 2p0s GSAS at rate $O(\sqrt{\log\vert A_i\vert/T})$, avoiding an exponential blow-up in game-dependent constants.
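To make the "sleeping" setting concrete, the following is a minimal sketch of a standard sleeping-experts multiplicative-weights update for a single player, where on each round only a random subset of actions is available and only awake actions are played and updated. This illustrates the general technique the abstract builds on, not the paper's internal-regret algorithm; the availability probability `p_avail`, the loss model, and all function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sleeping_mw(loss_fn, n_actions, T, eta, p_avail=0.7):
    """Multiplicative weights restricted to a stochastic set of awake actions.

    loss_fn(t) returns a length-n_actions vector of losses in [0, 1].
    Returns the sequence of played distributions, each supported only
    on that round's available actions.
    """
    w = np.ones(n_actions)
    plays = []
    for t in range(T):
        # Exogenous stochastic availability: each action is independently awake.
        avail = rng.random(n_actions) < p_avail
        if not avail.any():
            avail[rng.integers(n_actions)] = True  # ensure a nonempty action set
        # Play weights renormalized over the awake actions only.
        p = np.where(avail, w, 0.0)
        p /= p.sum()
        losses = loss_fn(t)
        # Exponential-weights update applied only to awake actions;
        # sleeping actions keep their weight unchanged.
        w[avail] *= np.exp(-eta * losses[avail])
        plays.append(p)
    return plays
```

Note that each iterate is a vector of size $\vert A_i\vert$, consistent with the compact representation discussed above, rather than an object indexed by all $2^{\vert A_i\vert}-1$ possible action subsets.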