This paper presents a learning dynamic with an almost sure convergence guarantee for any stochastic game with turn-based controllers (on state transitions), provided that the stage payoffs induce a zero-sum or identical-interest game. The stage payoffs at different states may even have different structures, e.g., summing to zero in some states and being identical in others. The dynamic combines classical stochastic fictitious play with value iteration for stochastic games. It has two key properties: (i) players play finite-horizon stochastic games of increasing length within the underlying infinite-horizon stochastic game, and (ii) turn-based control ensures that the auxiliary stage games (induced by the estimated continuation payoffs) are strategically equivalent to zero-sum or identical-interest games.
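Property (ii) can be checked numerically. What follows is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes two players, a discount factor `gamma`, and a single state whose transition depends only on the controller's action. The helper names `auxiliary_game`, `controller_term`, `equivalent_game` and `best_response` are hypothetical. Because the continuation payoff enters both players' auxiliary payoffs only through the controller's action, Q1 + Q2 (zero-sum stage payoffs) or Q1 - Q2 (identical stage payoffs) depends on the controller's action alone. Shifting the non-controller's payoff by that term therefore gives an exactly zero-sum or identical-interest game. Since the shift is a function of the opponent's action only, the non-controller's best responses do not change. This holds even when the estimated continuation values `v1`, `v2` are not themselves zero-sum or identical.

```python
import numpy as np


def auxiliary_game(r1, r2, P_ctrl, v1, v2, controller, gamma=0.95):
    """Auxiliary stage game Q_i = r_i + gamma * E[v_i(s')] at one state.

    r1, r2     : (n1, n2) stage payoffs of players 1 and 2 at this state
    P_ctrl     : (n_c, S) next-state distribution indexed by the controller's
                 action only (turn-based control; an assumption of this sketch)
    v1, v2     : (S,) estimated continuation values of players 1 and 2
    controller : 0 if player 1 controls the transition here, 1 if player 2 does
    gamma      : discount factor (assumed for illustration)
    """
    c1, c2 = gamma * (P_ctrl @ v1), gamma * (P_ctrl @ v2)  # shape (n_c,)
    if controller == 0:  # continuation varies with the row (player 1's action)
        return r1 + c1[:, None], r2 + c2[:, None]
    return r1 + c1[None, :], r2 + c2[None, :]  # varies with the column


def controller_term(M, controller):
    """Return M as a function of the controller's action alone (kept 2-D)."""
    f = M[:, :1] if controller == 0 else M[:1, :]
    if not np.allclose(M, f):
        raise ValueError("M depends on the non-controller's action")
    return f


def equivalent_game(Q1, Q2, controller, structure):
    """Shift the non-controller's payoff by a term in the controller's action.

    The controller's payoff is returned unchanged. The shift is constant from
    the non-controller's point of view, so its best responses are unchanged.
    """
    Q = [Q1, Q2]
    other = 1 - controller
    if structure == "zero_sum":
        target = -Q[controller]  # want Q_other + shift == -Q_controller
    elif structure == "identical":
        target = Q[controller]   # want Q_other + shift == Q_controller
    else:
        raise ValueError(f"unknown structure: {structure}")
    Q[other] = Q[other] + controller_term(target - Q[other], controller)
    return Q[0], Q[1]


def best_response(Q_other, x_ctrl, controller):
    """Pure best response of the non-controller to a mixed strategy x_ctrl."""
    payoff = x_ctrl @ Q_other if controller == 0 else Q_other @ x_ctrl
    return int(np.argmax(payoff))


# Example: a zero-sum state controlled by player 1 (hypothetical random data).
rng = np.random.default_rng(0)
n1, n2, S = 3, 2, 4
P = rng.dirichlet(np.ones(S), size=n1)           # row a1 is P(. | s, a1)
v1, v2 = rng.normal(size=S), rng.normal(size=S)  # arbitrary value estimates
r1 = rng.normal(size=(n1, n2))
Q1, Q2 = auxiliary_game(r1, -r1, P, v1, v2, controller=0)
E1, E2 = equivalent_game(Q1, Q2, controller=0, structure="zero_sum")
# Raw auxiliary game vs. shifted game: only the latter is guaranteed zero-sum.
print(np.allclose(Q1 + Q2, 0), np.allclose(E1 + E2, 0))
```

The sketch keeps the controller's payoff fixed and moves only the non-controller's payoff. If the transition depended on both actions, Q1 + Q2 would in general vary with both, and no shift of this kind would restore the zero-sum or identical-interest structure.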