While recent reductions of zero-sum partially observable stochastic games (zs-POSGs) to transition-independent stochastic games (TI-SGs) theoretically admit dynamic programming, practical solutions remain stifled by the inherent non-linearity and exponential complexity of the simultaneous minimax backup. In this work, we surmount this computational barrier by rigorously recasting the simultaneous interaction as a sequential decision process via the principle of separation. We introduce distinct sufficient statistics for valuation and execution, the sequential occupancy state and the private occupancy family, which reveal a latent geometry in the optimal value function. This structural insight allows us to linearise the backup operator, reducing the update complexity from exponential to polynomial while enabling the direct extraction of safe policies without heuristic bookkeeping. Experimental results demonstrate that algorithms leveraging this sequential framework significantly outperform state-of-the-art methods, effectively rendering previously intractable domains solvable.