We present a novel framework for ε-optimally solving two-player zero-sum partially observable stochastic games (zs-POSGs). These games pose a major challenge because they lack a principled connection to the dynamic programming (DP) techniques developed for two-player zero-sum stochastic games (zs-SGs). Prior attempts at transferring solution methods have lacked a lossless reduction, defined here as a transformation that preserves value functions, equilibrium strategies, and optimality structure; as a result, they have produced ad-hoc algorithms that do not generalise. This work introduces the first lossless reduction from zs-POSGs to transition-independent zs-SGs, enabling the principled application of a broad class of DP-based methods. We show empirically that point-based value iteration (PBVI) algorithms, applied via this reduction, produce ε-optimal strategies across a range of benchmark domains, consistently matching or outperforming existing state-of-the-art methods. Our results open a systematic pathway for algorithmic and theoretical transfer from SGs to partially observable settings.
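For readers less familiar with the DP side of this reduction, the classic DP method for discounted zs-SGs is Shapley's value iteration: each backup solves one zero-sum matrix game per state, and because the backup is a γ-contraction in the sup norm, iteration can stop with an ε-optimal value. The sketch below illustrates only that background technique. It is not the authors' reduction or their PBVI algorithm, and the two-state game, its rewards and transitions, the discount `GAMMA`, and the helper names `solve_2x2` and `shapley_iteration` are all hypothetical. To stay dependency-free it assumes two actions per player, so each stage game can be solved in closed form rather than by linear programming.

```python
"""Shapley value iteration on a tiny, hypothetical discounted zs-SG.

Illustrative sketch only: every stage game is 2x2 so it can be solved
in closed form; larger games would need a linear-programming solver.
"""
from typing import List, Tuple

Matrix = List[List[float]]
GAMMA = 0.9  # hypothetical discount factor

# Hypothetical game. R[s][a][b]: reward to the maximising (row) player.
# P[s][a][b][t]: probability of moving from state s to state t.
R = [
    [[1.0, -1.0], [-1.0, 1.0]],  # state 0: matching pennies (no saddle point)
    [[2.0, 0.0], [3.0, 1.0]],    # state 1: pure saddle point at (1, 1)
]
P = [
    [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]],  # state 0 -> 0 or 1
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],  # state 1 is absorbing
]


def solve_2x2(m: Matrix) -> Tuple[float, float]:
    """Return (value, P(row 0)) for a 2x2 zero-sum game; rows maximise."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # Pure saddle point: the row player picks the maximin row.
        return maximin, 1.0 if min(a, b) >= min(c, d) else 0.0
    # No saddle point: mix so the column player is indifferent.
    denom = a - b - c + d
    return (a * d - b * c) / denom, (d - c) / denom


def shapley_iteration(eps: float) -> List[float]:
    """Apply V <- val(R + GAMMA * P V) until V is within eps of V*."""
    v = [0.0, 0.0]
    while True:
        new_v = []
        for s in range(2):
            stage = [[R[s][a][b] + GAMMA * sum(P[s][a][b][t] * v[t] for t in range(2))
                      for b in range(2)] for a in range(2)]
            new_v.append(solve_2x2(stage)[0])
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        # Contraction bound: ||V_{k+1} - V*|| <= GAMMA / (1 - GAMMA) * ||V_{k+1} - V_k||.
        if GAMMA * delta / (1 - GAMMA) < eps:
            return v


if __name__ == "__main__":
    print(shapley_iteration(1e-4))
```

For this toy game the fixed point can be checked by hand: state 1 is absorbing with stage value 1, so V(1) = 1 / (1 − 0.9) = 10, and matching pennies has value 0, so V(0) = 0.9 · (0.5 V(0) + 0.5 · 10), giving V(0) = 90/11 ≈ 8.18. The stopping rule is the standard sup-norm contraction bound, which is what makes the returned values ε-optimal.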


