Markov Decision Processes (MDPs) model systems with uncertain transition dynamics. Multiple-environment MDPs (MEMDPs) extend MDPs: an MEMDP is a finite set of MDPs that share the same state and action spaces but differ in their transition dynamics. The key problem for MEMDPs is to find a single policy that satisfies a given objective in every associated MDP. The main result of this paper is PSPACE-completeness of almost-sure Rabin objectives in MEMDPs. This result clarifies the complexity landscape for MEMDPs and contrasts with results for the more general class of partially observable MDPs (POMDPs), where almost-sure reachability is already EXPTIME-complete and almost-sure Rabin objectives are undecidable.
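The setting above can be illustrated with a toy example (a sketch of my own, not taken from the paper): two environments over the same states and actions, where no single deterministic memoryless choice reaches the target in both, but one shared randomized policy does so almost surely in every environment.

```python
import random

# Hypothetical two-environment MEMDP over shared states {"s", "t"} and
# shared actions {"a", "b"}; only the transition dynamics differ.
ENVIRONMENTS = {
    "env1": {("s", "a"): "t", ("s", "b"): "s"},  # here action "a" reaches the target
    "env2": {("s", "a"): "s", ("s", "b"): "t"},  # here action "b" reaches the target
}
TARGET = "t"

def uniform_policy(_state):
    """One policy shared across all environments: pick an action uniformly."""
    return random.choice(["a", "b"])

def reaches_target(env, policy, max_steps=200):
    """Simulate one run of `policy` in environment `env`; True if the target is hit."""
    state = "s"
    for _ in range(max_steps):
        if state == TARGET:
            return True
        state = ENVIRONMENTS[env][(state, policy(state))]
    return state == TARGET

random.seed(0)
# In each environment the uniform policy picks the "right" action with
# probability 1/2 per step, so it reaches the target almost surely in both.
for env in ENVIRONMENTS:
    assert all(reaches_target(env, uniform_policy) for _ in range(1000))
```

Note that the fixed policy "always play a" wins only in env1 and "always play b" only in env2; a single policy that works everywhere needs randomization (or memory), which is the essential difficulty MEMDPs add over ordinary MDPs.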