Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) relax this assumption by allowing imprecise probabilities, specified as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantics; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPs give access to results for the widely studied POSG model; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.